ńņš. 10 |

the expectation conditional (as always) on the value of the open O. The eļ¬ect

of a knock-out lower barrier at Oeā’a is essentially the same but with b replaced

by a, namely

a(a + ln(C/O))

E[Ļ(C)(1 ā’ exp{ā’2 })].

Ļ2T

In the next section we consider the eļ¬ect of two barriers, both an upper and a

lower barrier.

One Process, Two barriers.

We have discussed a simple device above for generating jointly the high and the

close or the low and the close of a process given the value of the open. The joint

distribution of H, L, C given the value of O or the distribution of C in the case

of upper and lower barriers is more problematic. Consider a single factor model

and two barriers- an upper and a lower barrier. Note that the high and the

low in any given interval is dependent, but if we simulate a path in relatively

short segments, by ļ¬rst generating n increments and then generating the highs

and lows within each increment, then there is an extremely low probability

that the high and low of the process will both lie in the same short increment.

For example for a Brownian motion with the time interval partitioned into 5

equal subintervals, the probability that the high and low both occur in the

same increment is less than around 0.011 whatever the drift. If we increase the

number of subintervals to 10, this is around 0.0008. This indicates that provided

we are willing to simulate highs, lows and close in ten subintervals, pretending

SIMULATING BARRIER AND LOOKBACK OPTIONS 287

that within subintervals the highs and lows are conditionally independent, the

error in our approximation is very small.

An alternative, more computationally intensive, is to diļ¬erentiate the inļ¬nite

series expression for the probability P (H Ā· b, L ā„ a, C = u|O = 0] A ļ¬rst step

in this direction is the the following result, obtained from the reļ¬‚ection principle

with two barriers.

Theorem 47 For a Brownian motion process

dSt = Āµdt + dWt , S0 = 0

deļ¬ned on [0, 1] and for ā’a < u < b,

P (L < ā’a or H > b|C = u)

ā

1X

[Ļ{2n(a + b) + u} + Ļ{2n(a + b) ā’ 2a ā’ u}

=

Ļ(u) n=1

+ Ļ{ā’2n(b + a) + u} + Ļ{2n(b + a) + 2a + u}]

where Ļ is the N (0, 1) probability density function.

Proof. The proof is a well-known application of the reļ¬‚ection principal.

It is suļ¬cient to prove the result in the case Āµ = 0 since the conditional

distribution of L, H given C does not depend on Āµ (A statistician would say

that C is a suļ¬cient statistic for the drift parameter). Denote the following

paths determined by their behaviour on 0 < t < 1. All paths are assumed to

end at C = u.

288 CHAPTER 5. SIMULATING THE VALUE OF OPTIONS

= H > b (path goes above b)

A+1

path goes above b and then falls below ā’a

=

A+2

goes above b then falls below ā’a then rises above b

=

A+3

etc.

L < ā’a

=

Aā’1

path falls below ā’a then rises above b

=

Aā’2

falls below ā’a then rises above b then falls below ā’a

=

Aā’3

etc.

For an arbitrary event A, denote by P (A|u) probability of the event conditional

on C = u. Then according to the reļ¬‚ection principal the probability that the

Brownian motion leaves the interval [ā’a, b] is given from an inclusion-exclusion

argument by

P (A+1 |u) ā’ P (A+2 |u) + P (A+3 |u) ā’ Ā· Ā· Ā· (5.30)

+P (Aā’1 |u) ā’ P (Aā’2 |u) + P (Aā’3 |u) Ā· Ā· Ā·

This can be veriļ¬ed by considering the paths in Figure 5.7. (It should be noted

here that, as in our application of the reļ¬‚ection principle in the one-barrier case,

the reļ¬‚ection principle allows us to show that the number of paths in two sets is

the same, and this really only translates to probability in the case of a discrete

sample space, for example a simple random walk that jumps up or down by a

ļ¬xed amount in discrete time steps. This result for Brownian motion obtains if

we take a limit over a sequence of simple random walks approaching a Brownian

motion process.)

Note that

Ļ(2b ā’ u)

P (A+1 |u) =

Ļ(u)

Ļ{2n(a + b) + u}

P (A+2n |u) =

Ļ(u)

Ļ{2n(a + b) ā’ 2a ā’ u}

P (A+(2nā’1) |u) =

Ļ(u)

SIMULATING BARRIER AND LOOKBACK OPTIONS 289

Figure 5.7: The Reļ¬‚ection principle with Two Barriers

and

Ļ(ā’2a ā’ u)

P (Aā’1 |u) =

Ļ(u)

Ļ{ā’2n(b + a) + u}

P (Aā’2n |u) =

Ļ(u)

Ļ{2n(b + a) + 2a + u}

P (Aā’(2n+1) |u) = .

Ļ(u)

The result then obtains from substitution in (5.30).

As a consequence of this result we can obtain an expression for P (a < L Ā·

H < b, u < C < v) (see also Billingsley, (1968), p. 79) for a Brownian motion

on [0, 1] with zero drift:

P (a, b, u, v) = P (a < L Ā· H < b, u < C < v)

ā

X

Ī¦[v + 2k(b ā’ a)] ā’ Ī¦[u + 2k(b ā’ a)]

=

k=ā’ā

ā

X

ā’ Ī¦[2b ā’ u + 2k(b ā’ a)] ā’ Ī¦[2b ā’ v + 2k(b ā’ a)]. (5.31)

k=ā’ā

where Ī¦ is the standard normal cumulative distribution function. From (5.31)

we derive the joint density of (L, H, C) by taking the limit P (a, b, u, u + Ī“)/Ī“ as

290 CHAPTER 5. SIMULATING THE VALUE OF OPTIONS

Ī“ ā’ 0, and taking partial derivatives with respect to a and b:

ā

X

k 2 Ļ00 [u + 2k(b ā’ a)] ā’ k(1 + k)Ļ00 [2b ā’ u + 2k(b ā’ a)]

f (a, b, u) = 4

k=ā’ā

ā

X

k Ļ [u + 2k(b ā’ a)] ā’ k(1 + k)Ļ00 [2b ā’ u + 2k(b ā’ a)]

2 00

=4

k=1

+ k2 Ļ00 [u ā’ 2k(b ā’ a)] + k(1 ā’ k)Ļ00 [2b ā’ u ā’ 2k(b ā’ a)] (5.32)

for a < u < b.

From this it is easy to see that the conditional cumulative distribution func-

tion of L given C = u, H = b is given by on a Ā· u Ā· b (where ā’2Ļ0 (2b ā’ u)

is the joint p.d.f. of H, C) by

ā‚2

ā‚bā‚v P (a, b, u, v)| v=u

(5.33)

F (a|b, u) = 1 +

2Ļ0 (2b ā’ u)

ā

X

ā’1

{ā’kĻ0 [u + 2k(b ā’ a)] + (1 + k)Ļ0 [2b ā’ u + 2k(b ā’ a)]

=0

Ļ (2b ā’ u)

k=1

+ kĻ0 [u ā’ 2k(b ā’ a)] + (1 ā’ k)Ļ0 [2b ā’ u ā’ 2k(b ā’ a)]}

This allows us to simulate both the high and the low, given the open and the

close by ļ¬rst simulating the high and the close using ā’2Ļ0 (2b ā’ u) as the joint

p.d.f. of (H, C) and then simulating the low by inverse transform from the

cumulative distribution function of the form (5.33).

Survivorship Bias

It is quite common for retrospective studies in ļ¬nance, medicine and to be

subject to what is often called āsurvivorship biasā. This is a bias due to the

fact that only those members of a population that remained in a given class

(for example the survivors) remain in the sampling frame for the duration of

the study. In general, if we ignore the ādrop-outsā from the study, we do so

at risk of introducing substantial bias in our conclusions, and this bias is the

survivorship bias.

SURVIVORSHIP BIAS 291

Suppose for example we have hired a stable of portfolio managers for a large

pension plan. These managers have a responsibility for a given portfolio over

a period of time during which their performance is essentially under continuous

review and they are subject to one of several possible decisions. If returns below

a given threshhold, they are deemed unsatisfactory and ļ¬red or converted to

another line of work. Those with exemplary performance are promoted, usually

to an administrative position with little direct ļ¬nancial management. And those

between these two āabsorbingā barriers are retained. After a period of time,

T, an amibitious graduate of an unnamed Ivey league school working out of

head oļ¬ce wishes to compare performance of those still employed managing

portfolios. How are should the performance measures reļ¬‚ect the ļ¬ltering of

those with unusually good or unusually bad performance? This is an example

of a process with upper and lower absorbing barriers, and it is quite likely

that the actual value of these barriers diļ¬ers from one employee to another, for

example the son-in-law of the CEO has a substantially diļ¬erent barriers than the

math graduate fresh out of UW. However, let us ignore this diļ¬erence, at least

for the present, and concentrate on a diļ¬erence that is much harder to ignore

in the real world, the diļ¬erence between the volatility parameters of portfolios,

possibly in diļ¬erent sectors of the market, controlled by diļ¬erent managers.

For example suppose two managers were responsible for funds that began and

ended the year at the same level and had approximately the same value for the

lower barrier as in Table 5.2. For each the value of the volatility parameter

Ļ was estimated using individual historical volatilities and correlations of the

component investments.

Portfolio Open price Close Price Lower Barrier Volatility

56 5

1 40 30 .5

8

56 1

2 40 30 .2

4

Table 5.3

292 CHAPTER 5. SIMULATING THE VALUE OF OPTIONS

Suppose these portfolios (or their managers) have been selected retrospec-

tively from a list of āsurvivorsā which is such that the low of the portfolio value

never crossed a barrier at l = Oeā’a (bankruptcy of fund or termination or

demotion of manager, for example) and the high never crossed an upper barrier

at h = Oeb . However, for the moment let us assume that the upper barrier is so

high that its inļ¬‚uence can be neglected, so that the only absorbtion with any

substantial probability is at the lower barrier. We interested in the estimate of

return from the two portfolios, and a preliminary estimate indicates a continu-

ously compounded rate of return from portfolio 1 of R1 = ln(56.625/40) = 35%

and from portfolio two of R2 = ln(56.25/40) = 34%. Is this diļ¬erence signiļ¬cant

and are these returns reasonably accurate in view of the survivorship bias?

We assume a geometric Brownian motion for both portfolios,

(5.34)

dSt = ĀµSt dt + ĻSt dWt ,

and deļ¬ne O = S(0), C = S(T ),

H = max S(t), L = min S(t)

0tT 0tT

with parameters Āµ, Ļ possibly diļ¬erent.

In this case it is quite easy to determine the expected return or the value of

any performance measure dependent on C conditional on survival, since this is

essentially the same as a problem already discussed, the valuation of a barrier

option. According to (5.27), the probability that a given Brownian motion

process having open 0 and close c strikes a barrier placed at l < min(0, c) is

zl

}

exp{ā’2

Ļ2 T

with

zl = l(l ā’ c).

Converting this statement to the Geometric Brownian motion (5.34), the prob-

ability that a geometric Brownian motion process with open O and close c

SURVIVORSHIP BIAS 293

breaches a lower barrier at l is

zl

P [L Ā· l|O, C] = exp{ā’2 }

Ļ2T

with

zl = ln(O/l) ln(C/l) = a(a + ln(C/O)).

Of course the probability that a particular path with this pair of values (O, C)

is a āsurvivorā is 1 minus this or

zl

1 ā’ exp{ā’2 }. (5.35)

Ļ2 T

When we observe the returns or the closing prices C of survivors only, the results

have been ļ¬ltered with probability (5.35). In other words if the probability

density function of C without any barriers at all is f (c) (in our case this is a

lognormal density with parameters that depend on Āµ and Ļ) then the density

function of C of the survivors in the presence of a lower barrier is proportional

to

ln(O/l) ln(c/l)

f (c)[1 ā’ exp{ā’2 }]

Ļ2 T

l 2 ln(O/l) 2a

= f (c)(1 ā’ ( )Ī» ), with Ī» = = 2 > 0.

2T

c Ļ ĻT

It is interesting to note the eļ¬ect of this adjustment on the moments of C for

various values of the parameters. For example consider the expected value of C

conditional on survival

Rā l

cf (c)(1 ā’ ( c )Ī» )dc

l

E(C|L ā„ l] = R ā l

f (c)(1 ā’ ( c )Ī» )dc

l

E[CI(C ā„ l)] ā’ lĪ» E[C 1ā’Ī» I(C ā„ l)]

(5.36)

=

P [C ā„ l] ā’ lĪ» E[C ā’Ī» I(C ā„ l)]

and this is easy to evaluate in the case of interest in which C has a lognormal

distribution. In fact the same kind of calculation is used in the development of

the Black-Scholes formula. In our case C = exp(Z) where Z is N (ĀµT, Ļ 2 T )

294 CHAPTER 5. SIMULATING THE VALUE OF OPTIONS

and so for any p and l > 0, we have from (3.11), using the fact that E(C|O) =

O exp{ĀµT + Ļ2 T /2}, (and assuming O is ļ¬xed),

ā

1

E[C p I(C > l)] = Op exp{pĀµT + p2 Ļ 2 T /2}Ī¦( ā (a + ĀµT ) + Ļ T p)

ĻT

To keep things slightly less combersome, let us assume that we observe the

geometric Brownian motion for a period of T = 1. Then (5.36) results in

2 2 2

1 1

OeĀµ+Ļ /2

Ī¦( Ļ (a + Āµ) + Ļ) ā’ Oeā’aĪ»+(1ā’Ī»)Āµ+(1ā’Ī») Ļ /2 Ī¦( Ļ (a + Āµ) + Ļ(1 ā’ Ī»))

1 1

Ī¦( Ļ (a + Āµ)) ā’ eā’Ī»aā’Ī»Āµ+Ī»2 Ļ2 /2 Ī¦( Ļ (a + Āµ) ā’ ĻĪ»)

Let there be no bones about it. At ļ¬rst blush this is still a truly ugly and

opaque formula. We can attempt to beautify it by re-expressing it in terms more

like those in the Black-Scholes formula, putting

1 1

(Āµ ā’ a), and d2 (0) = (a + Āµ),

d2 (Ī») =

Ļ Ļ

d1 (Ī») = d2 (Ī») + Ļ, d1 (0) = d2 (0) + Ļ.

These are analogous to the values of d1, d2 in the Black-Scholes formula in the

case Ī» = 0. Then

2 2 2

eĀµ+Ļ /2

Ī¦(d1 (0)) ā’ eā’Ī»a+(1ā’Ī»)Āµ+(1ā’Ī») Ļ /2 Ī¦(d1 (Ī»))

E[C|L ā„ l] = O (5.37)

.

Ī¦(d2 (0)) ā’ eā’Ī»aā’Ī»Āµ+Ī»2 Ļ2 /2 Ī¦(d2 (Ī»))

What is interesting is how this conditional expectation, the expected close for

the survivors, behaves as a function of the volatility parameter Ļ. Although this

is a rather complicated looking formula, we can get a simpler picture (Figure

5.8) using a graph with the drift parameter Āµ chosen so that E(C) = 56.25 is

held ļ¬xed. We assume a = ā’ ln(30/40) (consistent with Table 5.2)and vary

the value of Ļ over a reasonable range from Ļ = 0.1 (a very stable investment)

through Ļ = .8 (a highly volatile investment). In Figure 5.8 notice that for small

volatility, e.g. for Ļ Ā· 0.2, the conditional expectation E[C|L ā„ 30] remains

close to its unconditional value E(C) but for Ļ ā„ 0.3 it increases almost linearly

in Ļ to around 100 for Ļ = 0.8. The intuitive reason for this dramatic increase is

quite simple. For large values of Ļ the process ļ¬‚uctuates more, and only those

SURVIVORSHIP BIAS 295

Figure 5.8: E[C|L ā„ 30] for various values of (Āµ, Ļ) chosen such that E(C) =

56.25.

paths with very large values of C have abeen able to avoid the absorbing barrier

at l = 30. Two comparable portfolios with unconditional return about 40%

will show radically diļ¬erent apparent returns in the presence of an absorbing

barrier. If Ļ = 20% then the survivorā™s return will still average around 40%,

but if Ļ = 0.8, the survivorā™s returns average close to 150%. The practical

implications are compelling. If there is any form of survivorship bias (as there

usually is), no measure of performance should be applied to the returns from

diļ¬erent investments, managers, or portfolios without an adjustment for the

risk or volatility.

In the light of this discussion we can return to the comparison of the two

portfolios in Table 5.3. Evidently there is little bias in the estimate of returns

for portfolio 2, since in this case the volatility is small Ļ = 0.2. However there

is very substantial bias associated with the estimate for portfolio 1, Ļ = 0.5.

In fact if we repeat the graph of Figure 5.8 assuming that the unconditional

return is around 8% we discover that E[C|L ā„ 30] is very close to 56 5 when

8

Ļ = 0.5 indicating that this is a more reasonable estimator of the performance

of portfolio 1.

296 CHAPTER 5. SIMULATING THE VALUE OF OPTIONS

Figure 5.9: The Eļ¬ect of Surivorship bias for a Brownian Motion

For a Brownian motion process it is easy to demonstrate graphically the

nature of the surivorship bias. In Figure 5.9, the points under the graph of

the probability density which are shaded correspond to those which whose low

fell below the absorbing barrier at l = 30. The points in the unshaded region

correspond to the survivors. The expected value of the return conditional on

survival is the mean return (x-cooredinate of the center of mass) of those points

chosen uniformly under the density but above the lower curve, in the region

labelled āsurvivorsā. Note that if the mean Āµ of the unconditional density

approaches the barrier (here at 30) , this region approaches a narrow band

along the top of the curve and to the right of 30. Similarly if the unconditional

standard deviation or volatility increases, the unshaded region stretches out to

the right in a narrow band and the conditional mean increases.

We arrive at the following seemingly paradoxical conclusions which make it

imperative to adjust for survivorship bias: the conditional mean, conditional on

survivorship, may increase as the volatility increases even if the unconditional

SURVIVORSHIP BIAS 297

mean decreases.

Let us return to the problem with both an upper and lower barrier and

consider the distribution of returns conditional on the low never passing a barrier

Oeā’a and the high never crossing a barrier at Oeb ( representing a fund buyout,

recruitment of manager by competitor or promotion of fund manager to Vice

President). It is common in process control to have an upper and lower barrier

and to intervene if either is crossed, so we might wish to study those processes

for which no intervention was required. Similarly, in a retrospective study we

may only be able to determine the trajectory of a particle which has not left

a given region and been lost to us. Again as an example, we use the following

data on two portfolio managers, both observed conditional on survival, for a

period of one year.

Portfolio Open price Close Price Lower Barrier Upper Barrier Volatility

56 5

1 40 30 100 .5

8

56 1

2 40 30 100 .2

4

If Ļ denotes the standard normal p.d.f., then the conditional probability

density function of ln(C/O) given that Oeā’a < L < H < Oeb is proportional

uā’Āµ

1

to where, as before

Ļ Ļ( Ļ )w(u)

2 2 2 2

w(u) = 1 ā’ eā’2b(bā’u)/Ļ + eā’2(a+b)(a+bā’u)/Ļ ā’ eā’2a(a+u)/Ļ + eā’2(a+b)(a+b+u)/Ļ ā’ E(W ),

ā’ ln(L)

ln(H) b a

where W = I[f rac1( ], and

)> ] + I[f rac1( )>

a+b a+b a+b a+b

a = ā’ln(30/40).

b = ln(100/40),

The expected return conditional on survival when the drift is Āµ is given by

Z b

uā’Āµ

1

E(ln(C/O)|30 < L < H < 100) = uw(u)Ļ( )du.

Ļ Ļ

ā’a

298 CHAPTER 5. SIMULATING THE VALUE OF OPTIONS

where w(u) is the weight function above. Therefore a moment estimator of the

drift for the two portfolios is determined by setting this expected return equal

to the observed return, and solving for Āµi the equation

Z

1b u ā’ Āµi

uw(u)Ļ( )du = Ri , i = 1, 2.

Ļi ā’a Ļi

The solution is, for portfolio 1, Āµ1 = 0 and for portfolio 2, Āµ2 = 0.3. Thus the

observed values of C are completely consistent with a drift of 30% per annum

for portfolio 2 and a zero drift for portfolio 1. The bias again very strongly

eļ¬ects the portfolio with the greater volatility and estimators of drift should

account for this substantial bias. Ignoring the survivorship bias has led in the

past to some highly misleading conclusions about persistence of skill among

mutual funds.

Problems

1. If the values of dj are equally spaced, i.e. if dj = jā, j = ..., ā’2, ā’1, 0, 1, ...and

with S0 = 0, ST = C and M = max(S0 , ST ), show that

P [C > u and Cā’M is even]

ā

E[H|C = u] = M + ā .

P [C = u]

2. Let W (t) be a standard Brownian motion on [0, 1] with W0 = 0. Deļ¬ne

C = W (1) and H = max{W (t); 0 Ā· t Ā· 1}. Show that the joint probabil-

ity density function of (C, H) is given by

f (c, h) = 2Ļ(c)(2h ā’ c)eā’2h(hā’c) , for h > max(0, c)

where Ļ(c) is the standard normal probability density function.

3. Use the results of Problem 2 to show that the joint probability density

function of the random variables

Y = exp{ā’(2H ā’ C)2 /2}

and C is a uniform density on the region {(x, y); y < exp(x2 /2)}.

PROBLEMS 299

4. Let X(t) be a Brownian motion on [0, 1], i.e. Xt satisļ¬es

dXt = Āµdt + ĻdWt , and X0 = 0.

Deļ¬ne C = X(1) and H = max{X(t); 0 Ā· t Ā· 1}. Find the joint proba-

bility density function of (C, H).

300 CHAPTER 5. SIMULATING THE VALUE OF OPTIONS

Chapter 6

Quasi- Monte Carlo

Multiple Integration

Introduction

In some sense, this chapter ļ¬ts within Chapter 4 on variance reduction; in

some sense it is stratiļ¬cation run wild. Quasi-Monte Carlo methods are purely

deterministic, numerical analytic methods in the sense that they do not even

attempt to emulate the behaviour of independent uniform random variables, but

rather cover the space in d dimensions with fewer gaps than independent random

variables would normally admit. Although these methods are particularly when

evaluating integrals in moderate dimensions, we return brieļ¬‚y to the problem

of evaluating a one-dimension integral of the form

Z 1

f (x)dx.

0

The simplest numerical approximation to this integral consists of choosing a

point xj in the interval [ N , j+1 ], j = 0, 1, ..., N ā’ 1, perhaps the midpoint of the

j

N

301

302 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

interval, and then evaluating the average

N ā’1

1X

(6.1)

f (xj ).

N j=0

If the function f has one continuous derivative, such a numerical method with

N equally or approximately equally spaced points will have bias that approaches

0 at the rate 1/N because, putting M = sup{|f 0 (z)|; 0 < z < 1},

Z (j+1)/N

1 1

f (x)dx ā’ f (xj ) Ā· 2 M (6.2)

N N

j/N

and so summing both sides over j gives

Z Nā’1

1X

1

1

| f (x)dx ā’ f (xj )| Ā· M.

N j=0 N

0

We will refer to the error in the numerical integral in this case

Z N ā’1

1X

1

ĪµN = | f (x)dx ā’ f (xj )|

N j=0

0

as O(N ā’1 ) which means that the sequence of errors ĪµN satisļ¬es

lim sup N ā’1 ĪµN < ā

N ā’ā

or intuitively that the errors are bounded by a constant times N ā’1 .

If the function f is known to have bounded derivatives of second or third

order, then integrals can be approximated to an even higher degree of precision.

For example various numerical quadrature formulae permit approximating an

R1

integral of the form 0 f (x)w(x)dx with a weighted average of N points

N

X

(6.3)

wj f (xj )

j=1

in such a way that if f (x) is a polynomial of degree 2N ā’ 1 or less, the approx-

imation is exact. Here the function w(x) is typically some density such as the

uniform, exponential or normal density and the optimal placement of the points

xj as well as the weights wj depends on w(x). Of course a smooth function

INTRODUCTION 303

can be closely approximated with a polynomial of high degree and so numerical

quadrature formulae of the form (6.3) permit approximating a one-dimension

integral arbitrarily closely provided that the function is suļ¬ciently smooth, i.e.

it has bounded derivatives of suļ¬ciently high order. We should note that in

this case, the weights wj and the points xj are both deterministic. By contrast,

the Monte Carlo integral

N

1X

b

ĪøMC = f (Ui )

N i=1

with N points places these points at random or pseudo-random locations, has

q

b

zero bias but the standard deviation of the estimator var(ĪøMC ) is a constant

ā

multiple of 1/ N . The Central Limit theorem assures us that

Z1

b

N 1/2 (ĪøMC ā’ f (x)dx)

0

converges to a normal distribution which means that the error is order (in prob-

ability) N ā’1/2 . Note that there is a change in our measure of the size of an

error, since only the variance or standard deviation of a given term in the se-

quence of errors is bounded, not the whole sequence of errors ĪµN . In particular

b

if a pseudo-random estimator Īø satisļ¬es

Z1

b f (x)dx)2 = O(N ā’2k )

E(Īø ā’

0

then we say that the error is OP (N ā’k ) where OP denotes āorder in probabilityā.

This is clearly a weaker notion than O(N ā’k ). Even the simplest numerical

integral (6.1) has a faster rate of convergence then that of the Monte Carlo

integral with or without use of the variance reduction techniques of Chapter 4.

This is a large part of the reason numerical integration is usually preferred to

Monte Carlo methods in one dimension, at least for smooth functions, but it

also indicates that for regular integrands, there is room for improvement over

Monte Carlo in higher dimensions as well.

The situation changes in 2 dimensions. Suppose we wish to distribute N

points over a uniform lattice in some region such as the unit square. One

304 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

possible placement is to points of the form

ā

i j

( ā , ā ), i, j = 1, 2, ... N

N N

ā

assuming for convenience of notation that N is integer. The distance between

ā

adjacent points is of order 1/ N and by an argument akin to (6.2), the bias in

ā

a numerical integral is order 1/ N . This is the now same order as the standard

deviation of a Monte Carlo integral, indicating that the latter is already, in two

dimensions, competitive. When the dimension s ā„ 3, a similar calculation shows

that the standard deviation of the Monte-Carlo method is strictly smaller order

than the error of a numerical integral with weights at lattice points. Essentially,

the placement of points on a lattice for evaluating a dā’dimensional integral is far

from optimal when d ā„ 2. Indeed various deterministic alternatives called quasi-

random samples provide substantially better estimators especially for smooth

functions of several variables. Quasi-random samples are analogous to equally

spaced points in one dimension and are discussed at length by Niederreiter

(1978), where it is shown that for suļ¬ciently smooth functions, one can achieve

rates of convergence close to the rate 1/N for the one-dimensional case.

We have seen a number of methods designed to reduce the dimensionality

of the problem. Perhaps the most important of these is conditioning, which can

reduce an dā’dimensional integral to a one-dimensional one. In the multidimen-

sional case, variance reduction has an increased importance because of the high

variability induced by the dimensionality of crude methods. The other vari-

ance reduction techniques such as regression and stratiļ¬cation carry over to the

multivariable problem with little change, except for the increased complexity of

determining a reasonable stratiļ¬cation in such problems.

Errors in numerical Integration

We consider the problem of numerical integration in d dimensions. For d = 1

classical integration methods, like the trapezoidal rule, are weighted averages of

INTRODUCTION 305

the value of the function at equally spaced points;

Z m

X

1

n

f (u)du ā (6.4)

wn f ( ),

m

0 n=0

where w0 = wm = 1/(2m), and wn = 1/m for 1 Ā· n Ā· m ā’ 1. The trape-

zoidal rule is exact for any function that is linear (or piecewise linear between

grid-points) and so we can assess the error of integration by using a linear ap-

proximation through the points ( m , f ( m )) and ( j+1 , f ( j+1 )). Assume

j j

m m

j j+1

<x< .

m m

If the function has a continuous second derivative, we have by Taylorā™s Theorem

that the diļ¬erence between the function and its linear interpolant is of order

j2

O(x ā’ i.e.

m) ,

j j j+1 j j

) ā’ f ( )] + O(x ā’ )2 .

) + (x ā’ )m[f (

f (x) = f (

m m m m m

j j+1

Integrating both sides between and notice that

m,

m

Z

f ( j+1 ) + f ( m )

j

(j+1)/m

j j j+1 j m

{f ( ) + (x ā’ )m[f ( ) ā’ f ( )]}dx =

m m m m 2m

j/m

is the area of the trapezoid and the error in the approximation is

Z (j+1)/m

j2

) ) = O(mā’3 ).

(x ā’

O(

m

j/m

Adding these errors of approximation over the m trapezoids gives O(mā’2 ). Con-

sequently, the error in the trapezoidal rule approximation is O(mā’2 ), provided

that f has a continuous second derivative on [0, 1].

We now consider the multidimensional case, d ā„ 2. Suppose we evaluate

the function at all of the (m + 1)d points of the form ( n1 , . . . , ns ) and use this

m m

to approximate the integral. The classical numerical integration methods use a

Cartesian product of one-dimensional integration rules. For example, the d-fold

Cartesian product of the trapezoidal rule is

306 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Z m m

X X n1 ns

f (u)du ā Ā·Ā·Ā· wn1 Ā· Ā· Ā· wns f ( (6.5)

, . . . , ),

m m

[0,1]d n1 =0 ns =0

where [0, 1]d is the closed s-dimensional unit cube and the wn are as before.

The total number of nodes is N = (m + 1)s . From the previous error bound it

follows that the error is O(mā’2 ), provided that the second partial derivatives of

f are continuous on [0, 1]d . We know that the error cannot be smaller because

when the function depends on only one variable and is constant in the others,

the one-dimensional result is a special case. In terms of the number N of nodes

or function evaluations, since m = O(N 1/d ), the error is O(N ā’2/d ), which,

with increasing dimension d, changes dramatically. For example if we required

N = 100 nodes to achieve a required precision in the case d = 1, to achieve the

same precision for a d = 5 dimensional integral using this approach we would

need to evaluate the function at a total of 100d = 1010 = ten billion nodes. As

the dimension increases, the number of function evaluations or computation

required for a ļ¬xed precision increases exponentially. This phenomena is often

called the ācurse of dimensionalityā, exorcised in part at least by quasi or regular

Monte Carlo methods.

The ordinary Monte Carlo method based on simple random sampling is free

of the curse of dimensionality. By the central limit theorem, even a crude Monte

Carlo estimate for numerical integration yields a probabilistic error bound of the

form OP (N ā’1/2 ) in terms of the number N of nodes (or function evaluations)

and this holds under a very weak regularity condition on the function f . The

remarkable feature here is that this order of magnitude does not depend on the

dimension d. This is true even if the integration domain is complicated. Note

however that the deļ¬nition of ā Oā has changed from one that essentially con-

siders the worst case scenario to OP which measures the average or probabilistic

behaviour of the error.

Some of the oft-cited deļ¬ciencies of the Monte Carlo method limiting its

THEORY OF LOW DISCREPANCY SEQUENCES 307

usefulness are:

1. There are only probabilistic error bounds (there is no guarantee that the

expected accuracy is achieved in a particular case -an alternative approach

would optimize the āworst-caseā behaviour);

2. Regularity of the integrand is not exploited even when it is available. The

probabilistic error bound OP (N ā’1/2 ) holds under a very weak regularity

condition but no extra beneļ¬t is derived from any additional regularity or

smoothness of the integrand. For example the estimator is no more precise

if we know that the function f has several continuous derivatives. In cases

when we do not know whether the integrand is smooth or diļ¬erentiable,

it may be preferable to use Monte Carlo since it performs reasonably well

without this assumption.

3. Genuine Monte Carlo is not feasible anyway since generating truly in-

dependent random numbers is virtually impossible. In practice we use

pseudo-random numbers to approximate independence.

Theory of Low discrepancy sequences

The quasi-Monte Carlo method places attention on the objective, approximating

an integral, rather than attempting to imitate the behaviour of independent

uniform random variates. Quasi-random sequences of low discrepancy sequences

would fail all of the tests applied to a pseudo-random number generate except

those testing for uniformity of the marginal distribution because the sequence

is, by construction, autocorrelated. Our objective is to approximate an integral

using a average of the function at N points, and we may adjust the points so that

the approximation is more accurate. Ideally we would prefer these sequences to

be self-avoiding, so that as the sequence is generated, holes are ļ¬lled. As usual

308 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

we will approximate the integral with an average;

Z N

1X

f (u)du ā (6.6)

f (xn ).

N n=1

d

[0,1]

Quasi Monte-Carlo is able to achieve a deterministic error bound O((logN )d /N )

for suitably chosen sets of nodes and for integrands with a relatively low degree

of regularity, much better than the rate O(N ā’1/2 ) achieved by Monte Carlo

methods. Even smaller error bounds can be achieved for suļ¬ciently regular

integrands. There are several algorithms or quasi-Monte-Carlo sequences which

give rise to this level of accuracy.

Suppose, as with a crude Monte Carlo estimate, we approximate the integral

with (6.6) with x1 , . . . , xN ā [0, 1]d . The sequence x1 , . . . , xN ,... is determinis-

tic (as indeed are the pseudo-random sequences we used for Crude Monte-Carlo),

but they are now chosen so as to guarantee a small error. Points are chosen so

as to achieve the maximal degree of uniformity or a low degree of discrepancy

with a uniform distribution. A ļ¬rst requirement for a low discrepancy sequence

is that we obtain convergence of the sequence of averages so that:

Z

N

1X

lim f (xn ) = f (u)du,

Nā’ā N [0,1]d

n=1

and this should hold for a reasonably large class of integrands. This suggests

that the most desirable sequences of nodes x1 , . . . , xN are āevenly distributedā

over [0, 1]d . Various notions of discrepancy have been considered as quantitative

measures for the deviation from the uniform distribution but we will introduce

only one here, the so-called āstar-discrepancyā. The star discrepancy is perhaps

the more natural one in statistics, since it measures the maximum diļ¬erence be-

tween the empirical cumulative distribution function of the points {x1 , . . . , xN }

and the uniform distribution of measure on the unit cube. Suppose we construct

N

X

bN (x) = 1 I(xn Ā· x),

F

N n=1

THEORY OF LOW DISCREPANCY SEQUENCES 309

the empirical cumulative distribution function of the points x1 , . . . , xN , and

compare it with

F (x) =F (x1 , ...xd ) = min(1, x1 x2 ...xd ) if all xi ā„ 0

the theoretical uniform distribution on [0, 1]d . While any measure of the dif-

ference could be used, the star discrepancy is simply the Kolmogorov-Smirnov

distance between these two cumulative distribution functions

# of points in B

b

ā—

DN = sup | FN (x) ā’ F (x)| = sup | ā’ Ī»(B)|,

N

x B

where the supremum is taken over all rectangles B of the form [0, x1 ] Ć— [0, x2 ] Ć—

... Ć— [0, xd ] and where Ī»(B) denotes the Lebesgue measure of B in Rd .

It makes intuitive sense that we should choose points {x1 , . . . , xN } such

that the discrepancy is small for each N. This intuition is supported by a

large number of theoretical results, at least in the case of smooth integrands

with smooth partial derivatives. The smoothness is measured using V (f ), a

ātotal variationā in the sense of Hardy and Krause, intuitively the length of the

monotone segments of f. For a one dimensional function with a continuous ļ¬rst

derivative it is simply

Z 1

|f 0 (x)|dx.

V (f ) =

0

In higher dimensions, the Hardy Krause variation may be deļ¬ned in terms of

the integral of partial derivatives;

Deļ¬nition 48 Hardy and Krause Total Variation

If f is suļ¬ciently diļ¬erentiable then the variation of f on [0, 1]d in the sense

of Hardy and Krause is

s

X X

V (k) (f ; i1 , . . . , ik ), (6.7)

V (f ) =

k=1 1 i1 <Ā·Ā·Ā·<ik s

where

ĀÆ ĀÆ

Z Z

ĀÆ ĀÆ

1 1

ā‚sf

ĀÆ ĀÆ

(k)

Ā·Ā·Ā· dxi1 Ā· Ā· Ā· dxik . (6.8)

V (f ; i1 , . . . , ik ) = ĀÆ ā‚xi Ā· Ā· Ā· ā‚xi ĀÆ

0 0 k xj =1,j6=i1 ,...,ik

1

310 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

The precision in our approximation to an integral as an average of function

values is closely related to the discrepancy measure as the following result shows.

Indeed the mean of the function values diļ¬ers from the integral of the function

by an error which is bounded by the product of the discrepancy of the sequence

and the measure V (f ) of smoothness of the function.

Theorem 49 (Koksma-Hlawka inequality)

If f has bounded variation V (f ) on [0, 1]d in the sense of Hardy and Krause,

then, for any x1 , . . . , xN ā [0, 1]d , we have

Z

N

1X ā—

| f (xn ) ā’ f (u)du| Ā· V (f )DN . (6.9)

N n=1 s

I

We do not normally use this inequality as it stands since the evaluation of the

error bound on the right hand side requires determining V (f ), typically a very

diļ¬cult task. However this bound allows a separation between the regularity

properties of the integrand and the degree of uniformity of the sequence. We can

guarantee a reasonable approximation for any function f with bounded total

ā—

variation V (f ) by ensuring that the discrepancy of the sequence DN is small.

For this reason, the discrepancy is central to quasi-Monte Carlo integration.

Sequences with small star discrepancy are called low-discrepancy sequences. In

fact since a variety of sequences exist with discrepancy of order

(log N )d

N

as N ā’ ā, the term ālow-discrepancyā is often reserved for these.

Examples of low discrepancy sequences

Van der Corput Sequence.

In the one dimensional case the best rate of convergence is O(N ā’1 log N ), N ā„ 2.

It is achieved, for example, by the van der Corput sequence, obtained by

EXAMPLES OF LOW DISCREPANCY SEQUENCES 311

reversing the digits in the representation of some sequence of integers in a given

base. Consider one-dimensional case d = 1 and base b = 2. Take the base b

representation of the sequence of natural numbers;

1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100, 1101, ...

P

and then map these into the unit interval [0, 1] so that the integer t ak bk

k=0

Pt

is mapped into the point k=0 ak bā’kā’1 . These binary digits are mapped into

(0,1) in the following three steps;

1. Write n using its binary expansion. e.g. 13 = 1(8) + 1(4) + 0(2) + 1(1)

becomes 1101.

2. Reverse the order of the digits. e.g. 1101 becomes 1011.

3. Determine the number that this is the binary decimal expansion for. e.g.

1011 = 1( 1 ) + 0( 1 ) + 1( 1 ) + 1( 16 ) =

1 11

16 .

2 4 8

Thus 1 generates 1/2, 10 generates 0( 1 ) + 1( 1 ), 11 generates 1( 1 ) + 1( 1 )

2 4 2 4

and the sequence of positive integers generates the points. The intervals are

recursively split in half in the sequence 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, ... and

the points are fairly evenly spaced for any value for the number of nodes N ,

and perfectly spaced if N is of the form 2k ā’ 1. The star discrepancy of this

sequence is

log N

ā—

DN = O( )

N

which matches the best that is attained for inļ¬nite sequences.

The Halton Sequence

This is simply the multivariate extension of the Van der Corput sequence. In

higher dimensions, say in d dimensions, we choose d distinct primes, b1 , b2 , ...bd

(usually the smallest primes) and generate, from the same integer m , the d

components of the vector using the method described for the Van der Corput

312 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

sequence. For example, we consider the case d = 3 and use bases b1 = 2,

b2 = 3, b3 = 5 because these are the smallest three prime numbers. The ļ¬rst

few vectors , ( 1 , 1 , 1 ), ( 1 , 2 , 2 ), ( 3 , 1 , 3 ), ...are generated in the table below.

235 435 495

repres ļ¬rst repres. second repres third

m

base 2 component base 3 comp base 5 comp

1 1 1/2 1 1/3 1 1/5

2 10 1/4 2 2/3 2 2/5

3 11 3/4 10 1/9 3 3/5

4 100 1/8 11 4/9 4 4/5

5 101 5/8 12 7/9 10 1/25

6 110 3/8 20 2/9 11 6/25

7 111 7/8 21 5/9 12 11/25

9 1000 1/16 22 8/9 13 16/25

10 1001 9/16 100 1/27 14 21/25

Figure 6.1 provides a plot of the ļ¬rst 500 points in the above Halton sequence

of dimension 3.

There appears to be greater uniformity than a sequence of random points

would have. Some patterns are discernible on the two dimensional plot of the

ļ¬rst 100 points, for example see Figures 6.2 and 6.3.

These ļ¬gures can be compared with the plot of 100 pairs of independent

uniform random numbers in Figure 6.4, which seems to show more clustering

and more holes in the point cloud.

These points were generated with the following function for producing the

Halton sequence.

function x=halton(n,s)

%x has dimension n by s and is the ļ¬rst n terms of the halton sequence of

%dimension s.

p=primes(s*6); p=p(1:s); x=[];

EXAMPLES OF LOW DISCREPANCY SEQUENCES 313

1

0.8

0.6

0.4

0.2

0

1

0.8 1

0.6 0.8

0.6

0.4

0.4

0.2

0.2

0 0

Figure 6.1: 500 points from a Halton seqnece of dimension 3

for i=1:s

x=[x (corput(n,p(i)))ā™];

end

function x=corput(n,b)

% converts integers 1:n to from van der corput number with base b

m=ļ¬‚oor(log(n)/log(b));

n=1:n; A=[];

for i=0:m

a=rem(n,b); n=(n-a)/b;

A=[A ;a];

end

x=((1./bā™).^(1:(m+1)))*A;

The Halton sequence is a genuine low discrepancy sequence in the sense that

(log N )d

ā—

DN = O( )

N

314 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Halton sequence of dimension 3

1

0.9

0.8

0.7

0.6

second coordinate

0.5

0.4

0.3

0.2

0.1

0

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

first coordinate

Figure 6.2: The ļ¬rst and second coordinate of 100 points from the Halton

sequence of dimension 3

and the coverage of the unit cube is reasonably uniform for small dimensions.

Unfortunately the notation O() hides a constant multiple, one which, in this

case, depends on the dimension d. Roughly (Niedereiter, 1992), this constant

is asymptotic to dd which grows extremely fast in d. This is one indicator that

for large d, the uniformity of the points degrades rapidly, largely because the

relative sparseness of the primes means that the d0 th prime is very large for d

large. This results in larger holes or gaps in that component of the vector than

we would like. This is evident for example in Figure6.5 where we plot the last

two coordinates of the Halton sequence of dimension 15.

The performance of the Halton sequence is considerably enhanced by per-

muting the coeļ¬cients ak prior to mapping into the unit interval as is done by

the Faure sequence.

EXAMPLES OF LOW DISCREPANCY SEQUENCES 315

Figure 6.3: The second and third coordinate of 100 points from the Halton

sequence of dimension 3

Faure Sequence

The Faure sequence is similar to the Halton sequence in that each dimension is

a permutation of a van der Corput sequence; however, the same prime is used

as the base b for each of the components of the vector, and is usually chosen to

be the smallest prime greater than or equal to the dimension (Fox, 1996).

In the Van der Corput sequence we wrote the natural numbers in the form

Pt P

ak bk which was then mapped into the point t ak bā’kā’1 in the unit

k=0 k=0

interval. For the Faure sequence we use the same construction but we use

diļ¬erent permutations of the coeļ¬cients ak for each of the coordinates. In

particular in order to generate the iā™th coordinate we generate the point

t

X

ck bā’kā’1

k=0

316 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Figure 6.4: 100 independent U [0, 1] pairs

where

X ĀµmĀ¶

t

(i ā’ 1)mā’k am mod b

ck =

k

m=k

Notice that only the last t ā’ k + 1 values of ai are used to generate ck . For

example consider the case d = 2, b = 2. Then the ļ¬rst 10 Faure numbers are

0 1/2 1/4 3/4 1/8 5/8 3/8 7/8 1/16 9/16

0 1/2 3/4 1/4 5/8 1/8 3/8 7/8 15/16 7/16

The ļ¬rst row corresponds to the Van der Corput numbers and the second row

of obtained from the ļ¬rst by permuting the values with the same denominator.

The Faure sequence has better regularity properties than does the Halton

sequence above particularly in high dimensions. However the diļ¬erences are by

no means evident from a graph when the dimension is moderate. For example

we plot in Figure 6.6 the 14ā™th and 15ā™th coordinates of 1000 points from the

Faure sequence of dimension d = 15 for comparison with Figure 6.5.

EXAMPLES OF LOW DISCREPANCY SEQUENCES 317

Figure 6.5: The 14ā™th and 15ā™th coordinates of the ļ¬rst 1000 of a Halton sequence

d = 15

Other suggestions for permuting the digits in a Halton sequence include

using only every l0 th term in the sequence so as to destroy the cycle.

In practice, in order to determine the eļ¬ect of using one of these low dis-

crepancy sequences we need only substitute such a sequence for the vector of

independent uniform random numbers used by a simulation. For example if we

wished to simulate a process for 10 time periods, then value a call option and

average the results, we could replace the 10 independent uniform random num-

bers that we used to generate one path by an element of the Halton sequence

with d = 10.

Suppose we return brieļ¬‚y to the call option example treated in Chapter 3.

The true value of this call option was around 0.4615 according to the Black-

Scholes formula. If however we substitute the Van der Corput sequence for the

sequence of uniform random numbers,

318 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Figure 6.6: The last two coordinates of the ļ¬rst 1000 Faure points of dimension

d = 15.

mean(fn(corput(100000,2)))

we obtain an estimate of 0.4614 very close to the correct value. I cannot

compare these estimators using the notion of eļ¬ciency that we used there, how-

ever, because these low-discrepancy sequences are not random and do not even

attempt to emulate random numbers. Though unable to compare performance

with the variance of an estimator, we can look at the Mean squared error (see

for example Figure 6.8). which shows a faster rate of convergence for Quasi

Monte Carlo equivalent to variance reduction in excess of 100). Galanti & Jung

(1997), report that the Faure sequence suļ¬ers from the problem of start-up

EXAMPLES OF LOW DISCREPANCY SEQUENCES 319

and especially in high-dimensions and the Faure numbers can exhibit clustering

about zero. In order to reduce this problem, Faure suggests discarding the ļ¬rst

b4 ā’ 1 points.

Sobol Sequence

The Sobol sequence is generated using a set of so-called direction numbers

mi

= 1, 2, where the mi are odd positive integers less than 2i . The

vi = 2i , i

values of mi are chosen to satisfy a recurrence relation using the coeļ¬cients of

a primitive polynomial in the Galois Field of order 2. A primitive polynomial is

irreducible (i.e. cannot be factored into polynomials of smaller degree) and does

not divide the polynomial xr + 1 for r < 2p ā’ 1. For example the polynomial

x2 + x + 1 has no non-trival factors over the Galois Field of order 2 and it does

divide x3 + 1 but not xr + 1 for r < 3. Corresponding to a primitive polynomial

z p + c1 z pā’1 + ...cpā’1 z + cp

is the recursion

mi = 2c1 miā’1 + 22 c2 miā’2 + ... + 2p cp miā’p

where the addition is carried out using binary arithmetic. For the Sobol se-

quence, we then replace the binary digit ak by ak vk .

In the case d = 2, the ļ¬rst 10 Sobol numbers are, using irreducible polyno-

mials x + 1 and x3 + x + 1

0 1/2 1/4 3/4 3/8 7/8 1/8 5/8 5/16 13/16

0 1/2 1/4 3/4 1/8 5/8 3/8 7/8 11/16 3/16

Again we plot the last two coordinates for the ļ¬rst 1000 points from a Sobol

sequence of dimension d = 15 in Figure 6.7 for comparison with Figures 6.5 and

6.6.

320 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Figure 6.7: The last two coordinates of the ļ¬rst 1000 points from a Sobel se-

quence of dimension 15

Although there is a great deal of literature espousing the use of one quasi-

Monte Carlo sequence over another, most results from a particular application

ńņš. 10 |