Quasi- Monte Carlo

Multiple Integration

Introduction

In some sense, this chapter ļ¬ts within Chapter 4 on variance reduction; in

some sense it is stratiļ¬cation run wild. Quasi-Monte Carlo methods are purely

deterministic, numerical analytic methods in the sense that they do not even

attempt to emulate the behaviour of independent uniform random variables, but

rather cover the space in d dimensions with fewer gaps than independent random

variables would normally admit. Although these methods are particularly when

evaluating integrals in moderate dimensions, we return brieļ¬‚y to the problem

of evaluating a one-dimension integral of the form

Z 1

f (x)dx.

0

The simplest numerical approximation to this integral consists of choosing a

point xj in the interval [ N , j+1 ], j = 0, 1, ..., N ā’ 1, perhaps the midpoint of the

j

N

311

312 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

interval, and then evaluating the average

N ā’1

1X

(6.1)

f (xj ).

N j=0

If the function f has one continuous derivative, such a numerical method with

N equally or approximately equally spaced points will have bias that approaches

0 at the rate 1/N because, putting M = sup{|f 0 (z)|; 0 < z < 1},

Z (j+1)/N

1 1

f (x)dx ā’ f (xj ) Ā· 2 M (6.2)

N N

j/N

and so summing both sides over j gives

Z Nā’1

1X

1

1

| f (x)dx ā’ f (xj )| Ā· M.

N j=0 N

0

We will refer to the error in the numerical integral in this case

Z N ā’1

1X

1

ĪµN = | f (x)dx ā’ f (xj )|

N j=0

0

as O(N ā’1 ) which means that the sequence of errors ĪµN satisļ¬es

lim sup N ā’1 ĪµN < ā

N ā’ā

or intuitively that the errors are bounded by a constant times N ā’1 .

If the function f is known to have bounded derivatives of second or third

order, then integrals can be approximated to an even higher degree of precision.

For example various numerical quadrature formulae permit approximating an

R1

integral of the form 0 f (x)w(x)dx with a weighted average of N points

N

X

(6.3)

wj f (xj )

j=1

in such a way that if f (x) is a polynomial of degree 2N ā’ 1 or less, the approx-

imation is exact. Here the function w(x) is typically some density such as the

uniform, exponential or normal density and the optimal placement of the points

xj as well as the weights wj depends on w(x). Of course a smooth function

INTRODUCTION 313

can be closely approximated with a polynomial of high degree and so numerical

quadrature formulae of the form (6.3) permit approximating a one-dimension

integral arbitrarily closely provided that the function is suļ¬ciently smooth, i.e.

it has bounded derivatives of suļ¬ciently high order. We should note that in

this case, the weights wj and the points xj are both deterministic. By contrast,

the Monte Carlo integral

N

1X

b

ĪøMC = f (Ui )

N i=1

with N points places these points at random or pseudo-random locations, has

q

b

zero bias but the standard deviation of the estimator var(ĪøMC ) is a constant

ā

multiple of 1/ N . The Central Limit theorem assures us that

Z1

b

N 1/2 (ĪøMC ā’ f (x)dx)

0

converges to a normal distribution which means that the error is order (in prob-

ability) N ā’1/2 . Note that there is a change in our measure of the size of an

error, since only the variance or standard deviation of a given term in the se-

quence of errors is bounded, not the whole sequence of errors ĪµN . In particular

b

if a pseudo-random estimator Īø satisļ¬es

Z1

b f (x)dx)2 = O(N ā’2k )

E(Īø ā’

0

then we say that the error is OP (N ā’k ) where OP denotes āorder in probabilityā.

This is clearly a weaker notion than O(N ā’k ). Even the simplest numerical

integral (6.1) has a faster rate of convergence then that of the Monte Carlo

integral with or without use of the variance reduction techniques of Chapter 4.

This is a large part of the reason numerical integration is usually preferred to

Monte Carlo methods in one dimension, at least for smooth functions, but it

also indicates that for regular integrands, there is room for improvement over

Monte Carlo in higher dimensions as well.

The situation changes in 2 dimensions. Suppose we wish to distribute N

points over a uniform lattice in some region such as the unit square. One

314 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

possible placement is to points of the form

ā

i j

( ā , ā ), i, j = 1, 2, ... N

N N

ā

assuming for convenience of notation that N is integer. The distance between

ā

adjacent points is of order 1/ N and by an argument akin to (6.2), the bias in

ā

a numerical integral is order 1/ N . This is the now same order as the standard

deviation of a Monte Carlo integral, indicating that the latter is already, in two

dimensions, competitive. When the dimension s ā„ 3, a similar calculation shows

that the standard deviation of the Monte-Carlo method is strictly smaller order

than the error of a numerical integral with weights at lattice points. Essentially,

the placement of points on a lattice for evaluating a dā’dimensional integral is far

from optimal when d ā„ 2. Indeed various deterministic alternatives called quasi-

random samples provide substantially better estimators especially for smooth

functions of several variables. Quasi-random samples are analogous to equally

spaced points in one dimension and are discussed at length by Niederreiter

(1978), where it is shown that for suļ¬ciently smooth functions, one can achieve

rates of convergence close to the rate 1/N for the one-dimensional case.

We have seen a number of methods designed to reduce the dimensionality

of the problem. Perhaps the most important of these is conditioning, which can

reduce an dā’dimensional integral to a one-dimensional one. In the multidimen-

sional case, variance reduction has an increased importance because of the high

variability induced by the dimensionality of crude methods. The other vari-

ance reduction techniques such as regression and stratiļ¬cation carry over to the

multivariable problem with little change, except for the increased complexity of

determining a reasonable stratiļ¬cation in such problems.

Errors in numerical Integration

We consider the problem of numerical integration in d dimensions. For d = 1

classical integration methods, like the trapezoidal rule, are weighted averages of

INTRODUCTION 315

the value of the function at equally spaced points;

Z m

X

1

n

f (u)du ā (6.4)

wn f ( ),

m

0 n=0

where w0 = wm = 1/(2m), and wn = 1/m for 1 Ā· n Ā· m ā’ 1. The trape-

zoidal rule is exact for any function that is linear (or piecewise linear between

grid-points) and so we can assess the error of integration by using a linear ap-

proximation through the points ( m , f ( m )) and ( j+1 , f ( j+1 )). Assume

j j

m m

j j+1

<x< .

m m

If the function has a continuous second derivative, we have by Taylorā™s Theorem

that the diļ¬erence between the function and its linear interpolant is of order

j2

O(x ā’ i.e.

m) ,

j j j+1 j j

) ā’ f ( )] + O(x ā’ )2 .

) + (x ā’ )m[f (

f (x) = f (

m m m m m

j j+1

Integrating both sides between and notice that

m,

m

Z

f ( j+1 ) + f ( m )

j

(j+1)/m

j j j+1 j m

{f ( ) + (x ā’ )m[f ( ) ā’ f ( )]}dx =

m m m m 2m

j/m

is the area of the trapezoid and the error in the approximation is

Z (j+1)/m

j2

) ) = O(mā’3 ).

(x ā’

O(

m

j/m

Adding these errors of approximation over the m trapezoids gives O(mā’2 ). Con-

sequently, the error in the trapezoidal rule approximation is O(mā’2 ), provided

that f has a continuous second derivative on [0, 1].

We now consider the multidimensional case, d ā„ 2. Suppose we evaluate

the function at all of the (m + 1)d points of the form ( n1 , . . . , ns ) and use this

m m

to approximate the integral. The classical numerical integration methods use a

Cartesian product of one-dimensional integration rules. For example, the d-fold

Cartesian product of the trapezoidal rule is

316 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Z m m

X X n1 ns

f (u)du ā Ā·Ā·Ā· wn1 Ā· Ā· Ā· wns f ( (6.5)

, . . . , ),

m m

[0,1]d n1 =0 ns =0

where [0, 1]d is the closed s-dimensional unit cube and the wn are as before.

The total number of nodes is N = (m + 1)s . From the previous error bound it

follows that the error is O(mā’2 ), provided that the second partial derivatives of

f are continuous on [0, 1]d . We know that the error cannot be smaller because

when the function depends on only one variable and is constant in the others,

the one-dimensional result is a special case. In terms of the number N of nodes

or function evaluations, since m = O(N 1/d ), the error is O(N ā’2/d ), which,

with increasing dimension d, changes dramatically. For example if we required

N = 100 nodes to achieve a required precision in the case d = 1, to achieve the

same precision for a d = 5 dimensional integral using this approach we would

need to evaluate the function at a total of 100d = 1010 = ten billion nodes. As

the dimension increases, the number of function evaluations or computation

required for a ļ¬xed precision increases exponentially. This phenomena is often

called the ācurse of dimensionalityā, exorcised in part at least by quasi or regular

Monte Carlo methods.

The ordinary Monte Carlo method based on simple random sampling is free

of the curse of dimensionality. By the central limit theorem, even a crude Monte

Carlo estimate for numerical integration yields a probabilistic error bound of the

form OP (N ā’1/2 ) in terms of the number N of nodes (or function evaluations)

and this holds under a very weak regularity condition on the function f . The

remarkable feature here is that this order of magnitude does not depend on the

dimension d. This is true even if the integration domain is complicated. Note

however that the deļ¬nition of ā Oā has changed from one that essentially con-

siders the worst case scenario to OP which measures the average or probabilistic

behaviour of the error.

Some of the oft-cited deļ¬ciencies of the Monte Carlo method limiting its

THEORY OF LOW DISCREPANCY SEQUENCES 317

usefulness are:

1. There are only probabilistic error bounds (there is no guarantee that the

expected accuracy is achieved in a particular case -an alternative approach

would optimize the āworst-caseā behaviour);

2. Regularity of the integrand is not exploited even when it is available. The

probabilistic error bound OP (N ā’1/2 ) holds under a very weak regularity

condition but no extra beneļ¬t is derived from any additional regularity or

smoothness of the integrand. For example the estimator is no more precise

if we know that the function f has several continuous derivatives. In cases

when we do not know whether the integrand is smooth or diļ¬erentiable,

it may be preferable to use Monte Carlo since it performs reasonably well

without this assumption.

3. Genuine Monte Carlo is not feasible anyway since generating truly in-

dependent random numbers is virtually impossible. In practice we use

pseudo-random numbers to approximate independence.

Theory of Low discrepancy sequences

The quasi-Monte Carlo method places attention on the objective, approximating

an integral, rather than attempting to imitate the behaviour of independent

uniform random variates. Quasi-random sequences of low discrepancy sequences

would fail all of the tests applied to a pseudo-random number generate except

those testing for uniformity of the marginal distribution because the sequence

is, by construction, autocorrelated. Our objective is to approximate an integral

using a average of the function at N points, and we may adjust the points so that

the approximation is more accurate. Ideally we would prefer these sequences to

be self-avoiding, so that as the sequence is generated, holes are ļ¬lled. As usual

318 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

we will approximate the integral with an average;

Z N

1X

f (u)du ā (6.6)

f (xn ).

N n=1

d

[0,1]

Quasi Monte-Carlo is able to achieve a deterministic error bound O((logN )d /N )

for suitably chosen sets of nodes and for integrands with a relatively low degree

of regularity, much better than the rate O(N ā’1/2 ) achieved by Monte Carlo

methods. Even smaller error bounds can be achieved for suļ¬ciently regular

integrands. There are several algorithms or quasi-Monte-Carlo sequences which

give rise to this level of accuracy.

Suppose, as with a crude Monte Carlo estimate, we approximate the integral

with (6.6) with x1 , . . . , xN ā [0, 1]d . The sequence x1 , . . . , xN ,... is determinis-

tic (as indeed are the pseudo-random sequences we used for Crude Monte-Carlo),

but they are now chosen so as to guarantee a small error. Points are chosen so

as to achieve the maximal degree of uniformity or a low degree of discrepancy

with a uniform distribution. A ļ¬rst requirement for a low discrepancy sequence

is that we obtain convergence of the sequence of averages so that:

Z

N

1X

lim f (xn ) = f (u)du,

Nā’ā N [0,1]d

n=1

and this should hold for a reasonably large class of integrands. This suggests

that the most desirable sequences of nodes x1 , . . . , xN are āevenly distributedā

over [0, 1]d . Various notions of discrepancy have been considered as quantitative

measures for the deviation from the uniform distribution but we will introduce

only one here, the so-called āstar-discrepancyā. The star discrepancy is perhaps

the more natural one in statistics, since it measures the maximum diļ¬erence be-

tween the empirical cumulative distribution function of the points {x1 , . . . , xN }

and the uniform distribution of measure on the unit cube. Suppose we construct

N

X

bN (x) = 1 I(xn Ā· x),

F

N n=1

THEORY OF LOW DISCREPANCY SEQUENCES 319

the empirical cumulative distribution function of the points x1 , . . . , xN , and

compare it with

F (x) =F (x1 , ...xd ) = min(1, x1 x2 ...xd ) if all xi ā„ 0

the theoretical uniform distribution on [0, 1]d . While any measure of the dif-

ference could be used, the star discrepancy is simply the Kolmogorov-Smirnov

distance between these two cumulative distribution functions

# of points in B

b

ā—

DN = sup | FN (x) ā’ F (x)| = sup | ā’ Ī»(B)|,

N

x B

where the supremum is taken over all rectangles B of the form [0, x1 ] Ć— [0, x2 ] Ć—

... Ć— [0, xd ] and where Ī»(B) denotes the Lebesgue measure of B in Rd .

It makes intuitive sense that we should choose points {x1 , . . . , xN } such

that the discrepancy is small for each N. This intuition is supported by a

large number of theoretical results, at least in the case of smooth integrands

with smooth partial derivatives. The smoothness is measured using V (f ), a

ātotal variationā in the sense of Hardy and Krause, intuitively the length of the

monotone segments of f. For a one dimensional function with a continuous ļ¬rst

derivative it is simply

Z 1

|f 0 (x)|dx.

V (f ) =

0

In higher dimensions, the Hardy Krause variation may be deļ¬ned in terms of

the integral of partial derivatives;

Deļ¬nition 48 Hardy and Krause Total Variation

If f is suļ¬ciently diļ¬erentiable then the variation of f on [0, 1]d in the sense

of Hardy and Krause is

s

X X

V (k) (f ; i1 , . . . , ik ), (6.7)

V (f ) =

k=1 1 i1 <Ā·Ā·Ā·<ik s

where

ĀÆ ĀÆ

Z Z

ĀÆ ĀÆ

1 1

ā‚sf

ĀÆ ĀÆ

(k)

Ā·Ā·Ā· dxi1 Ā· Ā· Ā· dxik . (6.8)

V (f ; i1 , . . . , ik ) = ĀÆ ā‚xi Ā· Ā· Ā· ā‚xi ĀÆ

0 0 k xj =1,j6=i1 ,...,ik

1

320 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

The precision in our approximation to an integral as an average of function

values is closely related to the discrepancy measure as the following result shows.

Indeed the mean of the function values diļ¬ers from the integral of the function

by an error which is bounded by the product of the discrepancy of the sequence

and the measure V (f ) of smoothness of the function.

Theorem 49 (Koksma-Hlawka inequality)

If f has bounded variation V (f ) on [0, 1]d in the sense of Hardy and Krause,

then, for any x1 , . . . , xN ā [0, 1]d , we have

Z

N

1X ā—

| f (xn ) ā’ f (u)du| Ā· V (f )DN . (6.9)

N n=1 s

I

We do not normally use this inequality as it stands since the evaluation of the

error bound on the right hand side requires determining V (f ), typically a very

diļ¬cult task. However this bound allows a separation between the regularity

properties of the integrand and the degree of uniformity of the sequence. We can

guarantee a reasonable approximation for any function f with bounded total

ā—

variation V (f ) by ensuring that the discrepancy of the sequence DN is small.

For this reason, the discrepancy is central to quasi-Monte Carlo integration.

Sequences with small star discrepancy are called low-discrepancy sequences. In

fact since a variety of sequences exist with discrepancy of order

(log N )d

N

as N ā’ ā, the term ālow-discrepancyā is often reserved for these.

Examples of low discrepancy sequences

Van der Corput Sequence.

In the one dimensional case the best rate of convergence is O(N ā’1 log N ), N ā„ 2.

It is achieved, for example, by the van der Corput sequence, obtained by

EXAMPLES OF LOW DISCREPANCY SEQUENCES 321

reversing the digits in the representation of some sequence of integers in a given

base. Consider one-dimensional case d = 1 and base b = 2. Take the base b

representation of the sequence of natural numbers;

1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100, 1101, ...

P

and then map these into the unit interval [0, 1] so that the integer t ak bk

k=0

Pt

is mapped into the point k=0 ak bā’kā’1 . These binary digits are mapped into

(0,1) in the following three steps;

1. Write n using its binary expansion. e.g. 13 = 1(8) + 1(4) + 0(2) + 1(1)

becomes 1101.

2. Reverse the order of the digits. e.g. 1101 becomes 1011.

3. Determine the number that this is the binary decimal expansion for. e.g.

1011 = 1( 1 ) + 0( 1 ) + 1( 1 ) + 1( 16 ) =

1 11

16 .

2 4 8

Thus 1 generates 1/2, 10 generates 0( 1 ) + 1( 1 ), 11 generates 1( 1 ) + 1( 1 )

2 4 2 4

and the sequence of positive integers generates the points. The intervals are

recursively split in half in the sequence 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, ... and

the points are fairly evenly spaced for any value for the number of nodes N ,

and perfectly spaced if N is of the form 2k ā’ 1. The star discrepancy of this

sequence is

log N

ā—

DN = O( )

N

which matches the best that is attained for inļ¬nite sequences.

The Halton Sequence

This is simply the multivariate extension of the Van der Corput sequence. In

higher dimensions, say in d dimensions, we choose d distinct primes, b1 , b2 , ...bd

(usually the smallest primes) and generate, from the same integer m , the d

components of the vector using the method described for the Van der Corput

322 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

sequence. For example, we consider the case d = 3 and use bases b1 = 2,

b2 = 3, b3 = 5 because these are the smallest three prime numbers. The ļ¬rst

few vectors , ( 1 , 1 , 1 ), ( 1 , 2 , 2 ), ( 3 , 1 , 3 ), ...are generated in the table below.

235 435 495

repres ļ¬rst repres. second repres third

m

base 2 component base 3 comp base 5 comp

1 1 1/2 1 1/3 1 1/5

2 10 1/4 2 2/3 2 2/5

3 11 3/4 10 1/9 3 3/5

4 100 1/8 11 4/9 4 4/5

5 101 5/8 12 7/9 10 1/25

6 110 3/8 20 2/9 11 6/25

7 111 7/8 21 5/9 12 11/25

9 1000 1/16 22 8/9 13 16/25

10 1001 9/16 100 1/27 14 21/25

Figure 6.1 provides a plot of the ļ¬rst 500 points in the above Halton sequence

of dimension 3.

There appears to be greater uniformity than a sequence of random points

would have. Some patterns are discernible on the two dimensional plot of the

ļ¬rst 100 points, for example see Figures 6.2 and 6.3.

These ļ¬gures can be compared with the plot of 100 pairs of independent

uniform random numbers in Figure 6.4, which seems to show more clustering

and more holes in the point cloud.

These points were generated with the following function for producing the

Halton sequence.

function x=halton(n,s)

%x has dimension n by s and is the ļ¬rst n terms of the halton sequence of

%dimension s.

p=primes(s*6); p=p(1:s); x=[];

EXAMPLES OF LOW DISCREPANCY SEQUENCES 323

Figure 6.1: 500 Points from a Halton sequence of dimension 3

324 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Figure 6.2: The ļ¬rst and second coordinate of 100 points from the Halton

sequence of dimension 3

EXAMPLES OF LOW DISCREPANCY SEQUENCES 325

Figure 6.3: The second and third coordinate of 100 points from the Halton

sequence of dimension 3

326 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Figure 6.4: 100 independent U [0, 1] pairs

EXAMPLES OF LOW DISCREPANCY SEQUENCES 327

for i=1:s

x=[x (corput(n,p(i)))ā™];

end

function x=corput(n,b)

% converts integers 1:n to from van der corput number with base b

m=ļ¬‚oor(log(n)/log(b));

n=1:n; A=[];

for i=0:m

a=rem(n,b); n=(n-a)/b;

A=[A ;a];

end

x=((1./bā™).^(1:(m+1)))*A;

The Halton sequence is a genuine low discrepancy sequence in the sense that

(log N )d

ā—

DN = O( )

N

and the coverage of the unit cube is reasonably uniform for small dimensions.

Unfortunately the notation O() hides a constant multiple, one which, in this

case, depends on the dimension d. Roughly (Niedereiter, 1992), this constant

is asymptotic to dd which grows extremely fast in d. This is one indicator that

for large d, the uniformity of the points degrades rapidly, largely because the

relative sparseness of the primes means that the d0 th prime is very large for d

large. This results in larger holes or gaps in that component of the vector than

we would like. This is evident for example in Figure6.5 where we plot the last

two coordinates of the Halton sequence of dimension 15.

The performance of the Halton sequence is considerably enhanced by per-

muting the coeļ¬cients ak prior to mapping into the unit interval as is done by

the Faure sequence.

328 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Figure 6.5: The 14ā™th and 15ā™th coordinates of the ļ¬rst 1000 of a Halton sequence

d = 15

EXAMPLES OF LOW DISCREPANCY SEQUENCES 329

Faure Sequence

The Faure sequence is similar to the Halton sequence in that each dimension is

a permutation of a van der Corput sequence; however, the same prime is used

as the base b for each of the components of the vector, and is usually chosen to

be the smallest prime greater than or equal to the dimension (Fox, 1996).

In the Van der Corput sequence we wrote the natural numbers in the form

Pt Pt

ak bk which was then mapped into the point k=0 ak bā’kā’1 in the unit

k=0

interval. For the Faure sequence we use the same construction but we use

diļ¬erent permutations of the coeļ¬cients ak for each of the coordinates. In

particular in order to generate the iā™th coordinate we generate the point

t

X

ck bā’kā’1

k=0

where

X ĀµmĀ¶

t

(i ā’ 1)mā’k am mod b

ck =

k

m=k

Notice that only the last t ā’ k + 1 values of ai are used to generate ck . For

example consider the case d = 2, b = 2. Then the ļ¬rst 10 Faure numbers are

0 1/2 1/4 3/4 1/8 5/8 3/8 7/8 1/16 9/16

0 1/2 3/4 1/4 5/8 1/8 3/8 7/8 15/16 7/16

The ļ¬rst row corresponds to the Van der Corput numbers and the second row

of obtained from the ļ¬rst by permuting the values with the same denominator.

The Faure sequence has better regularity properties than does the Halton

sequence above particularly in high dimensions. However the diļ¬erences are by

no means evident from a graph when the dimension is moderate. For example

we plot in Figure 6.6 the 14ā™th and 15ā™th coordinates of 1000 points from the

Faure sequence of dimension d = 15 for comparison with Figure 6.5.

Other suggestions for permuting the digits in a Halton sequence include

using only every l0 th term in the sequence so as to destroy the cycle.

330 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Figure 6.6: The last two coordinates of the ļ¬rst 1000 Faure points of dimension

d = 15.

EXAMPLES OF LOW DISCREPANCY SEQUENCES 331

In practice, in order to determine the eļ¬ect of using one of these low dis-

crepancy sequences we need only substitute such a sequence for the vector of

independent uniform random numbers used by a simulation. For example if we

wished to simulate a process for 10 time periods, then value a call option and

average the results, we could replace the 10 independent uniform random num-

bers that we used to generate one path by an element of the Halton sequence

with d = 10.

Suppose we return brieļ¬‚y to the call option example treated in Chapter 3.

The true value of this call option was around 0.4615 according to the Black-

Scholes formula. If however we substitute the Van der Corput sequence for the

sequence of uniform random numbers,

mean(fn(corput(100000,2)))

we obtain an estimate of 0.4614 very close to the correct value. I cannot

compare these estimators using the notion of eļ¬ciency that we used there, how-

ever, because these low-discrepancy sequences are not random and do not even

attempt to emulate random numbers. Though unable to compare performance

with the variance of an estimator, we can look at the Mean squared error (see

for example Figure 6.8). which shows a faster rate of convergence for Quasi

Monte Carlo equivalent to variance reduction in excess of 100). Galanti & Jung

(1997), report that the Faure sequence suļ¬ers from the problem of start-up

and especially in high-dimensions and the Faure numbers can exhibit clustering

about zero. In order to reduce this problem, Faure suggests discarding the ļ¬rst

b4 ā’ 1 points.

Sobol Sequence

The Sobol sequence is generated using a set of so-called direction numbers

mi

= 1, 2, where the mi are odd positive integers less than 2i . The

vi = 2i , i

values of mi are chosen to satisfy a recurrence relation using the coeļ¬cients of

332 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

a primitive polynomial in the Galois Field of order 2. A primitive polynomial is

irreducible (i.e. cannot be factored into polynomials of smaller degree) and does

not divide the polynomial xr + 1 for r < 2p ā’ 1. For example the polynomial

x2 + x + 1 has no non-trival factors over the Galois Field of order 2 and it does

divide x3 + 1 but not xr + 1 for r < 3. Corresponding to a primitive polynomial

z p + c1 z pā’1 + ...cpā’1 z + cp

is the recursion

mi = 2c1 miā’1 + 22 c2 miā’2 + ... + 2p cp miā’p

where the addition is carried out using binary arithmetic. For the Sobol se-

quence, we then replace the binary digit ak by ak vk .

In the case d = 2, the ļ¬rst 10 Sobol numbers are, using irreducible polyno-

mials x + 1 and x3 + x + 1

0 1/2 1/4 3/4 3/8 7/8 1/8 5/8 5/16 13/16

0 1/2 1/4 3/4 1/8 5/8 3/8 7/8 11/16 3/16

Again we plot the last two coordinates for the ļ¬rst 1000 points from a Sobol

sequence of dimension d = 15 in Figure 6.7 for comparison with Figures 6.5 and

6.6.

Although there is a great deal of literature espousing the use of one quasi-

Monte Carlo sequence over another, most results from a particular application

and there is not strong evidence at least that when the dimension of the problem

is moderate (for example d Ā· 15) it makes a great deal of diļ¬erence whether we

use Halton, Faure or Sobol sequences. There is evidence that the starting values

for the Sobol sequences have an eļ¬ect on the speed of convergence, and that

Sobol sequences can be generated more quickly than Faure Moreover neither

the Faure nor Sobol sequence provides a āblack-boxā method because both are

EXAMPLES OF LOW DISCREPANCY SEQUENCES 333

Figure 6.7: The last two coordinates of the ļ¬rst 1000 points from a Sobel se-

quence of dimension 15

334 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

sensitive to intitialization. I will not attempt to adjudicate the considerable

literature on this topic here, but provide only a fragment of evidence that, at

least in the kind of example discussed in the variance reduction chapter, there

is little to choose between the various methods. Of course this integral, the

discounted payoļ¬ from a call option as a function of the uniform input, is a

one-dimensional integral so the Faure, Halton and Van der Corput sequences

are all the same thing in this case. In Figure 6.8 we plot the (expected) squared

error as a function of sample size for n = 1, ..., 100000 for crude Monte Carlo

( the dashed line) and the Van der Corput sequence. The latter, although it

oscillates somewhat, is substantially better at all sample sizes, and its mean

squared error is equivalent to a variance reduction of around 1000 by the time

we reach n = 100, 000. The diļ¬erent slope indicates an error approaching zero

at rate close to nā’1 rather than the rate nā’1/2 for the Crude Monte Carlo

estimator. The Sobol sequence, although highly more variable as a function

of sample size, appears to show even more rapid convergence along certain

subsequences.

The Sobol and Faure sequences are particular cases of (t, s) ā’ nets. In order

to deļ¬ne then we need the concept of an elementary interval.

Elementary Intervals and Nets

Deļ¬nition: elementary interval

An elementary interval in base b is n interval E in I s of the form

Ā¶

s

Y aj (aj + 1)

(6.10)

E= , ,

bdj bdj

j=1

with dj ā„ 0, 0 Ā· aj Ā· bdj and aj , dj are integers.

ELEMENTARY INTERVALS AND NETS 335

Figure 6.8: (Expected) squared error vs. sample size in the estimation of an

Call option price for Crude MC and Van der Corput sequence.

336 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

Deļ¬nition: (t, m, s) - net

Let 0 Ā· t Ā· m be integers. A (t.m.s) - net in base b is a ļ¬nite sequence with

bm points from I s such that every elementary interval in base b of volume btā’m

contains exactly bt points of the sequence.

Deļ¬nition: (t, s) - sequence

An inļ¬nite sequence of points {xi } ā I s is a (t,s)-sequence in base b if for all

k ā„ 0 and m > t, the ļ¬nite sequence xkbm , . . . , x(k+1)bmā’1 forms a (t,m,s) - net

in base b.

It is known that for a (t, s)-sequence in base b the low discrepancy is ensured:

(log N )s (log N )sā’1

ā—

DN Ā· C (6.11)

+ O( ).

N N

Special constructions of such sequences for s ā„ 2 have the smallest discrep-

ancy that is currently known (H. Niederreiter, 1992, Random Number Genera-

tion and Quasi-Monte Carlo Methods). j

The thesis of K.S. Tan (1998) provides a thorough investigation into various

improvements in Quasi-Monte Carlo sampling, as well as the evidence of the high

eļ¬ciency of these methods when valuing Rainbow Options in high dimensions.

Papageorgiou and Traub (1996) tested what Tezuka called generalized Faure

points. They concluded that these points were superior to Sobol points for the

model problem. Particularly important for ļ¬nancial computation, a reasonably

small error could be achieved with few evaluations. For example, just 170

generalized Faure points were suļ¬cient to achieve an error of less than one part

in a hundred for a 360 dimensional problem. See also Traub and Wozniakowski

(1994) and Paskov and Traub (1995).

In summary, Quasi-Monte Carlo frequently generates estimates superior to

Monte-Carlo methods in many problems of low or intermediate eļ¬ective dimen-

sion. If the dimension d is large, but a small number of variables determine

PROBLEMS 337

most of the variability in the simulation, then we might expect Quasi Monte-

Carlo methods to continue to perform well. The price we pay for the smaller

error often associated with quasi Monte-Carlo methods and other numerical

techniques, one that often a consequence of any departure from a crude sim-

ulation of the process, is a feel for the distribution of various functionals of

the process of interest, as opposed to generating a single precise estimate. The

theory supporting low-discrepancy sequences, both the measures of discrepancy

themselves and the variation measure V (f ) are both tied, somewhat artiļ¬cially,

to the axes. For example if f (x) represents the indicator function of a square

with sides parallel to the axes in dimension d = 2, then V (f ) = 0. However, if

we rotate this rectangle by 45 degrees, the variation becomes inļ¬nite indicat-

ing that functions with steep isoclines at a 45 degree angle to the exes may be

particularly diļ¬cult to integrate using Quasi Monte Carlo.

Problems

1. Use 3-dimensional Halton sequences to integrate the function

Z 1Z 1Z 1

f (x, y, z)dxdydz

0 0 0

where f (x, y, z) = 1 if x < y < z and otherwise f (x, y, z) = 0. Compare

your answer with the true value of the integral and with crude Monte

Carlo integral of the same function.

2. Use your program from question 1 to generate 50 points uniformly distrib-

uted in the unit cube. Evaluate the Chi-squared statistic Ļ2 for a test

obs

that these points are independent uniform on the cube where we divide the

cube into 8 subcubes, eacc having sides of length 1/2. Carry out the test

by ļ¬nding P [Ļ2 > Ļ2 ] where Ļ2 is a random chi-squared variate with

obs

the appropriate number of degrees of freedom. This quantity P [Ļ2 > Ļ2 ]

obs

is usually referrred to as the āsigniļ¬cance probabilityā or āp-valueā for

338 CHAPTER 6. QUASI- MONTE CARLO MULTIPLE INTEGRATION

the test. If we suspected too much uniformity to be consistent with as-

sumption of independent uniform, we might use the other tail of the test,

i.e. evaluate P [Ļ2 < Ļ2 ]. Do so and comment on your results.

obs