Chapter 6


Quasi-Monte Carlo Multiple Integration

Introduction

In some sense, this chapter fits within Chapter 4 on variance reduction; in
some sense it is stratification run wild. Quasi-Monte Carlo methods are purely
deterministic, numerical analytic methods in the sense that they do not even
attempt to emulate the behaviour of independent uniform random variables, but
rather cover the space in d dimensions with fewer gaps than independent random
variables would normally admit. Although these methods are particularly useful
when evaluating integrals in moderate dimensions, we return briefly to the problem
of evaluating a one-dimensional integral of the form

$$\int_0^1 f(x)\,dx.$$


The simplest numerical approximation to this integral consists of choosing a
point $x_j$ in the interval $[\frac{j}{N}, \frac{j+1}{N}]$, $j = 0, 1, \ldots, N-1$, perhaps the midpoint of the


interval, and then evaluating the average

$$\frac{1}{N}\sum_{j=0}^{N-1} f(x_j). \qquad (6.1)$$

If the function f has one continuous derivative, such a numerical method with
N equally or approximately equally spaced points will have bias that approaches
0 at the rate 1/N because, putting $M = \sup\{|f'(z)|;\ 0 < z < 1\}$,

$$\left|\int_{j/N}^{(j+1)/N} f(x)\,dx - \frac{1}{N}f(x_j)\right| \le \frac{1}{N^2}M \qquad (6.2)$$

and so summing both sides over j gives

$$\left|\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{j=0}^{N-1} f(x_j)\right| \le \frac{M}{N}.$$


We will refer to the error in the numerical integral in this case,

$$\varepsilon_N = \left|\int_0^1 f(x)\,dx - \frac{1}{N}\sum_{j=0}^{N-1} f(x_j)\right|,$$

as $O(N^{-1})$, which means that the sequence of errors $\varepsilon_N$ satisfies

$$\limsup_{N\to\infty} N\varepsilon_N < \infty,$$

or intuitively that the errors are bounded by a constant times $N^{-1}$.
If the function f is known to have bounded derivatives of second or third
order, then integrals can be approximated to an even higher degree of precision.
For example, various numerical quadrature formulae permit approximating an
integral of the form $\int_0^1 f(x)w(x)\,dx$ with a weighted average of N points

$$\sum_{j=1}^{N} w_j f(x_j) \qquad (6.3)$$

in such a way that if f(x) is a polynomial of degree $2N-1$ or less, the approximation
is exact. Here the function w(x) is typically some density such as the
uniform, exponential or normal density, and the optimal placement of the points
$x_j$ as well as the weights $w_j$ depends on w(x).
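As an illustration of (6.3) with w(x) the uniform density on (0,1), the nodes and weights can be computed by the Golub-Welsch eigenvalue method; the following MATLAB sketch is our own illustration rather than a construction from the text.

% Gauss-Legendre nodes and weights on (0,1) via the Golub-Welsch method;
% the resulting rule (6.3) is exact for polynomials of degree up to 2N-1
N = 5;
k = 1:N-1;
beta = k ./ sqrt(4*k.^2 - 1);             % off-diagonal of the Jacobi matrix
[V, D] = eig(diag(beta,1) + diag(beta,-1));
x = (diag(D)' + 1)/2;                     % nodes, mapped from (-1,1) to (0,1)
w = V(1,:).^2;                            % weights; they sum to one on (0,1)
sum(w .* x.^9)                            % returns 0.1000, the exact integral of x^9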

Of course a smooth function can be closely approximated with a polynomial of
high degree, and so numerical quadrature formulae of the form (6.3) permit
approximating a one-dimensional integral arbitrarily closely, provided that the
function is sufficiently smooth, i.e. it has bounded derivatives of sufficiently
high order. We should note that in this case the weights $w_j$ and the points
$x_j$ are both deterministic. By contrast, the Monte Carlo integral

$$\hat\theta_{MC} = \frac{1}{N}\sum_{i=1}^{N} f(U_i)$$

with N points places these points at random or pseudo-random locations; it has
zero bias, but the standard deviation of the estimator, $\sqrt{\mathrm{var}(\hat\theta_{MC})}$,
is a constant multiple of $1/\sqrt{N}$. The Central Limit Theorem assures us that

$$N^{1/2}\left(\hat\theta_{MC} - \int_0^1 f(x)\,dx\right)$$

converges to a normal distribution, which means that the error is of order (in
probability) $N^{-1/2}$. Note that there is a change in our measure of the size of
an error, since only the variance or standard deviation of a given term in the
sequence of errors is bounded, not the whole sequence of errors $\varepsilon_N$. In
particular, if a pseudo-random estimator $\hat\theta$ satisfies

$$E\left(\hat\theta - \int_0^1 f(x)\,dx\right)^2 = O(N^{-2k})$$

then we say that the error is $O_P(N^{-k})$, where $O_P$ denotes "order in probability".
This is clearly a weaker notion than $O(N^{-k})$. Even the simplest numerical
integral (6.1) has a faster rate of convergence than that of the Monte Carlo
integral, with or without use of the variance reduction techniques of Chapter 4.
This is a large part of the reason numerical integration is usually preferred to
Monte Carlo methods in one dimension, at least for smooth functions, but it
also indicates that for regular integrands there is room for improvement over
Monte Carlo in higher dimensions as well.
The situation changes in 2 dimensions. Suppose we wish to distribute N
points over a uniform lattice in some region such as the unit square. One
possible placement is at points of the form

$$\left(\frac{i}{\sqrt N}, \frac{j}{\sqrt N}\right), \quad i, j = 1, 2, \ldots, \sqrt N,$$

assuming for convenience of notation that $\sqrt N$ is an integer. The distance between
adjacent points is of order $1/\sqrt N$ and, by an argument akin to (6.2), the bias in
a numerical integral is of order $1/\sqrt N$. This is now the same order as the standard
deviation of a Monte Carlo integral, indicating that the latter is already, in two
dimensions, competitive. When the dimension $d \ge 3$, a similar calculation shows
that the standard deviation of the Monte Carlo method is of strictly smaller order
than the error of a numerical integral with weights at lattice points. Essentially,
the placement of points on a lattice for evaluating a d-dimensional integral is far
from optimal when $d \ge 2$. Indeed various deterministic alternatives called quasi-random
samples provide substantially better estimators, especially for smooth
functions of several variables. Quasi-random samples are analogous to equally
spaced points in one dimension and are discussed at length by Niederreiter
(1978), where it is shown that for sufficiently smooth functions one can achieve
rates of convergence close to the rate 1/N of the one-dimensional case.
We have seen a number of methods designed to reduce the dimensionality
of the problem. Perhaps the most important of these is conditioning, which can
reduce a d-dimensional integral to a one-dimensional one. In the multidimensional
case, variance reduction has an increased importance because of the high
variability induced by the dimensionality of crude methods. The other variance
reduction techniques such as regression and stratification carry over to the
multivariable problem with little change, except for the increased complexity of
determining a reasonable stratification in such problems.


Errors in Numerical Integration

We consider the problem of numerical integration in d dimensions. For d = 1,
classical integration methods, like the trapezoidal rule, are weighted averages of
the value of the function at equally spaced points:

$$\int_0^1 f(u)\,du \approx \sum_{n=0}^{m} w_n f\!\left(\frac{n}{m}\right), \qquad (6.4)$$

where $w_0 = w_m = 1/(2m)$ and $w_n = 1/m$ for $1 \le n \le m-1$. The trapezoidal
rule is exact for any function that is linear (or piecewise linear between
grid-points) and so we can assess the error of integration by using a linear
approximation through the points $(\frac{j}{m}, f(\frac{j}{m}))$ and $(\frac{j+1}{m}, f(\frac{j+1}{m}))$. Assume

$$\frac{j}{m} < x < \frac{j+1}{m}.$$

If the function has a continuous second derivative, we have by Taylor's Theorem
that the difference between the function and its linear interpolant is of order
$O((x - \frac{j}{m})^2)$, i.e.

$$f(x) = f\!\left(\frac{j}{m}\right) + \left(x - \frac{j}{m}\right) m\left[f\!\left(\frac{j+1}{m}\right) - f\!\left(\frac{j}{m}\right)\right] + O\!\left(\left(x - \frac{j}{m}\right)^2\right).$$

Integrating both sides between $\frac{j}{m}$ and $\frac{j+1}{m}$, notice that

$$\int_{j/m}^{(j+1)/m} \left\{f\!\left(\frac{j}{m}\right) + \left(x - \frac{j}{m}\right) m\left[f\!\left(\frac{j+1}{m}\right) - f\!\left(\frac{j}{m}\right)\right]\right\} dx = \frac{f(\frac{j}{m}) + f(\frac{j+1}{m})}{2m}$$

is the area of the trapezoid, and the error in the approximation is

$$\int_{j/m}^{(j+1)/m} O\!\left(\left(x - \frac{j}{m}\right)^2\right) dx = O(m^{-3}).$$

Adding these errors of approximation over the m trapezoids gives O(m^{-2}). Consequently,
the error in the trapezoidal rule approximation is $O(m^{-2})$, provided
that f has a continuous second derivative on [0, 1].
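This rate is easy to verify numerically; the following sketch applies (6.4) to the test integrand $f(x) = e^x$ (our own choice of example), whose integral over [0, 1] is $e - 1$.

% trapezoidal rule (6.4): the error should fall by a factor of about 100
% for each tenfold increase in m
f = @(x) exp(x);
for m = [10 100 1000]
   u = (0:m)/m;                               % equally spaced grid points
   w = [1/(2*m), ones(1,m-1)/m, 1/(2*m)];     % trapezoidal weights
   err = abs(w*f(u)' - (exp(1)-1));
   fprintf('m = %5d   error = %.3e\n', m, err);
end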
We now consider the multidimensional case, $d \ge 2$. Suppose we evaluate
the function at all of the $(m+1)^d$ points of the form $(\frac{n_1}{m}, \ldots, \frac{n_d}{m})$ and use this
to approximate the integral. The classical numerical integration methods use a
Cartesian product of one-dimensional integration rules. For example, the d-fold
Cartesian product of the trapezoidal rule is

$$\int_{[0,1]^d} f(u)\,du \approx \sum_{n_1=0}^{m} \cdots \sum_{n_d=0}^{m} w_{n_1} \cdots w_{n_d}\, f\!\left(\frac{n_1}{m}, \ldots, \frac{n_d}{m}\right), \qquad (6.5)$$

where $[0,1]^d$ is the closed d-dimensional unit cube and the $w_n$ are as before.
The total number of nodes is $N = (m+1)^d$. From the previous error bound it
follows that the error is $O(m^{-2})$, provided that the second partial derivatives of
f are continuous on $[0,1]^d$. We know that the error cannot be smaller because
when the function depends on only one variable and is constant in the others,
the one-dimensional result is a special case. In terms of the number N of nodes
or function evaluations, since $m = O(N^{1/d})$, the error is $O(N^{-2/d})$, which
deteriorates dramatically with increasing dimension d. For example, if we required
N = 100 nodes to achieve a required precision in the case d = 1, then to achieve the
same precision for a d = 5 dimensional integral using this approach we would
need to evaluate the function at a total of $100^5 = 10^{10}$, or ten billion, nodes. As
the dimension increases, the number of function evaluations or computation
required for a fixed precision increases exponentially. This phenomenon is often
called the "curse of dimensionality", exorcised in part at least by quasi- and regular
Monte Carlo methods.
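For concreteness, here is a sketch of the d = 2 case of (6.5), again on a smooth test function of our own choosing with a known integral.

% 2-fold product trapezoidal rule (6.5) for f(x,y) = exp(x+y) on the unit
% square; the exact integral is (e-1)^2 and the error is O(m^-2)
m = 50;
u = (0:m)/m;
w = [1/(2*m), ones(1,m-1)/m, 1/(2*m)];
[X, Y] = meshgrid(u, u);
F = exp(X + Y);                  % function values at the (m+1)^2 nodes
approx = w*F*w';                 % apply the weights along both axes
err = abs(approx - (exp(1)-1)^2)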

The ordinary Monte Carlo method based on simple random sampling is free
of the curse of dimensionality. By the central limit theorem, even a crude Monte
Carlo estimate for numerical integration yields a probabilistic error bound of the
form $O_P(N^{-1/2})$ in terms of the number N of nodes (or function evaluations),
and this holds under a very weak regularity condition on the function f. The
remarkable feature here is that this order of magnitude does not depend on the
dimension d. This is true even if the integration domain is complicated. Note
however that the definition of "O" has changed from one that essentially considers
the worst-case scenario to $O_P$, which measures the average or probabilistic
behaviour of the error.

Some of the oft-cited deficiencies of the Monte Carlo method limiting its
usefulness are:

1. There are only probabilistic error bounds (there is no guarantee that the
expected accuracy is achieved in a particular case; an alternative approach
would optimize the "worst-case" behaviour);

2. Regularity of the integrand is not exploited even when it is available. The
probabilistic error bound $O_P(N^{-1/2})$ holds under a very weak regularity
condition, but no extra benefit is derived from any additional regularity or
smoothness of the integrand. For example, the estimator is no more precise
if we know that the function f has several continuous derivatives. In cases
when we do not know whether the integrand is smooth or differentiable,
it may be preferable to use Monte Carlo since it performs reasonably well
without this assumption.

3. Genuine Monte Carlo is not feasible anyway, since generating truly independent
random numbers is virtually impossible. In practice we use
pseudo-random numbers to approximate independence.



Theory of Low Discrepancy Sequences

The quasi-Monte Carlo method places attention on the objective, approximating
an integral, rather than attempting to imitate the behaviour of independent
uniform random variates. Quasi-random or low-discrepancy sequences
would fail all of the tests applied to a pseudo-random number generator except
those testing for uniformity of the marginal distribution, because the sequence
is, by construction, autocorrelated. Our objective is to approximate an integral
using an average of the function at N points, and we may adjust the points so that
the approximation is more accurate. Ideally we would prefer these sequences to
be self-avoiding, so that as the sequence is generated, holes are filled. As usual

we will approximate the integral with an average:

$$\int_{[0,1]^d} f(u)\,du \approx \frac{1}{N}\sum_{n=1}^{N} f(x_n). \qquad (6.6)$$

Quasi-Monte Carlo is able to achieve a deterministic error bound $O((\log N)^d/N)$
for suitably chosen sets of nodes and for integrands with a relatively low degree
of regularity, much better than the rate $O_P(N^{-1/2})$ achieved by Monte Carlo
methods. Even smaller error bounds can be achieved for sufficiently regular
integrands. There are several algorithms or quasi-Monte Carlo sequences which
give rise to this level of accuracy.

Suppose, as with a crude Monte Carlo estimate, we approximate the integral
with (6.6), where $x_1, \ldots, x_N \in [0,1]^d$. The sequence $x_1, \ldots, x_N, \ldots$ is deterministic
(as indeed are the pseudo-random sequences we used for crude Monte Carlo),
but the points are now chosen so as to guarantee a small error: they are placed so
as to achieve the maximal degree of uniformity, or a low degree of discrepancy
with a uniform distribution. A first requirement for a low discrepancy sequence
is that we obtain convergence of the sequence of averages, so that

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} f(x_n) = \int_{[0,1]^d} f(u)\,du,$$

and this should hold for a reasonably large class of integrands. This suggests
that the most desirable sequences of nodes $x_1, \ldots, x_N$ are "evenly distributed"
over $[0,1]^d$. Various notions of discrepancy have been considered as quantitative
measures of the deviation from the uniform distribution, but we will introduce
only one here, the so-called "star discrepancy". The star discrepancy is perhaps
the more natural one in statistics, since it measures the maximum difference between
the empirical cumulative distribution function of the points $\{x_1, \ldots, x_N\}$
and the uniform distribution on the unit cube. Suppose we construct

$$\hat F_N(x) = \frac{1}{N}\sum_{n=1}^{N} I(x_n \le x),$$

the empirical cumulative distribution function of the points $x_1, \ldots, x_N$, and
compare it with

$$F(x) = F(x_1, \ldots, x_d) = \min(1, x_1 x_2 \cdots x_d) \quad \text{if all } x_i \ge 0,$$

the theoretical uniform distribution on $[0,1]^d$. While any measure of the difference
could be used, the star discrepancy is simply the Kolmogorov-Smirnov
distance between these two cumulative distribution functions,

$$D_N^* = \sup_x |\hat F_N(x) - F(x)| = \sup_B \left|\frac{\#\text{ of points in } B}{N} - \lambda(B)\right|,$$

where the supremum is taken over all rectangles B of the form $[0, x_1] \times [0, x_2] \times
\cdots \times [0, x_d]$ and where $\lambda(B)$ denotes the Lebesgue measure of B in $\mathbb{R}^d$.
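In one dimension the supremum can be computed exactly from the sorted points; the following sketch (the function name is ours) uses the standard closed-form expression for $D_N^*$.

function D = stardisc1(x)
% star discrepancy of a one-dimensional point set x in [0,1], computed as
% D*_N = max over i of max( i/N - x_(i), x_(i) - (i-1)/N )
x = sort(x(:))';                  % order statistics x_(1) <= ... <= x_(N)
N = length(x);
i = 1:N;
D = max(max(i/N - x, x - (i-1)/N));

In higher dimensions no such simple formula is available, and computing $D_N^*$ exactly is expensive; this is one reason other measures of uniformity are also used in practice.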
It makes intuitive sense that we should choose points $\{x_1, \ldots, x_N\}$ such
that the discrepancy is small for each N. This intuition is supported by a
large number of theoretical results, at least in the case of integrands
with smooth partial derivatives. The smoothness is measured using V(f), a
"total variation" in the sense of Hardy and Krause, intuitively the sum of the
rises and falls of f over its monotone segments. For a one-dimensional function
with a continuous first derivative it is simply

$$V(f) = \int_0^1 |f'(x)|\,dx.$$

In higher dimensions, the Hardy-Krause variation may be defined in terms of
the integrals of partial derivatives.

Definition 48 (Hardy and Krause Total Variation)
If f is sufficiently differentiable then the variation of f on $[0,1]^d$ in the sense
of Hardy and Krause is

$$V(f) = \sum_{k=1}^{d} \sum_{1 \le i_1 < \cdots < i_k \le d} V^{(k)}(f; i_1, \ldots, i_k), \qquad (6.7)$$

where

$$V^{(k)}(f; i_1, \ldots, i_k) = \int_0^1 \cdots \int_0^1 \left|\frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}\right|_{x_j = 1,\ j \ne i_1, \ldots, i_k} dx_{i_1} \cdots dx_{i_k}. \qquad (6.8)$$

The precision of our approximation to an integral by an average of function
values is closely related to the discrepancy measure, as the following result shows.
Indeed the mean of the function values differs from the integral of the function
by an error which is bounded by the product of the discrepancy of the sequence
and the measure V(f) of smoothness of the function.

Theorem 49 (Koksma-Hlawka inequality)
If f has bounded variation V(f) on $[0,1]^d$ in the sense of Hardy and Krause,
then, for any $x_1, \ldots, x_N \in [0,1]^d$, we have

$$\left|\frac{1}{N}\sum_{n=1}^{N} f(x_n) - \int_{[0,1]^d} f(u)\,du\right| \le V(f)\,D_N^*. \qquad (6.9)$$


We do not normally use this inequality as it stands, since the evaluation of the
error bound on the right-hand side requires determining V(f), typically a very
difficult task. However, this bound allows a separation between the regularity
properties of the integrand and the degree of uniformity of the sequence. We can
guarantee a reasonable approximation for any function f with bounded total
variation V(f) by ensuring that the discrepancy $D_N^*$ of the sequence is small.
For this reason, the discrepancy is central to quasi-Monte Carlo integration.
Sequences with small star discrepancy are called low-discrepancy sequences. In
fact, since a variety of sequences exist with discrepancy of order

$$\frac{(\log N)^d}{N}$$

as $N \to \infty$, the term "low-discrepancy" is often reserved for these.



Examples of Low Discrepancy Sequences

Van der Corput Sequence.

In the one-dimensional case the best rate of convergence is $O(N^{-1}\log N)$, $N \ge 2$.
It is achieved, for example, by the van der Corput sequence, obtained by
reversing the digits in the representation of some sequence of integers in a given
base. Consider the one-dimensional case d = 1 and base b = 2. Take the base b
representation of the sequence of natural numbers:

1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100, 1101, ...

and then map these into the unit interval [0, 1] so that the integer $\sum_{k=0}^{t} a_k b^k$
is mapped into the point $\sum_{k=0}^{t} a_k b^{-k-1}$. These binary digits are mapped into
(0,1) in the following three steps:

1. Write n using its binary expansion; e.g. 13 = 1(8) + 1(4) + 0(2) + 1(1)
becomes 1101.

2. Reverse the order of the digits; e.g. 1101 becomes 1011.

3. Determine the number for which this is the binary decimal expansion; e.g.
$1011 = 1(\frac{1}{2}) + 0(\frac{1}{4}) + 1(\frac{1}{8}) + 1(\frac{1}{16}) = \frac{11}{16}$.

Thus 1 generates 1/2, 10 generates $0(\frac{1}{2}) + 1(\frac{1}{4}) = 1/4$, 11 generates $1(\frac{1}{2}) + 1(\frac{1}{4}) = 3/4$,
and so on as the sequence of positive integers generates the points. The intervals are
recursively split in half in the sequence 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, ...,
and the points are fairly evenly spaced for any value of the number of nodes N,
and perfectly spaced if N is of the form $2^k - 1$. The star discrepancy of this
sequence is

$$D_N^* = O\!\left(\frac{\log N}{N}\right),$$

which matches the best that is attained for infinite sequences.
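These three steps amount to computing the so-called radical inverse of n; a minimal scalar sketch (the function name is ours; a vectorized version, corput, appears with the Halton code in the next section):

function x = radinv(n, b)
% map the integer n into (0,1) by reversing its base-b digits about the
% "decimal" point; e.g. radinv(13,2) returns 11/16, as in the example above
x = 0; f = 1/b;
while n > 0
   a = rem(n, b);           % next (least significant) base-b digit of n
   x = x + a*f;             % place it in the next position after the point
   n = (n - a)/b;
   f = f/b;
end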


The Halton Sequence

This is simply the multivariate extension of the van der Corput sequence. In
higher dimensions, say in d dimensions, we choose d distinct primes $b_1, b_2, \ldots, b_d$
(usually the smallest primes) and generate, from the same integer m, the d
components of the vector using the method described for the van der Corput
sequence. For example, we consider the case d = 3 and use bases $b_1 = 2$,
$b_2 = 3$, $b_3 = 5$ because these are the smallest three prime numbers. The first
few vectors, $(\frac{1}{2}, \frac{1}{3}, \frac{1}{5})$, $(\frac{1}{4}, \frac{2}{3}, \frac{2}{5})$, $(\frac{3}{4}, \frac{1}{9}, \frac{3}{5})$, ..., are generated in the table below.


m    base 2   first comp.   base 3   second comp.   base 5   third comp.
1       1        1/2           1         1/3           1         1/5
2      10        1/4           2         2/3           2         2/5
3      11        3/4          10         1/9           3         3/5
4     100        1/8          11         4/9           4         4/5
5     101        5/8          12         7/9          10         1/25
6     110        3/8          20         2/9          11         6/25
7     111        7/8          21         5/9          12        11/25
8    1000        1/16         22         8/9          13        16/25
9    1001        9/16        100         1/27         14        21/25
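The rows of this table can be reproduced with the halton function listed at the end of this section; for example,

x = halton(9, 3)     % row m contains the three components for the integer m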

Figure 6.1 provides a plot of the first 500 points in the above Halton sequence
of dimension 3. There appears to be greater uniformity than a sequence of
random points would have. Some patterns are discernible in the two-dimensional
plots of the first 100 points; see, for example, Figures 6.2 and 6.3. These figures
can be compared with the plot of 100 pairs of independent uniform random
numbers in Figure 6.4, which seems to show more clustering and more holes in
the point cloud.
These points were generated with the following MATLAB function for producing
the Halton sequence.

function x=halton(n,s)
% x has dimension n by s and contains the first n terms of the Halton
% sequence of dimension s
p=primes(s*6); p=p(1:s);     % the first s primes (s*6 is ample for moderate s)
x=[];
for i=1:s
   x=[x (corput(n,p(i)))'];  % i'th coordinate: van der Corput, base p(i)
end

function x=corput(n,b)
% converts the integers 1:n to van der Corput numbers with base b
m=floor(log(n)/log(b));      % number of base-b digits required
n=1:n; A=[];
for i=0:m
   a=rem(n,b); n=(n-a)/b;    % peel off the next base-b digit of each integer
   A=[A; a];
end
x=((1/b).^(1:(m+1)))*A;      % digit a_k contributes a_k * b^(-k-1)
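For example, Figure 6.1 can be reproduced with a call such as

x = halton(500, 3);
plot3(x(:,1), x(:,2), x(:,3), '.')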




Figure 6.1: 500 points from a Halton sequence of dimension 3




Figure 6.2: The first and second coordinates of 100 points from the Halton
sequence of dimension 3




Figure 6.3: The second and third coordinates of 100 points from the Halton
sequence of dimension 3




Figure 6.4: 100 independent U[0,1] pairs


The Halton sequence is a genuine low discrepancy sequence in the sense that

$$D_N^* = O\!\left(\frac{(\log N)^d}{N}\right)$$

and the coverage of the unit cube is reasonably uniform for small dimensions.
Unfortunately the notation O() hides a constant multiple, one which, in this
case, depends on the dimension d. Roughly (Niederreiter, 1992), this constant
is asymptotic to $d^d$, which grows extremely fast in d. This is one indicator that
for large d the uniformity of the points degrades rapidly, largely because the
relative sparseness of the primes means that the d'th prime is very large when d
is large. This results in larger holes or gaps in that component of the vector than
we would like. This is evident, for example, in Figure 6.5, where we plot the last
two coordinates of the Halton sequence of dimension 15.

The performance of the Halton sequence is considerably enhanced by permuting
the coefficients $a_k$ prior to mapping into the unit interval, as is done by
the Faure sequence.




Figure 6.5: The 14th and 15th coordinates of the first 1000 points of a Halton
sequence of dimension d = 15

Faure Sequence

The Faure sequence is similar to the Halton sequence in that each dimension is
a permutation of a van der Corput sequence; however, the same prime is used
as the base b for each of the components of the vector, and it is usually chosen to
be the smallest prime greater than or equal to the dimension (Fox, 1996).
In the van der Corput sequence we wrote the natural numbers in the form
$\sum_{k=0}^{t} a_k b^k$, which was then mapped into the point $\sum_{k=0}^{t} a_k b^{-k-1}$ in the unit
interval. For the Faure sequence we use the same construction, but we use
different permutations of the coefficients $a_k$ for each of the coordinates. In
particular, in order to generate the i'th coordinate we generate the point

$$\sum_{k=0}^{t} c_k b^{-k-1},$$

where

$$c_k = \sum_{m=k}^{t} \binom{m}{k} (i-1)^{m-k} a_m \mod b.$$

Notice that only the last $t - k + 1$ values $a_k, \ldots, a_t$ are used to generate $c_k$. For
example, consider the case d = 2, b = 2. Then the first 10 Faure numbers are

0    1/2    1/4    3/4    1/8    5/8    3/8    7/8     1/16    9/16
0    1/2    3/4    1/4    5/8    1/8    3/8    7/8    15/16    7/16

The first row corresponds to the van der Corput numbers, and the second row
is obtained from the first by permuting the values with the same denominator.
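A direct (if inefficient) implementation of this construction is sketched below; it is our own illustration, indexed from n = 1 so that the initial point 0 in the table above is omitted.

function x = faure(n, s)
% a sketch of the first n points of the s-dimensional Faure sequence
b = max(s, 2);
while ~isprime(b), b = b + 1; end       % smallest prime >= the dimension
t = floor(log(n)/log(b));               % number of base-b digits needed
x = zeros(n, s);
for j = 1:n
   a = zeros(t+1, 1); nn = j;           % base-b digits a_0, ..., a_t of j
   for k = 0:t
      a(k+1) = rem(nn, b); nn = (nn - a(k+1))/b;
   end
   for i = 1:s
      c = zeros(t+1, 1);                % permuted digits for coordinate i
      for k = 0:t
         for m = k:t                    % c_k = sum binom(m,k)(i-1)^(m-k) a_m mod b
            c(k+1) = c(k+1) + nchoosek(m, k)*(i-1)^(m-k)*a(m+1);
         end
      end
      c = mod(c, b);
      x(j, i) = sum(c ./ (b.^(1:t+1)'));
   end
end

With s = 2, faure(10, 2) reproduces the two rows above apart from the leading 0.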
The Faure sequence has better regularity properties than the Halton sequence
above, particularly in high dimensions. However, the differences are by
no means evident from a graph when the dimension is moderate. For example,
we plot in Figure 6.6 the 14th and 15th coordinates of 1000 points from the
Faure sequence of dimension d = 15, for comparison with Figure 6.5.


Other suggestions for permuting the digits in a Halton sequence include
using only every l'th term in the sequence so as to destroy the cycle.




Figure 6.6: The last two coordinates of the first 1000 Faure points of dimension
d = 15

In practice, in order to determine the effect of using one of these low discrepancy
sequences, we need only substitute such a sequence for the vector of
independent uniform random numbers used by a simulation. For example, if we
wished to simulate a process for 10 time periods, then value a call option and
average the results, we could replace the 10 independent uniform random numbers
that we used to generate one path by an element of the Halton sequence
with d = 10.

Suppose we return briefly to the call option example treated in Chapter 3.
The true value of this call option was around 0.4615 according to the Black-Scholes
formula. If however we substitute the van der Corput sequence for the
sequence of uniform random numbers,

mean(fn(corput(100000,2)))

we obtain an estimate of 0.4614, very close to the correct value. I cannot
compare these estimators using the notion of efficiency that we used there, however,
because these low-discrepancy sequences are not random and do not even
attempt to emulate random numbers. Though unable to compare performance
with the variance of an estimator, we can look at the mean squared error (see
for example Figure 6.8, which shows a faster rate of convergence for quasi-Monte
Carlo, equivalent to a variance reduction in excess of 100). Galanti & Jung
(1997) report that the Faure sequence suffers from a start-up problem, especially
in high dimensions, and the Faure numbers can exhibit clustering
about zero. In order to reduce this problem, Faure suggests discarding the first
$b^4 - 1$ points.



Sobol Sequence

The Sobol sequence is generated using a set of so-called direction numbers
$v_i = \frac{m_i}{2^i}$, $i = 1, 2, \ldots$, where the $m_i$ are odd positive integers less than $2^i$. The
values of $m_i$ are chosen to satisfy a recurrence relation using the coefficients of
a primitive polynomial over the Galois field of order 2. A primitive polynomial of
degree p is irreducible (i.e. cannot be factored into polynomials of smaller degree) and does
not divide the polynomial $x^r + 1$ for $r < 2^p - 1$. For example, the polynomial
$x^2 + x + 1$ has no non-trivial factors over the Galois field of order 2, and it
divides $x^3 + 1$ but not $x^r + 1$ for r < 3. Corresponding to a primitive polynomial

$$z^p + c_1 z^{p-1} + \cdots + c_{p-1} z + c_p$$

is the recursion

$$m_i = 2c_1 m_{i-1} \oplus 2^2 c_2 m_{i-2} \oplus \cdots \oplus 2^p c_p m_{i-p} \oplus m_{i-p},$$

where $\oplus$ denotes addition carried out in binary arithmetic (bitwise exclusive-or). For the Sobol
sequence, we then replace each binary digit $a_k$ of n in the van der Corput construction by $a_k v_k$,
combining the terms with $\oplus$.
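As a small illustration of the recursion (our own sketch, with one common choice of initial values), the direction numbers for the primitive polynomial $x^3 + x + 1$ can be generated as follows.

% direction numbers for x^3 + x + 1: p = 3, (c1,c2,c3) = (0,1,1),
% assuming the initial odd integers m1 = 1, m2 = 3, m3 = 7
p = 3; c = [0 1 1];
m = [1 3 7 zeros(1,7)];
for i = p+1:10
   m(i) = bitxor(2^p*m(i-p), m(i-p));        % terms 2^p c_p m_{i-p} (+) m_{i-p}
   for k = 1:p-1
      if c(k)
         m(i) = bitxor(m(i), 2^k*m(i-k));    % term 2^k c_k m_{i-k}
      end
   end
end
v = m ./ 2.^(1:10)      % direction numbers v_i = m_i / 2^i

Each $m_i$ produced in this way is again odd and less than $2^i$; for instance $m_4 = 5$ and $m_5 = 7$ here.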
In the case d = 2, the first 10 Sobol numbers, using the primitive polynomials
x + 1 and $x^3 + x + 1$, are

0    1/2    1/4    3/4    3/8    7/8    1/8    5/8     5/16   13/16
0    1/2    1/4    3/4    1/8    5/8    3/8    7/8    11/16    3/16

Again we plot the last two coordinates of the first 1000 points from a Sobol
sequence of dimension d = 15 in Figure 6.7, for comparison with Figures 6.5 and
6.6.



Although there is a great deal of literature espousing the use of one quasi-Monte
Carlo sequence over another, most results come from a particular application,
and there is no strong evidence, at least when the dimension of the problem
is moderate (for example $d \le 15$), that it makes a great deal of difference whether we
use Halton, Faure or Sobol sequences. There is evidence that the starting values
for the Sobol sequences have an effect on the speed of convergence, and that
Sobol sequences can be generated more quickly than Faure sequences. Moreover, neither
the Faure nor the Sobol sequence provides a "black-box" method, because both are




Figure 6.7: The last two coordinates of the first 1000 points from a Sobol
sequence of dimension 15

sensitive to initialization. I will not attempt to adjudicate the considerable
literature on this topic here, but provide only a fragment of evidence that, at
least in the kind of example discussed in the variance reduction chapter, there
is little to choose between the various methods. Of course this integral, the
discounted payoff from a call option as a function of the uniform input, is a
one-dimensional integral, so the Faure, Halton and van der Corput sequences
are all the same thing in this case. In Figure 6.8 we plot the (expected) squared
error as a function of sample size for n = 1, ..., 100000 for crude Monte Carlo
(the dashed line) and the van der Corput sequence. The latter, although it
oscillates somewhat, is substantially better at all sample sizes, and its mean
squared error is equivalent to a variance reduction of around 1000 by the time
we reach n = 100,000. The different slope indicates an error approaching zero
at a rate close to $n^{-1}$ rather than the rate $n^{-1/2}$ for the crude Monte Carlo
estimator. The Sobol sequence, although much more variable as a function
of sample size, appears to show even more rapid convergence along certain
subsequences.
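The shape of this comparison is easy to reproduce on a simpler integrand; the sketch below uses $f(u) = e^u$ (our own stand-in, with a known integral, for the option payoff of Chapter 3) together with the corput function listed earlier.

% running squared errors of crude Monte Carlo and van der Corput estimates
% of the integral of exp(u) over (0,1), whose true value is e - 1
f = @(u) exp(u);
truth = exp(1) - 1;
N = 100000;
mc = cumsum(f(rand(1, N))) ./ (1:N);     % crude Monte Carlo running average
qmc = cumsum(f(corput(N, 2))) ./ (1:N);  % van der Corput running average
loglog(1:N, (mc - truth).^2, '--', 1:N, (qmc - truth).^2)
xlabel('n'); ylabel('squared error')     % compare the slopes, as in Figure 6.8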


The Sobol and Faure sequences are particular cases of (t, s)-nets. In order
to define them we need the concept of an elementary interval.




Elementary Intervals and Nets

Definition: elementary interval

An elementary interval in base b is an interval E in $I^s$ of the form

$$E = \prod_{j=1}^{s} \left[\frac{a_j}{b^{d_j}}, \frac{a_j + 1}{b^{d_j}}\right), \qquad (6.10)$$

where $d_j \ge 0$ and $0 \le a_j < b^{d_j}$ are integers.




Figure 6.8: (Expected) squared error vs. sample size in the estimation of a
call option price for crude Monte Carlo and the van der Corput sequence

Definition: (t, m, s)-net

Let $0 \le t \le m$ be integers. A (t, m, s)-net in base b is a finite sequence of
$b^m$ points from $I^s$ such that every elementary interval in base b of volume $b^{t-m}$
contains exactly $b^t$ points of the sequence.

Definition: (t, s)-sequence

An infinite sequence of points $\{x_i\} \in I^s$ is a (t, s)-sequence in base b if for all
$k \ge 0$ and $m > t$, the finite sequence $x_{kb^m}, \ldots, x_{(k+1)b^m - 1}$ forms a (t, m, s)-net
in base b.

It is known that for a (t, s)-sequence in base b low discrepancy is ensured:

$$D_N^* \le C\,\frac{(\log N)^s}{N} + O\!\left(\frac{(\log N)^{s-1}}{N}\right). \qquad (6.11)$$

Special constructions of such sequences for $s \ge 2$ have the smallest discrepancy
that is currently known (H. Niederreiter, 1992, Random Number Generation
and Quasi-Monte Carlo Methods).
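As a quick check of these definitions in the simplest case, in base b = 2 the point 0 together with the first $2^m - 1$ van der Corput numbers forms a (0, m, 1)-net; a sketch using the corput function from earlier in this chapter:

% the 2^m points should occupy the 2^m elementary intervals of volume
% 2^(-m) exactly once each (t = 0, so b^t = 1 point per interval)
m = 3; b = 2;
x = [0, corput(b^m - 1, b)];
counts = histc(x, (0:b^m)/b^m);
counts(1:end-1)                   % a row of ones: one point per interval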
The thesis of K.S. Tan (1998) provides a thorough investigation into various
improvements in quasi-Monte Carlo sampling, as well as evidence of the high
efficiency of these methods when valuing rainbow options in high dimensions.
Papageorgiou and Traub (1996) tested what Tezuka called generalized Faure
points. They concluded that these points were superior to Sobol points for the
model problem. Particularly important for financial computation, a reasonably
small error could be achieved with few evaluations. For example, just 170
generalized Faure points were sufficient to achieve an error of less than one part
in a hundred for a 360-dimensional problem. See also Traub and Wozniakowski
(1994) and Paskov and Traub (1995).
In summary, quasi-Monte Carlo frequently generates estimates superior to
Monte Carlo methods in many problems of low or intermediate effective dimension.
If the dimension d is large but a small number of variables determine
most of the variability in the simulation, then we might expect quasi-Monte
Carlo methods to continue to perform well. The price we pay for the smaller
error often associated with quasi-Monte Carlo methods and other numerical
techniques, one that is often a consequence of any departure from a crude simulation
of the process, is a feel for the distribution of various functionals of
the process of interest, as opposed to generating a single precise estimate. The
theory supporting low-discrepancy sequences, both the measures of discrepancy
themselves and the variation measure V(f), is tied, somewhat artificially,
to the axes. For example, if f(x) represents the indicator function of a square
with sides parallel to the axes in dimension d = 2, then V(f) = 0. However, if
we rotate this square by 45 degrees, the variation becomes infinite, indicating
that functions with steep isoclines at a 45 degree angle to the axes may be
particularly difficult to integrate using quasi-Monte Carlo.



Problems

1. Use 3-dimensional Halton sequences to integrate the function

$$\int_0^1 \int_0^1 \int_0^1 f(x, y, z)\,dx\,dy\,dz$$

where f(x, y, z) = 1 if x < y < z and otherwise f(x, y, z) = 0. Compare
your answer with the true value of the integral and with a crude Monte
Carlo integral of the same function.

2. Use your program from question 1 to generate 50 points uniformly distributed
in the unit cube. Evaluate the chi-squared statistic $\chi^2_{obs}$ for a test
that these points are independent uniform on the cube, where we divide the
cube into 8 subcubes, each having sides of length 1/2. Carry out the test
by finding $P[\chi^2 > \chi^2_{obs}]$, where $\chi^2$ is a random chi-squared variate with
the appropriate number of degrees of freedom. This quantity $P[\chi^2 > \chi^2_{obs}]$
is usually referred to as the "significance probability" or "p-value" for
the test. If we suspected too much uniformity to be consistent with the assumption
of independent uniform points, we might use the other tail of the test,
i.e. evaluate $P[\chi^2 < \chi^2_{obs}]$. Do so and comment on your results.