

Chapter 3

Basic Monte Carlo Methods

Simulation and Monte Carlo Methods

Consider as an example the following very simple problem. We wish to price
a European call option with exercise price $22 and payoff function V(S_T) =
(S_T − 22)^+. Assume for the present that the interest rate is 0% and S_T can take
only the following five values with corresponding risk-neutral (Q) probabilities

s            20     21     22     23     24
Q[S_T = s]   1/16   4/16   6/16   4/16   1/16

In this case, since the distribution is very simple, we can price the call option directly:

E_Q V(S_T) = E_Q (S_T − 22)^+ = (4/16)(23 − 22) + (1/16)(24 − 22) = 3/8.

However, the ability to value an option explicitly is a rare luxury. An alternative
would be to generate a large number (say n = 1000) of independent simulations of
the stock price S_T under the measure Q and average the returns from the option.
Say the simulations yielded values for S_T of 22, 20, 23, 21, 22, 23, 20, 24, ...; then
the estimated value of the option is

V̄(S_T) = (1/n)[(22 − 22)^+ + (20 − 22)^+ + (23 − 22)^+ + ....]
        = (1/n)[0 + 0 + 1 + ....].

The law of large numbers assures us that for a large number of simulations n, the
average V̄(S_T) will approximate the true expectation E_Q V(S_T). Now while it
would be foolish to use simulation in a simple problem like this, there are many
models in which it is much easier to randomly generate values of the process S_T
than it is to establish its exact distribution. In such a case, simulation is the
method of choice.
Randomly generating a value of S_T for the discrete distribution above is easy,
provided that we can produce independent uniform random numbers
on a computer. For example, if we were able to generate a random number Y_i
which has a uniform distribution on the integers {0, 1, 2, ..., 15}, then we could
define S_T for the i'th simulation as follows:

If Y_i is in set   {0}    {1, 2, 3, 4}   {5, 6, 7, 8, 9, 10}   {11, 12, 13, 14}   {15}
define S_T =        20         21                22                   23            24
Of course, to get a reasonably accurate estimate of the price of a complex
derivative may well require a large number of simulations, but this is decreasingly
a problem with increasingly fast computer processors. The first ingredient
in a simulation is a stream of uniform random numbers like the Y_i used above. In practice
all other distributions are generated by processing discrete uniform random
numbers. Their generation is discussed in the next section.
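The mapping above and the resulting option price can be checked with a short simulation; the following Python sketch (all function names are my own choices) draws a uniform integer Y on {0, ..., 15}, maps it to S_T as in the table, and averages the payoffs:

```python
import random

random.seed(1)  # fix the seed so the run is reproducible

def draw_ST(rng=random):
    """Draw S_T by mapping a uniform integer Y in {0, ..., 15} to a stock price."""
    y = rng.randrange(16)
    if y == 0:
        return 20
    elif y <= 4:      # the four values 1, 2, 3, 4
        return 21
    elif y <= 10:     # the six values 5, ..., 10
        return 22
    elif y <= 14:     # the four values 11, ..., 14
        return 23
    else:             # the single value 15
        return 24

n = 100000
payoffs = [max(draw_ST() - 22, 0) for _ in range(n)]
estimate = sum(payoffs) / n
print(estimate)  # close to the exact price 3/8 = 0.375
```

With n = 100000 simulations the standard error of the estimate is about 0.002, so the printed value agrees with 3/8 to two decimal places.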

Uniform Random Number Generation

The first requirement of a stochastic model is the ability to generate "random"
variables or something resembling them. Early such generators attached to
computers exploited physical phenomena such as the least significant digits in
an accurate measure of time, or the amount of background cosmic radiation,
as the basis for such a generator, but these suffer from a number of disadvantages.
They may well be "random" in some more general sense than are the
pseudo-random number generators that are presently used, but their properties
are difficult to establish, and the sequences are impossible to reproduce. The
ability to reproduce a sequence of random numbers is important for debugging
a simulation program and for reducing its variance.

It is quite remarkable that some very simple recursion formulae define
sequences that behave like sequences of independent random numbers and appear
to more or less obey the major laws of probability such as the law of large numbers,
the central limit theorem, the Glivenko-Cantelli theorem, etc. Although
computer generated pseudo-random numbers have become more and more like
independent random variables as the knowledge of these generators grows, the
main limit theorems in probability such as the law of large numbers and the
central limit theorem still do not have versions which directly apply to dependent
sequences such as those output by a random number generator. The fact
that certain pseudo-random sequences appear to share the properties of independent
sequences is still a matter of observation rather than proof, indicating
that many results in probability hold under much more general circumstances
than the relatively restrictive conditions under which these theorems have so far
been proven. One would intuitively expect an enormous difference between the
behaviour of independent random variables X_n and a deterministic (i.e. non-random)
sequence satisfying a recursion of the form x_n = g(x_{n−1}) for a simple
function g. Surprisingly, for many carefully selected such functions g it is quite
difficult to determine the difference between such a sequence and an independent
sequence. Of course, numbers generated from a simple recursion such as
this are neither random, nor are x_{n−1} and x_n independent. We sometimes draw
attention to this by referring to such a sequence as pseudo-random numbers.
While they are in no case independent, we will nevertheless attempt to find
simple functions g which provide behaviour similar to that of independent uniform
random numbers. The search for a satisfactory random number generator
is largely a search for a suitable function g, possibly depending on more than one
of the earlier terms of the sequence, which imitates in many different respects
the behaviour of independent observations with a specified distribution.

Definition: reduction modulo m. For positive integers a and m, the value
a mod m is the remainder (between 0 and m − 1) obtained when a is divided
by m. So for example 7 mod 3 = 1 since 7 = 2 × 3 + 1.

The single most common class of random number generators is of the form

x_n = (a x_{n−1} + c) mod m

for given integers a, c, and m which we select in advance. This generator is
initiated with a "seed" x_0 and then run to produce a whole sequence of values.
When c = 0 these generators are referred to as multiplicative congruential
generators, and in general as mixed or linear congruential generators. The
seed x_0 is usually updated by the generator with each call to it. There are
two common choices of m: either m prime, or m = 2^k for some k (usually 31 for
32 bit machines).

Example: Mixed Congruential Generator

Define x_n = (5 x_{n−1} + 3) mod 8 and the seed x_0 = 3. Then

x_1 = (5 × 3 + 3) mod 8 = 18 mod 8 = 2
x_2 = 13 mod 8 = 5
x_3 = 28 mod 8 = 4

and x_4, x_5, x_6, x_7, x_8 = 7, 6, 1, 0, 3 respectively, and after this point
(for n > 8) the recursion simply repeats the pattern already established:
3, 2, 5, 4, 7, 6, 1, 0, 3, 2, 5, 4, .......
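This cycle can be reproduced with a short Python transcription of the recursion (the function name lcg and argument order are my own choices):

```python
def lcg(x0, a, c, m, n):
    """Return [x0, x1, ..., xn] for the recursion x_k = (a*x_{k-1} + c) mod m."""
    x = [x0]
    for _ in range(n):
        x.append((a * x[-1] + c) % m)
    return x

seq = lcg(3, 5, 3, 8, 10)
print(seq)  # [3, 2, 5, 4, 7, 6, 1, 0, 3, 2, 5] -- the pattern repeats after 8 terms
```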
The above repetition is inevitable for a linear congruential generator. There
are at most m possible numbers after reduction mod m and once we arrive
back at the seed the sequence is destined to repeat itself. In the example
above, the sequence cycles after 8 numbers. The length of one cycle, before the
sequence begins to repeat itself again, is called the period of the generator. For a
mixed generator, the period must be less than or equal to m. For multiplicative
generators, the period is shorter, and often considerably shorter.

Multiplicative Generators.

For multiplicative generators, c = 0. Consider for example the generator x_n =
5 x_{n−1} mod 8 with x_0 = 3. This produces the sequence 3, 7, 3, 7, ....; in this case
the period is only 2, but for general m it is clear that the maximum possible
period is m − 1, because the generator takes values in the set {1, ..., m − 1}. The generator
cannot generate the value 0, because if it did, all subsequent values generated
would be identically 0. Therefore the maximum possible period corresponds to a cycle
through the non-zero integers exactly once. But in the example above, with m = 2^k,
the period is far from attaining its theoretical maximum m − 1. The following
theorem shows that the period of a multiplicative generator is maximal when
m is a prime number and a satisfies certain conditions.

Theorem 14 (period of a multiplicative generator).
If m is prime, the multiplicative congruential generator x_n = a x_{n−1} (mod m), a ≠ 0,
has maximal period m − 1 if and only if a^i ≠ 1 (mod m) for all i =
1, 2, ..., m − 2.

If m is a prime, and if the conditions a^{m−1} = 1 (mod m) and a^i ≠ 1 (mod m)
for all i < m − 1 hold, we say that a is a primitive root of m, which means
that the powers of a generate all of the non-zero elements of the multiplicative
group of integers mod m. Consider the multiplicative congruential generator
x_n = 2 x_{n−1} mod 11. It is easy to check that 2^i mod 11 = 2, 4, 8, 5, 10, 9, 7, 3, 6, 1
as i = 1, 2, ..., 10. Since i = m − 1 is the first value for which 2^i mod 11 = 1, 2
is a primitive root of 11, and this is a maximal period generator having period
10. When m = 11, only the values a = 2, 6, 7, 8 are primitive roots and produce
full period (10) generators.
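The primitive-root condition of Theorem 14 can be checked by brute force for m = 11; a Python sketch (function name my own):

```python
def is_primitive_root(a, m):
    """a is a primitive root of prime m iff a^i mod m != 1 for i = 1, ..., m-2."""
    x = 1
    for _ in range(1, m - 1):
        x = (x * a) % m
        if x == 1:       # a^i returned to 1 too early: period shorter than m-1
            return False
    return True

roots = [a for a in range(2, 11) if is_primitive_root(a, 11)]
print(roots)  # [2, 6, 7, 8], the full-period multipliers for m = 11
```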
One of the more common moduli on 32 bit machines is the Mersenne prime
m = 2^31 − 1. In this case, the following values of a (among many others) all
produce full period generators:

a = 7, 16807, 39373, 48271, 69621, 630360016, 742938285, 950706376,
1226874159, 62089911, 1343714438.

Let us suppose now that m is prime and a_2 is the multiplicative inverse
(mod m) of a_1, by which we mean (a_1 a_2) mod m = 1. When m is prime,
the set of integers {0, 1, 2, ..., m − 1} together with the operations of addition
and multiplication mod m forms what is called a finite field: a finite
set of elements together with operations of addition and multiplication such as
those we enjoy in the real number system. For example, for integers x_1, a_1, a_2 ∈
{0, 1, 2, ..., m − 1}, the product of a_1 and x_1 can be defined as (a_1 x_1) mod m = x_2,
say. Just as non-zero numbers in the real number system have multiplicative
inverses, so too do non-zero elements of this field. Suppose for example a_2 is
the multiplicative inverse of a_1, so that a_2 a_1 mod m = 1. If we now multiply x_2
by a_2 we have

(a_2 x_2) mod m = (a_2 a_1 x_1) mod m = x_1, since a_2 a_1 mod m = 1.

This shows that x_1 = (a_2 x_2) mod m is equivalent to x_2 = (a_1 x_1) mod m. In
other words, using a_2, the multiplicative inverse of a_1 mod m, the multiplicative
generator with multiplier a_2 generates exactly the same sequence as that with
multiplier a_1, except in reverse order. Of course if a_1 is a primitive root of m,
then so is its multiplicative inverse.

Theorem 15 (Period of Multiplicative Generators with m = 2^k)
If m = 2^k with k ≥ 3, and if a mod 8 = 3 or 5 and x_0 is odd, then the
multiplicative congruential generator has maximal period 2^{k−2}.

For the proof of these results, see Ripley (1987), Chapter 2. The following
simple Matlab code allows us to compare linear congruential generators
with small values of m. It generates a total of n such values for user-defined
a, c, m, and seed x0. The efficient implementation of a generator for large values of
m depends very much on the architecture of the computer. We normally choose
m to be close to the machine precision (e.g. 2^32 for a 32 bit machine).

function x=lcg(x0,a,c,m,n)
% seed x0 followed by n values of x(i+1) = (a*x(i)+c) mod m
y=x0; x=x0;
for i=1:n
    y=rem(a*y+c,m);
    x=[x y];
end

The period of a linear congruential generator varies both with the multiplier
a and the constant c. For example, consider the generator

x_n = (a x_{n−1} + 1) mod 2^10

for various multipliers a. When we use an even multiplier such as a = 2, 4, ... (using
seed 1), we end up with a sequence that eventually locks into a specific value.
For example with a = 8 we obtain the sequence 1, 9, 73, 585, 585, ...., never changing
beyond that point. The periods for odd multipliers are listed below (all started
with seed 1):

a        1     3    5     7    9     11   13    15   17    19   21    23   25
Period   1024  512  1024  256  1024  512  1024  128  1024  512  1024  256  1024

The astute reader will notice that the only full-period multipliers a are those
for which a − 1 is a multiple of 4. This is a special case of the following theorem.

Theorem 16 (Period of Mixed or Linear Congruential Generators)
The Mixed Congruential Generator,

x_n = (a x_{n−1} + c) mod m

has full period m if and only if
(i) c and m are relatively prime;
(ii) each prime factor of m is also a factor of a − 1;
(iii) if 4 divides m, it also divides a − 1.

When m is prime, (ii) together with the assumption that a < m implies that
m must divide a − 1, which forces a = 1. So for prime m the only full-period
generators correspond to a = 1. Prime numbers m are desirable for long periods
in the case of multiplicative generators, but in the case of mixed congruential
generators only the trivial one, x_n = (x_{n−1} + c) (mod m), has maximal period
m when m is prime. This covers the popular Mersenne prime m = 2^31 − 1.

For the generators x_n = (a x_{n−1} + c) mod 2^k, where m = 2^k, k ≥ 2, the
condition for full period 2^k requires that c is odd and a = 4j + 1 for some
integer j.
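The table of periods above, and these conditions, can be verified empirically for m = 2^10; this Python sketch (names my own) finds the cycle length of a generator directly:

```python
def period(a, c, m, x0=1):
    """Length of the cycle the sequence x_{n+1} = (a*x_n + c) mod m falls into."""
    seen = {}
    x, n = x0, 0
    while x not in seen:
        seen[x] = n          # record the step at which each value first appears
        x = (a * x + c) % m
        n += 1
    return n - seen[x]       # cycle length = steps since the value first appeared

m = 2 ** 10
full = [a for a in range(1, 26, 2) if period(a, 1, m) == m]
print(full)  # [1, 5, 9, 13, 17, 21, 25]: exactly the odd a with a mod 4 == 1
```

The even multiplier a = 8 collapses into the fixed point 585, a cycle of length 1, as described above.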

Some of the linear or multiplicative generators which have been suggested
are the following:

m            a              c    Source
2^31 − 1     7^5 = 16807    0    Lewis, Goodman, Miller (1969), IBM
2^31 − 1                    0    Fishman (Simscript II)
2^31 − 1                    0    Fishman and Moore
2^31         65539          0    RANDU
2^32                        1    Super-Duper (Marsaglia)
2^32                        0    Fishman and Moore
2^32                        1    DERIVE
2^32                        0    Ahrens (C-RAND)
2^32                             Turbo-Pascal, Version 7 (period = 2^32)
2^35         5^13           0    APPLE
10^12 − 11                  0    MAPLE
2^59         13^13          0    NAG
2^61 − 1     2^20 − 2^19    0    Wu (1997)
Table 3.1: Some Suggested Linear and Multiplicative Random
Number Generators

Other Random Number Generators.

A generalization of the linear congruential generators uses a k-dimensional
vector X, specifically when we wish to generate correlation
among the components of X. Suppose the components of X are to be integers
between 0 and m − 1, where m is a power of a prime number. If A is an arbitrary
k × k matrix with integer elements also in the range {0, 1, ..., m − 1}, then we
begin with a vector-valued seed X_0 and a constant vector C and define recursively

X_n = (A X_{n−1} + C) mod m.

Such generators are more common when C is the zero vector, and are then called matrix
multiplicative congruential generators. A related idea is to use a higher order
recursion like

x_n = (a_1 x_{n−1} + a_2 x_{n−2} + ... + a_k x_{n−k}) mod m,

called a multiple recursive generator. L'Ecuyer (1996, 1999) combines a number
of such generators in order to achieve a period around 2^319 and good uniformity
properties. When a recursion such as (3.2) with m = 2 is used to generate
pseudo-random bits {0, 1}, and these bits are then mapped into uniform (0, 1)
numbers, it is called a Tausworthe or feedback shift register generator. The
coefficients a_i are determined from primitive polynomials over the Galois field.

In some cases, the uniform random number generator in proprietary packages
such as Splus and Matlab is not completely described in the package documentation;
this is a further recommendation of the transparency of packages like
R. Evidently in Splus a multiplicative congruential generator is used, and
then the sequence is "shuffled" using a shift-register generator (a special case
of the matrix congruential generator described above). This secondary processing
of the sequence can increase the period, but it is not always clear what other
effects it has. In general, shuffling is conducted according to the following steps:

1. Generate a sequence of pseudo-random numbers x_i using x_{i+1} = a_1 x_i (mod m_1).

2. For fixed k put (T_1, . . . , T_k) = (x_1, . . . , x_k).

3. Generate, using a different generator, a sequence y_{i+1} = a_2 y_i (mod m_2).

4. Output the random number T_I where I = ⌈y_i k/m_2⌉.

5. Increment i, replace T_I by the next value of x, and return to step 3.

One generator is used to produce the sequence x of numbers needed to fill the
k holes. The other generator is then used to select which hole to draw the next
number from, i.e. to "shuffle" the x sequence.

Example: A shuffled generator

Consider a generator described by the above steps with k = 4, x_{n+1} = (5 x_n) (mod 19)
and y_{n+1} = (5 y_n) (mod 29):

x_n = 3  15  18  14  13   8   2  ...
y_n = 3  15  17  27  19   8  11  ...

We start by filling four pigeon-holes with the numbers produced by the first
generator, so that (T_1, . . . , T_4) = (3, 15, 18, 14). Then use the second generator
to select a random index I telling us which pigeon-hole to draw the next number
from. Since these holes are numbered from 1 through 4, we use I = ⌈4 × 3/29⌉ =
1. Then the first number in our random sequence is drawn from box 1, i.e.
z_1 = T_1 = 3. This element T_1 is now replaced by 13, the next number
in the x sequence. Proceeding in this way, the next index is I = ⌈4 × 15/29⌉ = 3,
and so the next number drawn is z_2 = T_3 = 18. Of course, when we have
finished generating the values z_1, z_2, ..., all of which lie between 1 and m_1 − 1 = 18,
we will usually transform them in the usual way (e.g. z_i/m_1) to produce
something approximating continuous uniform random numbers on [0, 1]. When
m_1 is large, it is reasonable to expect the values z_i/m_1 to be approximately
continuous and uniform on the interval [0, 1]. One advantage of shuffling is that
the period of the generator is usually greatly extended: whereas the original
x sequence had period 9 in this example, the shuffled generator has a larger
period of around 126.
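The hand computation above can be reproduced with a Python sketch of the shuffling steps (the function name and the 0-based hole index are my own choices):

```python
import math

def shuffled(n, k=4, x0=3, y0=3):
    """Shuffle x_{n+1} = 5 x_n mod 19 using y_{n+1} = 5 y_n mod 29 to pick the hole."""
    x, y = x0, y0
    T = []
    for _ in range(k):               # step 2: fill the k pigeon-holes from the x stream
        T.append(x)
        x = 5 * x % 19
    z = []
    for _ in range(n):
        i = math.ceil(k * y / 29) - 1  # step 4: hole index (0-based here)
        z.append(T[i])
        T[i] = x                     # step 5: refill the hole from the x stream
        x = 5 * x % 19
        y = 5 * y % 29
    return z

print(shuffled(2))  # [3, 18], i.e. z_1 = 3 and z_2 = 18 as in the example
```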

There is another approach, summing pseudo-random numbers, which is also
used to extend the period of a generator. This is based on the following theorem
(see L'Ecuyer (1988)). For further discussion of the effect of taking linear
combinations of the output from two or more random number generators, see
Fishman (1995, Section 7.13).

Theorem 17 (Summing mod m)
If X is a random variable uniform on the integers {0, . . . , m − 1} and if Y is
any integer-valued random variable independent of X, then the random variable
W = (X + Y) (mod m) is uniform on the integers {0, . . . , m − 1}.

Theorem 18 (Period of generators summed mod m_1)
If x_{i+1} = a_1 x_i mod m_1 has period m_1 − 1 and y_{i+1} = a_2 y_i mod m_2 has period
m_2 − 1, then (x_i + y_i) (mod m_1) has period the least common multiple of (m_1 −
1, m_2 − 1).

Example: summing two generators

If x_{i+1} = 16807 x_i mod (2^31 − 1) and y_{i+1} = 40692 y_i mod (2^31 − 249), then the
period of (x_i + y_i) mod (2^31 − 1) is the least common multiple of 2^31 − 2 and
2^31 − 250, namely

(2^31 − 2)(2^31 − 250)/62 ≈ 7.4 × 10^16,

since the greatest common divisor of the two constituent periods is 62.
This is much greater than the period of either of the two constituent generators.
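The least common multiple in this example can be verified directly in Python:

```python
from math import gcd

m1, m2 = 2**31 - 1, 2**31 - 249
p1, p2 = m1 - 1, m2 - 1           # periods of the two constituent generators
period = p1 * p2 // gcd(p1, p2)   # least common multiple of the two periods
print(gcd(p1, p2), period)        # gcd is 62; the period is about 7.4e16
```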

Other generators.

One such generator, the "Mersenne Twister" of Matsumoto and Nishimura
(1998), has been implemented in R and has a period of 2^19937 − 1. Others replace
the linear function in the recursion x_{n+1} = g(x_n) (mod m) with a non-linear
one. For example we might define x_{n+1} = x_n^2 (mod m) (called a quadratic residue
generator), or take g to be some other non-linear function. Typically the function
g is designed to result in large values and thus more or less random low order bits.
Inversive congruential generators generate x_{n+1} using the (mod m) inverse of x_n.
Other generators which have been implemented in R include the Wichmann-Hill
(1982, 1984) generator, which uses three multiplicative generators with prime
moduli 30269, 30307, 30323 and has a period of (1/4)(30268 × 30306 × 30322),
about 6.95 × 10^12. The
outputs from these three generators are converted to [0, 1] and then summed
mod 1. This is similar to the idea of Theorem 17, but the addition takes
place after the output is converted to [0, 1]. See Applied Statistics (1984), 33,
123. Also implemented are Marsaglia's Multicarry generator, which has a period
of more than 2^60 and reportedly passed all tests (according to Marsaglia),
Marsaglia's "Super-Duper", a linear congruential generator listed in Table 3.1,
and two generators developed by Knuth (1997, 2002), the Knuth-TAOCP and
Knuth-TAOCP-2002 generators.
Conversion to Uniform (0, 1) generators:

In general, random integers should be mapped into the unit interval in such a
way that the values 0 and 1, each of which has probability 0 for a continuous
distribution, are avoided. For a multiplicative generator, since values lie between
1 and m − 1, we may divide the random number by m. For a linear congruential
generator taking possible values x ∈ {0, 1, ..., m − 1}, it is suggested that we use
(x + 0.5)/m.

Apparent Randomness of Pseudo-Random Number Generators

Knowing whether a sequence behaves in all respects like independent uniform
random variables is, for the statistician, pretty close to knowing the meaning of
life. At the very least, in order that one of the above generators be a reasonable
approximation to independent uniform variates, it should satisfy a number of
statistical tests. Suppose we reduce the uniform numbers on {0, 1, ..., m − 1}
to values approximately uniformly distributed on the unit interval [0, 1] as described
above, either by dividing through by m or by using (x + 0.5)/m. There are
many tests that can be applied to determine whether the hypothesis of independent
uniform variates is credible (not, of course, whether the hypothesis is
true. We know by the nature of all of these pseudo-random number generators
that it is not!).

Runs Test

We wish to test the hypothesis H_0 that a sequence {U_i, i = 1, 2, ..., n} consists
of n independent identically distributed random variables, under the assumption
that they have a continuous distribution. The runs test measures runs,
either in the original sequence or in its differences. For example, suppose we
denote a positive difference between consecutive elements of the sequence by +
and a negative difference by −. Then we may regard a sequence of the form
.21, .24, .34, .37, .41, .49, .56, .51, .21, .25, .28, .56, .92, .96 as unlikely under independence,
because the corresponding differences + + + + + + − − + + + + + have
too few "runs" (the number of runs here is R = 3). Under the assumption that
the sequence {U_i, i = 1, 2, ..., n} is independent and continuous, it is possible
to show that

E(R) = (2n − 1)/3 and var(R) = (3n − 5)/18.

The proof of this result is a problem at the end of this chapter. We may also
approximate the distribution of R with the normal distribution for n ≥ 25. A test
at a 0.1% level of significance is therefore: reject the hypothesis of independence if

|R − (2n − 1)/3| / √((3n − 5)/18) > 3.29,

where 3.29 is the corresponding N(0, 1) quantile. A more powerful test based
on runs compares the numbers of runs of various lengths (in this case one
run up of length 7, one run down of length 3, and one run up of length 6) with
their theoretical distribution.
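A minimal Python implementation of the runs test, using the moments quoted above and applied to the example sequence (function name my own):

```python
from math import sqrt

def runs_statistic(u):
    """Number of runs up and down in u, and the standardized test statistic."""
    n = len(u)
    signs = [1 if u[i + 1] > u[i] else -1 for i in range(n - 1)]
    # a new run begins at each sign change
    R = 1 + sum(1 for i in range(len(signs) - 1) if signs[i] != signs[i + 1])
    mean = (2 * n - 1) / 3
    var = (3 * n - 5) / 18
    return R, (R - mean) / sqrt(var)

u = [.21, .24, .34, .37, .41, .49, .56, .51, .21, .25, .28, .56, .92, .96]
R, z = runs_statistic(u)
print(R, z)  # R = 3, far below the expected (2*14 - 1)/3 = 9; |z| exceeds 3.29
```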

Another test of independence is the serial correlation test. The runs test
above is one way of checking that the pairs (U_n, U_{n+1}) are approximately uniformly
distributed on the unit square. This could obviously be generalized to
pairs like (U_i, U_{i+j}). One could also use the sample correlation or covariance as
the basis for such a test. For example, for j ≥ 0, define the circular serial covariance

C_j = (1/n)(U_1 U_{1+j} + U_2 U_{2+j} + ... + U_{n−j} U_n + U_{n+1−j} U_1 + .... + U_n U_j).

The test may be based on the normal approximation to the distribution of C_j,
with mean E(C_0) = 1/3 and E(C_j) = 1/4 for j ≥ 1. Also

var(C_j) = 4/(45n)     for j = 0,
         = 13/(144n)   for j ≥ 1, j ≠ n/2,
         = 7/(72n)     for j = n/2.

Such a test, again at a 0.1% level, will take the form: reject the hypothesis of
independent uniforms if

|C_j − 1/4| / √(13/(144n)) > 3.29

for a particular preselected value of j (usually chosen to be small, such as j =
1, ..., 10).

Chi-squared test.

The chi-squared test can be applied to the sequence in any dimension, for example
k = 2. Suppose we have used a generator to produce a sequence of
uniform (0, 1) variables, U_j, j = 1, 2, ..., 2n, and then, for a partition {A_i; i =
1, ..., K} of the unit square, we count N_i, the number of pairs of the form
(U_{2j−1}, U_{2j}) ∈ A_i. See for example the points plotted in Figure 3.1. Clearly
this count should be related to the area, or probability P(A_i), of the set A_i. Pearson's
chi-squared statistic is

Σ_i [N_i − nP(A_i)]^2 / (nP(A_i)),

which should be compared with a chi-squared distribution with K − 1 degrees of
freedom, one less than the number of sets in the partition.

Figure 3.1: The Chi-squared Test

Observed values of the statistic that are unusually large for this distribution should lead to rejection
of the uniformity hypothesis. The partition usually consists of squares
of identical area but could, in general, be of arbitrary shape.
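The test can be sketched in Python for a partition of the unit square into a 4 × 4 grid of equal cells (the grid size and sample size are arbitrary choices). Here it is applied to Python's own built-in generator, so a non-rejection is the expected outcome:

```python
import random

random.seed(7)
K = 16                 # partition the unit square into a 4 x 4 grid of cells
n = 5000               # number of pairs (U_{2j-1}, U_{2j})
counts = [0] * K
for _ in range(n):
    u, v = random.random(), random.random()
    i, j = min(int(4 * u), 3), min(int(4 * v), 3)   # cell coordinates
    counts[4 * i + j] += 1

expected = n / K       # each cell has probability 1/16
chi2 = sum((c - expected) ** 2 / expected for c in counts)
print(chi2)            # compare with chi-squared quantiles on K - 1 = 15 df
```

At the 0.1% level the critical value for 15 degrees of freedom is about 37.7, so values far above that would cast doubt on uniformity.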

Spectral Test

Consecutive values plotted as pairs (x_n, x_{n+1}), when generated from a multiplicative
congruential generator x_{n+1} = a x_n (mod m), fall on a lattice. A lattice is
a set of points of the form t_1 e_1 + t_2 e_2, where t_1, t_2 range over all integers and e_1, e_2
are vectors (here two-dimensional vectors, since we are viewing these points in
pairs of consecutive values (x_n, x_{n+1})) called the "basis" for the lattice. A given
lattice, however, has many possible different bases, and in order to analyze the
lattice structure, we need to isolate the most "natural" basis, e.g. the one that
we tend to see in viewing a lattice in two dimensions. Consider, for example, the
lattice formed by the generator x_n = 23 x_{n−1} mod 97. A plot of adjacent pairs
(x_n, x_{n+1}) is given in Figure 3.2.

Figure 3.2: The Spectral Test

For basis vectors we could use e_1 = (1, 23) and
e_2 = (4, −6), or we could replace e_1 by (5, 18) or (9, 13), etc. Beginning at an
arbitrary point O on the lattice as origin (in this case, since the original point
(0, 0) is on the lattice, we will leave it unchanged), we choose an unambiguous
definition of e_1 to be the shortest vector in the lattice, and then define e_2 as the
shortest vector in the lattice which is not of the form t e_1 for integer t. Such a
basis will be called a natural basis. The best generators are those for which the
cells in the lattice generated by the two basis vectors e_1, e_2, the parallelograms
with sides parallel to e_1 and e_2, are as close as possible to squares, so that e_1 and
e_2 are approximately the same length. As we change the multiplier a in such a
way that the random number generator still has period ≈ m, there are roughly
m points in a region with area approximately m^2, and so the area of a
parallelogram with sides e_1 and e_2 is approximately a constant (m) whatever
the multiplier a. In other words, a longer e_1 is associated with a shorter vector
e_2, and therefore for an ideal generator the two vectors are of reasonably similar
length. A poor generator corresponds to a basis with e_2 much longer than e_1.
The spectral test statistic ν is the renormalized length of the first basis vector
||e_1||. The extension to a lattice in k dimensions is done similarly. All linear
congruential random number generators result in points which, when plotted as
consecutive k-tuples, lie on a lattice. In general, for k consecutive points, the
spectral test statistic is equal to the minimum of (b_1^2 + b_2^2 + ... + b_k^2)^{1/2}
over integers b_1, ..., b_k under the constraint b_1 + b_2 a + ... + b_k a^{k−1} = mq, q ≠ 0.
Large values of the statistic indicate
that the generator is adequate, and Knuth suggests as a minimum threshold the
value π^{−1/2}[(k/2)! m/10]^{1/k}.
One of the generators that fails the spectral test most spectacularly with
k = 3 is the generator RANDU, x_{n+1} = 65539 x_n (mod 2^31). This was used
commonly in simulations until the 1980s and is now notorious for the fact
that a small number of hyperplanes fit through all of the points (see Marsaglia,
1968). For RANDU, successive triplets tend to remain on the plane x_n =
6 x_{n−1} − 9 x_{n−2}. This may be seen by rotating the 3-dimensional graph of the
sequence of triplets of the form {(x_{n−2}, x_{n−1}, x_n); n = 2, 3, 4, ..., N}, as in Figure 3.3.

Figure 3.3: Lattice Structure of Uniform Random Numbers generated from RANDU

As another example, in Figure 3.4 we plot 5000 consecutive triplets from
a linear congruential random number generator with a = 383, c = 263, m =
10,000.

Figure 3.4: The values (x_i, x_{i+1}, x_{i+2}) generated by the linear congruential
generator x_{n+1} = (383 x_n + 263) (mod 10000)

Linear planes are evident from some angles in this view, but not from others.
In many problems, particularly ones in which random numbers are processed in
groups of three or more, this phenomenon can lead to highly misleading results.
The spectral test is the most widely used test which attempts to insure against
lattice structure. Table 3.2 below is taken from Fishman (1996) and gives some
values of the spectral test statistic for some linear congruential random number
generators in dimensions k ≤ 7.

m          a            c    k=2   k=3   k=4   k=5   k=6   k=7
2^31 − 1   7^5          0    0.34  0.44  0.58  0.74  0.65  0.57
2^31 − 1                0    0.82  0.43  0.78  0.80  0.57  0.68
2^31 − 1   742938285    0    0.87  0.86  0.86  0.83  0.83  0.62
2^31       65539        0    0.93  0.01  0.06  0.16  0.29  0.45
2^32       69069        0    0.46  0.31  0.46  0.55  0.38  0.50
2^32       3934873077   0    0.87  0.83  0.83  0.84  0.82  0.72
2^32       663608941    0    0.88  0.60  0.80  0.64  0.68  0.61
2^35       5^13         0    0.47  0.37  0.64  0.61  0.74  0.68
2^59       13^13        0    0.84  0.73  0.74  0.58  0.64  0.52
TABLE 3.2. Selected Spectral Test Statistics

The unacceptably small values for RANDU in the cases k = 3 and k = 4 stand out.
On the basis of these values of the spectral test, the multiplicative generators

x_{n+1} = 742938285 x_n (mod 2^31 − 1)

x_{n+1} = 3934873077 x_n (mod 2^32)

seem to be recommended, since their test statistics are all reasonably large for
k = 2, ..., 7.

Generating Random Numbers from Non-Uniform
Continuous Distributions

By far the simplest and most common method for generating non-uniform variates
is based on the inverse cumulative distribution function. For an arbitrary
cumulative distribution function F(x), define F^{−1}(y) = min{x; F(x) ≥ y}.
This defines a pseudo-inverse function which is a true inverse (i.e. F(F^{−1}(y)) =
y and F^{−1}(F(x)) = x) only in the case that the cumulative distribution function is
continuous and strictly increasing. However, in the general case of a possibly discontinuous,
non-decreasing cumulative distribution function, the function continues
to enjoy some of the properties of an inverse. Notice that F^{−1}(F(x)) ≤ x and
F(F^{−1}(y)) ≥ y, but F^{−1}(F(F^{−1}(y))) = F^{−1}(y) and F(F^{−1}(F(x))) = F(x). In
the general case, when this pseudo-inverse is easily obtained, we may use the
following result to generate a random variable with cumulative distribution function
F(x).

Theorem 19 (inverse transform) If F is an arbitrary cumulative distribution
function and U is uniform on [0, 1], then X = F^{−1}(U) has cumulative distribution
function F(x).

Proof. The proof is a simple consequence of the fact that

[U < F(x)] ⊂ [X ≤ x] ⊂ [U ≤ F(x)] for all x,    (3.5)

evident from Figure 3.5. Taking probabilities throughout (3.5), and using the
continuity of the distribution of U so that P[U = F(x)] = 0, we obtain

F(x) ≤ P[X ≤ x] ≤ F(x).

Examples of Inverse Transform

Exponential (θ)

This distribution, a special case of the gamma distributions, is common in most
applications of probability. For example in risk management, it is common to
model the time between defaults on a contract as exponential (so the default
times follow a Poisson process). In this case the probability density function is
f(x) = (1/θ)e^{−x/θ}, x ≥ 0, and f(x) = 0 for x < 0. The cumulative distribution
function is F(x) = 1 − e^{−x/θ}, x ≥ 0. Then taking its inverse,

X = −θ ln(1 − U), or equivalently
X = −θ ln U, since U and 1 − U have the same distribution.

Figure 3.5: The Inverse Transform generator

In Matlab the exponential random number generator is called exprnd, and in
Splus or R it is rexp.
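A Python sketch of the inverse transform for the exponential (the function name rexp mimics the R name but is otherwise my own; using 1 − U keeps the argument of the logarithm strictly positive):

```python
import math
import random

random.seed(0)

def rexp(theta, rng=random):
    """Exponential(theta) by inverse transform: X = -theta * ln(1 - U)."""
    return -theta * math.log(1.0 - rng.random())

theta = 2.0
sample = [rexp(theta) for _ in range(200000)]
print(sum(sample) / len(sample))  # close to the mean theta = 2
```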

Cauchy (a, b)

This distribution is a member of the stable family of distributions, which we
discuss later. It is similar to the normal, only substantially more peaked in
the center and with more area in the extreme tails of the distribution. The
probability density function is

f(x) = b/(π(b^2 + (x − a)^2)), −∞ < x < ∞.

See the comparison of the probability density functions in Figure 3.6. Here
we have chosen the second (scale) parameter b for the Cauchy so that the two
densities would match at the point x = a = 0.

Figure 3.6: The Normal and the Cauchy Probability Density Functions

The cumulative distribution function is F(x) = 1/2 + (1/π) arctan((x − a)/b). Then
the inverse transform generator is, for U uniform on [0, 1],

X = a + b tan{π(U − 1/2)}, or equivalently X = a − b/tan(πU),

where the second expression follows from the fact that tan(π(u − 1/2)) = −1/tan(πu)
(and, since U and 1 − U have the same distribution, the sign of the second term
may even be reversed).
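The Cauchy inverse transform in Python (function name my own); since the Cauchy has no mean, we check the location parameter a through the sample median instead of the sample average:

```python
import math
import random

random.seed(0)

def rcauchy(a, b, rng=random):
    """Cauchy(a, b) by inverse transform: X = a + b * tan(pi * (U - 1/2))."""
    return a + b * math.tan(math.pi * (rng.random() - 0.5))

sample = sorted(rcauchy(0.0, 1.0) for _ in range(100001))
print(sample[50000])  # the sample median estimates the location parameter a = 0
```

A further check: F(a + b) = 3/4, so about 75% of the sample should fall below a + b.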

Geometric (p)

This is a discrete distribution which describes the number of (independent) trials
necessary to achieve a single success when the probability of a success on each
trial is p. The probability function is

f(x) = p(1 − p)^{x−1}, x = 1, 2, 3, ....

and the cumulative distribution function is

F(x) = P[X ≤ x] = 1 − (1 − p)^{[x]}, x ≥ 0

where [x] denotes the integer part of x. To invert the cumulative distribution
function of a discrete distribution like this one, we need to refer to a graph of the
cumulative distribution function analogous to Figure 3.5. We wish to output
an integer value of x which satisfies the inequalities

F(x − 1) < U ≤ F(x).

Solving these inequalities for integer x, we obtain

1 − (1 − p)^{x−1} < U ≤ 1 − (1 − p)^x

(1 − p)^{x−1} > 1 − U ≥ (1 − p)^x

(x − 1) ln(1 − p) > ln(1 − U) ≥ x ln(1 − p)

(x − 1) < ln(1 − U)/ln(1 − p) ≤ x

Note that the direction of the inequality changes each time we multiply
or divide by a negative quantity. We should therefore choose for X the smallest
integer which is greater than or equal to ln(1 − U)/ln(1 − p), or equivalently

X = 1 + [ln(1 − U)/ln(1 − p)] or X = 1 + [−E/ln(1 − p)]

where we write −ln(1 − U) = E, an exponential(1) random variable. In Matlab,
the geometric random number generator is called geornd and in R or Splus
it is called rgeom.
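The ceiling formula above can be sketched directly (the function name is an illustrative assumption):

```python
import math

def geometric_inv(u, p):
    # Smallest integer x with F(x-1) < u <= F(x),
    # i.e. x = ceil(ln(1-u)/ln(1-p)); the max guards u = 0.
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))
```

With p = 0.5, the uniform value u = 0.6 falls between F(1) = 0.5 and F(2) = 0.75, so the generator returns 2.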

Pareto (a, b)

This is one of the simpler families of distributions used in econometrics for
modeling quantities with lower bound b.

F(x) = 1 − (b/x)^a, for x ≥ b > 0.

Then the probability density function is

f(x) = ab^a/x^{a+1}, x ≥ b

and the mean is E(X) = ab/(a − 1), provided a > 1. The inverse transform in this
case results in

X = b/(1 − U)^{1/a} or equivalently X = b/U^{1/a}.

The special case b = 1 is often considered, in which case the cumulative
distribution function takes the form

F(x) = 1 − x^{−a}

and the inverse is
X = (1 − U)^{−1/a}.
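A sketch of the Pareto inverse transform (function name assumed for illustration):

```python
def pareto_inv(u, a, b):
    # Inverse of F(x) = 1 - (b/x)^a for x >= b > 0
    return b / (1.0 - u) ** (1.0 / a)
```

For example, with a = 1 and b = 1, u = 0.75 yields X = 4, and indeed F(4) = 1 − 1/4 = 0.75.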


Logistic

This is again a distribution with shape similar to the normal, but closer to it than
is the Cauchy. Indeed, as can be seen in Figure 3.7, the two densities are almost
indistinguishable, except that the logistic is very slightly more peaked in the
center and has slightly more weight in the tails. Again in this graph, parameters
have been chosen so that the densities match at the center.
The logistic cumulative distribution function is

F(x) = 1/(1 + exp{−(x − a)/b}).

and on taking its inverse, the logistic generator is

X = a + b ln(U/(1 ’ U )).
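In code, the logistic generator is a one-liner (function name assumed for illustration):

```python
import math

def logistic_inv(u, a=0.0, b=1.0):
    # Inverse of F(x) = 1/(1 + exp(-(x - a)/b))
    return a + b * math.log(u / (1.0 - u))
```

Note that u = 1/2 maps to the location parameter a, the median of the distribution.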

Extreme Value

This is one of three possible distributions for modelling extreme statistics such as
the largest observation in a very large random sample. As a result it is relevant

Figure 3.7: Comparison of the Standard Normal and Logistic(0.625) Probability
density functions.

to risk management. The cumulative distribution function is, for parameters
−∞ < a < ∞ and b > 0,

F(x) = 1 − exp{−exp((x − a)/b)}.

The corresponding inverse is

X = a + b ln(−ln(U))

where we have again replaced 1 − U by U, since they have the same distribution.
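A sketch of this extreme value generator (function name assumed for illustration):

```python
import math

def extreme_value_inv(u, a=0.0, b=1.0):
    # Inverse of F(x) = 1 - exp(-exp((x - a)/b)),
    # using U in place of 1 - U (same distribution)
    return a + b * math.log(-math.log(u))
```

As a check, feeding the generated X back through F(x) = 1 − exp{−exp((x − a)/b)} recovers 1 − u.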

Weibull Distribution

In this case the parameters a, b are both positive and the cumulative distribution
function is

F(x) = 1 − exp{−ax^b} for x ≥ 0.

The corresponding probability density function is

f(x) = abx^{b−1} exp{−ax^b}.

Then using the inverse transform we may generate X as

X = {−ln(1 − U)/a}^{1/b}.
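The Weibull inverse transform, sketched in code (function name assumed for illustration):

```python
import math

def weibull_inv(u, a, b):
    # Inverse of F(x) = 1 - exp(-a * x**b), a > 0, b > 0
    return (-math.log(1.0 - u) / a) ** (1.0 / b)
```

For instance with a = b = 1 the Weibull reduces to the exponential, so u = 1 − e^{−1} maps to X = 1.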

Student's t

The Student t distribution is used to construct confidence intervals and tests for
the mean of normal populations. It also serves as a wider-tailed alternative to
the normal, useful for modelling returns which have moderately large outliers.
The probability density function takes the form

f(x) = Γ((v + 1)/2)/(√(vπ) Γ(v/2)) (1 + x²/v)^{−(v+1)/2}, −∞ < x < ∞.

The case v = 1 corresponds to the Cauchy distribution. There are specialized
methods of generating random variables with the Student t distribution which we
will return to later. In MATLAB, the Student's t generator is called trnd. In general,
trnd(v, m, n) generates an m × n matrix of Student's t random variables having
v degrees of freedom.
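One standard construction (not one of the specialized methods the text alludes to) builds a t variate as Z/√(W/v), with Z standard normal and W chi-squared(v); a chi-squared(v) variable is gamma with shape v/2 and scale 2. A sketch, with illustrative names:

```python
import math
import random

def student_t(v, rng=random):
    # Z / sqrt(W/v): Z ~ N(0,1), W ~ chi-squared(v)
    z = rng.gauss(0.0, 1.0)
    w = rng.gammavariate(v / 2.0, 2.0)  # chi-squared(v) as gamma(v/2, scale 2)
    return z / math.sqrt(w / v)
```

For v = 10 the resulting sample should have mean near 0 and variance near v/(v − 2) = 1.25.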

The generators of certain distributions are as described below. In each case
a vector of length n with the associated parameter values is generated.

Distribution    R or Splus            MATLAB
Normal          rnorm(n, µ, σ)        normrnd(µ, σ, 1, n) or randn(1, n) if µ = 0, σ = 1
Student's t     rt(n, ν)              trnd(ν, 1, n)
Exponential     rexp(n, λ)            exprnd(λ, 1, n)
Uniform         runif(n, a, b)        unifrnd(a, b, 1, n) or rand(1, n) if a = 0, b = 1
Weibull         rweibull(n, a, b)     weibrnd(a, b, 1, n)
Gamma           rgamma(n, a, b)       gamrnd(a, b, 1, n)
Cauchy          rcauchy(n, a, b)      a+b*trnd(1, 1, n)
Binomial        rbinom(n, m, p)       binornd(m, p, 1, n)
Poisson         rpois(n, λ)           poissrnd(λ, 1, n)

TABLE: Some Random Number Generators in R, Splus and MATLAB

Inversion performs reasonably well for any distribution for which both the
cumulative distribution function and its inverse can be found in closed form
and computed reasonably efficiently. This includes the Weibull, the logistic
distribution and most discrete distributions with a small number of possible
values. However, for other distributions such as the Normal, Student's t, the
chi-squared, the Poisson or Binomial with large parameter values, other more
specialized methods are usually used, some of which we discuss later.
When the cumulative distribution function is known but not easily inverted,
we might attempt to invert it by numerical methods. For example, using the
Newton-Raphson method, we would iterate until convergence the equation

X = X − (F(X) − U)/f(X) (3.6)

with f(X) = F′(X), beginning with a good approximation to X. For example
we might choose the initial value X = X(U) by using an easily inverted
approximation to the true function F(X). The disadvantage of this approach
is that for each X generated, we require an iterative solution to an equation,
and this is computationally very expensive.
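A sketch of this numerical inversion for the standard normal, whose CDF can be written with the error function (function names are illustrative assumptions):

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def newton_inverse(u, x0=0.0, tol=1e-10, max_iter=100):
    # Iterate X <- X - (F(X) - U)/f(X) until convergence, as in (3.6)
    x = x0
    for _ in range(max_iter):
        step = (norm_cdf(x) - u) / norm_pdf(x)
        x -= step
        if abs(step) < tol:
            break
    return x
```

The iteration illustrates the cost the text warns about: each generated variate requires several CDF and density evaluations.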

The Acceptance-Rejection Method

Suppose F(x) is a cumulative distribution function and f(x) is the corresponding
probability density function. In this case F is continuous and strictly increasing
wherever f is positive and so it has a well-defined inverse F^{−1}. Consider the
transformation of a point (u, v) in the unit square defined by

x(u, v) = F^{−1}(u)

y(u, v) = vf(F^{−1}(u)) = vf(x)

for 0 < u < 1, 0 < v < 1.

This maps a random point (U, V ) uniformly distributed on the unit square into
a point (X, Y ) uniformly distributed under the graph of the probability density
f . The fact that X has cumulative distribution function F follows from its
definition as X = F^{−1}(U) and the inverse transform theorem. By the definition of
Y = V f(X) with V uniform on [0, 1] we see that the conditional distribution of
Y given the value of X is uniform on the interval [0, f(X)]. Suppose we seek a
random number generator for the distribution of X but we are unable to easily
invert the cumulative distribution function. We can nevertheless use the result
that the point (X, Y) is uniform under the density as the basis for one of the
simplest yet most useful methods of generating non-uniform variates, the
rejection or acceptance-rejection method. It is based on the following very simple
relationship governing random points under probability density functions.

Theorem 20 (Acceptance-Rejection) (X, Y ) is uniformly distributed in the
region between the probability density function y = f (x) and the axis y = 0 if
and only if the marginal distribution of X has density f (x) and the conditional
distribution of Y given X is uniform on [0, f (X)].

Proof. If a point (X, Y) is uniformly distributed under the graph of f(x),
notice that the probability P[a < X < b] is proportional to the area under the
graph between vertical lines at x = a and x = b. In other words, P[a < X < b]
is proportional to ∫_a^b f(x)dx. This implies that f(x) is proportional to the
probability density function of X, and provided that ∫_{−∞}^{∞} f(x)dx = 1, f(x) is
the probability density function of X. The converse and the rest of the proof
are similar.
Even if the scaling constant for a probability density function is unavailable,
in other words if we know f(x) only up to some unknown scale multiple, we
can still use Theorem 19 to generate a random variable with probability density
f, because the X coordinate of a random point uniform under the graph of
constant × f(x) is the same as that of a random point uniformly distributed
under the graph of f(x). The acceptance-rejection method works as follows. We
wish to generate a random variable from the probability density function f(x).
We need the following ingredients:

• A probability density function g(x) with the properties that

1. the corresponding cumulative distribution function G(x) = ∫_{−∞}^x g(z)dz
is easily inverted to obtain G^{−1}(u), and

2. sup{f(x)/g(x); −∞ < x < ∞} < ∞. (3.7)

For reasonable efficiency we would like the supremum in (3.7) to be as close
as possible to one (it is always greater than or equal to one).

The condition (3.7) allows us to find a constant c > 1 such that f(x) ≤
cg(x) for all x. Suppose we are able to generate a point (X, Y) uniformly
distributed under the graph of cg(x). This is easy to do using Theorem
19. Indeed we can define X = G^{−1}(U) and Y = V × cg(X) where U
and V are independent U[0, 1]. Can we now find a point (X, Y) which is
uniformly distributed under the graph of f(x)? Since this is a subset of
the original region, this is easy. We simply test the point we have already

Figure 3.8: The Acceptance-Rejection Method

generated to see if it is in this smaller region and if so we use it. If not, we
start over, generating a new pair (X, Y) and repeating until the condition
Y ≤ f(X) is eventually satisfied (see Figure 3.8). The simplest version of
this algorithm corresponds to the case when g(x) is a uniform density on
an interval [a, b]. In algorithmic form, the acceptance-rejection method is:

1. Generate a random variable X = G^{−1}(U), where U is uniform
on [0, 1].

2. Generate an independent V ∼ U[0, 1].

3. If V ≤ f(X)/(cg(X)), then return X and exit.

4. ELSE go to step 1.
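The steps above can be sketched in code. The target density here, f(x) = 6x(1 − x) on [0, 1] with a uniform g and c = 1.5, is an illustrative assumption chosen so that the envelope constant is small:

```python
import random

def accept_reject(n, rng=random):
    # Target f(x) = 6x(1-x) on [0,1]; envelope c*g(x) with
    # g uniform on [0,1] and c = 1.5 = max f(x), attained at x = 1/2.
    c = 1.5
    out = []
    while len(out) < n:
        x = rng.random()                         # step 1: X = G^{-1}(U), G uniform
        v = rng.random()                         # step 2: independent V ~ U[0,1]
        if v <= 6.0 * x * (1.0 - x) / c:         # step 3: accept and return X
            out.append(x)
        # step 4: otherwise loop back to step 1
    return out
```

Since c = 1.5, on average 1.5 candidate pairs are generated per accepted variate, consistent with the geometric iteration count noted below.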

Figure 3.9: T(x, y) is an area-preserving invertible map from the region
under the graph of f into the set A, a subset of a rectangle.

The rejection method is useful if the density g is considerably simpler than
f, both to evaluate and to generate from, and if the constant c is close to 1.
The number of iterations through the above loop until we exit at step 3 has a
geometric distribution with parameter p = 1/c and mean c, so when c is large,
the rejection method is not very effective.

Most schemes for generating non-uniform variates are based on a transformation
of uniform variates, with or without some rejection step. The rejection algorithm
is a special case. Suppose, for example, that T = (u(x, y), v(x, y)) is a one-to-one
area-preserving transformation of the region −∞ < x < ∞, 0 < y < f(x) into a
subset A of a square in R², as is shown in Figure 3.9.

Notice that any such transformation defines a random number generator for
the density f(x). We need only generate a point (U, V) uniformly distributed
in the set A by acceptance-rejection and then apply the inverse transformation
T^{−1} to this point, defining (X, Y) = T^{−1}(U, V). Since the transformation is
area-preserving, the point (X, Y) is uniformly distributed under the probability
density function f(x), and so the first coordinate X will then have density f. We
can think of inversion as a mapping on [0, 1] and acceptance-rejection algorithms
as area-preserving mappings on [0, 1]².
The most common distribution required for simulations in finance and
elsewhere is the normal distribution. The following theorem provides the simple
connection between the normal distribution in Cartesian and in polar coordinates.

Theorem 21 If (X, Y) are independent standard normal variates, then expressed
in polar coordinates,

(R, Θ) = (√(X² + Y²), arctan(Y/X)) (3.8)

are independent random variables. R = √(X² + Y²) has the distribution of the
square root of a chi-squared(2) or exponential(2) variable. Θ = arctan(Y/X)
has the uniform distribution on [0, 2π].

It is easy to show that if (X, Y) are independent standard normal variates,
then √(X² + Y²) has the distribution of the square root of a chi-squared(2) (i.e.
exponential(2)) variable and arctan(Y/X) is uniform on [0, 2π]. The proof of
this result is left as a problem.
This observation is the basis of two related popular normal pseudo-random
number generators. The Box-Muller algorithm uses two uniform[0, 1] variates
U, V to generate R and Θ with the above distributions as

R = {−2 ln(U)}^{1/2}, Θ = 2πV (3.9)

and then defines two independent normal(0,1) variates as

(X, Y) = R(cos Θ, sin Θ).

Note that normal variates must be generated in pairs, which makes simulations
involving an even number of normal variates convenient. If an odd number are
required, we generate one more than required and discard one.
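The Box-Muller transformation (3.9) can be sketched as follows (function names are illustrative assumptions):

```python
import math
import random

def box_muller_pair(u, v):
    # R = sqrt(-2 ln U), Theta = 2*pi*V, as in (3.9)
    r = math.sqrt(-2.0 * math.log(u))
    theta = 2.0 * math.pi * v
    return r * math.cos(theta), r * math.sin(theta)

def normal_pairs(n, rng=random):
    # Generate n pairs of independent N(0,1) variates
    return [box_muller_pair(rng.random(), rng.random()) for _ in range(n)]
```

For example, u = e^{−2} gives R = 2, and v = 1/4 gives Θ = π/2, so the pair is (0, 2) up to rounding.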

Theorem 22 (Box-Muller Normal Random Number Generator)

Suppose (R, Θ) are independent random variables such that R² has an
exponential distribution with mean 2 and Θ has a Uniform[0, 2π] distribution.
Then (X, Y) = (R cos Θ, R sin Θ) is distributed as a pair of independent normal
variates.

Proof. Since R² has an exponential distribution with mean 2, R has probability
density

fR(r) = (d/dr) P[R ≤ r] = (d/dr) P[R² ≤ r²] = (d/dr)(1 − e^{−r²/2})