<< ńņš. 3(āńåćī 4)ŃĪÄÅŠĘĄĶČÅ >>
2Ļc c 2t 2 c
0
p
p K (Ī± Ī“ 2 + (x ā’ Āµ)2 )
Ī±Ī“ 2 ā’ Ī² 2 + Ī²(x ā’ Āµ)) 1 p
= exp(Ī“ Ī± .
Ļ Ī“ 2 + (x ā’ Āµ)2
162 CHAPTER 3. BASIC MONTE CARLO METHODS

The modiļ¬ed Bessel function of the second kind KĪ» (x) is given in MATLAB
by besselk( Ī½, x) and in R by besselK(x,Ī½,expon.scaled=FALSE). The distri-
bution with probability density function given by (3.22) is called the normal
inverse Gaussian distribution with real-valued parameters x, Āµ, 0 Ā· Ī“ and
Ī± ā„ |Ī²|. The tails of the normal inverse Gaussian density are substantially
heavier than those of the normal distribution. In fact up to a constant

f (x; Ī±, Ī², Ī“, Āµ) ā¼ |x| ā’3/2 exp((ā“Ī± + Ī²)x) as x ā’ Ā±ā.

The moments of this distribution can be obtained from the moment gener-
ating function
Āø1/4
Ī±2 ā’ (Ī² + s)2
M (s) = eĀµs exp{Ī“(Ī±2 ā’Ī² 2 )1/2 ā’Ī“(Ī±2 ā’(s+Ī²)2 )1/2 } for |Ī²+s| < Ī±.
Ī±2 ā’ Ī² 2
(3.23)
These moments are:

E(X) = Āµ + Ī“Ī²(Ī±2 ā’ Ī² 2 )ā’1/2

var(X) = Ī“Ī±2 (Ī±2 ā’ Ī² 2 )ā’3/2

and the skewness and kurtosis:

skew = 3Ī²Ī±ā’1 Ī“ ā’1/2 (Ī±2 ā’ Ī² 2 )ā’1/4

kurtosis = 3Ī“ ā’1 Ī±ā’2 (Ī±2 + 4Ī² 2 )(Ī±2 ā’ Ī² 2 )ā’1/2 .

One of the particularly attractive features of this family of distributions,
shared by the normal and the stable family of distributions, is that it is closed
under convolutions. This is apparent from the moment generating function
(3.23) since
M N (s)

gives a moment generating function of the same form but with Āµ replaced by
ĀµN and Ī“ by Ī“N. In Figure 3.19 we plot the probability density function of a
member of this family of distributions.
GENERATING RANDOM NUMBERS FROM NON-UNIFORM CONTINUOUS DISTRIBUTIONS163

Figure 3.19: Normal Inverse Gaussian probability density function with Ī± =
Ī“ = 1, Ī² = 1 , Āµ = 0
2
164 CHAPTER 3. BASIC MONTE CARLO METHODS

Note the similarity to the normal density but with a modest amount of
skewness and increased weight in the tails. We can generate random variables
from this distribution as follows:

Sample T from an inverse Gaussian distribution (3.21)

Return X = Āµ + Ī²T + N (0, T )
where N (0, T ) is a normal random variable with mean 0 and variance T.

We sample from the inverse Gaussian by using a property of the distribution
that if T has density of the form (3.21) then

(T ā’ Īø)2
(3.24)
c2 T

has a chi-squared distribution with one degree of freedom (easily generated
as the square of a standard normal random variable). The algorithm is (see
Michael, Shucany, Hass (1976));

1. For
1
c= p , and Īø = Ī“c,
Ī±2 ā’ Ī² 2
generate G1 from the Gamma( 1 , Ī“ ) distribution. Deļ¬ne
c
2
r
2
Y1 = 1 + G1 (1 ā’ 1 + )}.
G1

1
2. Generate U2 ā¼ U [0, 1]. If U2 Ā· then output T = ĪøY1
1+Y1

3. Otherwise output T = ĪøY1ā’1 .

The two values ĪøY1 , and ĪøY1ā’1 are the two roots of the equation obtained by
setting (3.24) equal to a chi-squared variate with one degree of freedom and the
1
relative values of the probability density function at these two roots are 1+Y1
1
and 1 ā’ 1+Y1 .

Finally to generate from the normal inverse Gaussian distribution (3.22) we
generate an inverse gamma random variable above and then set X = Āµ + Ī²T +
GENERATING RANDOM NUMBERS FROM NON-UNIFORM CONTINUOUS DISTRIBUTIONS165

Figure 3.20: The Normal and the Normal inverse Gaussian ļ¬t to the S&P500
Returns

N (0, T ). Prause (1999) provides a statistical evidence that the Normal Inverse
Gaussian provides a better ļ¬t than does the normal itself. For example we ļ¬t the
normal inverse gamma distribution to the S&P500 index returns over the period
Jan 1, 1997-Sept 27, 2002. There were a total of 1442 values over this period.
Figure 3.20 shows a histogram of the daily returns together with the normal
and the NIG ļ¬t to the data. The mean return over this period is 8 Ć— 10ā’5 and
the standard deviation of returns 0.013. If we ļ¬t the normal inverse Gaussian
distribution to these returns we obtain parameter estimates

Ī± = 95.23, Ī² = ā’4.72, Ī“ = 0.016, Āµ = 0.0009

and the Q-Q plots in Figure 3.21 . Both
166 CHAPTER 3. BASIC MONTE CARLO METHODS

Figure 3.21: QQ plots showing the Normal Inverse Gaussian and the Normal ļ¬t
to S&P 500 data, 1997-2002

indicate that the normal approximation fails to properly ļ¬t the tails of the
distribution but that the NIG distribution is a much better ļ¬t. This is similar to
the conclusion in Prause using observations on the Dow Jones Industrial Index.

Generating Random Numbers from Discrete Dis-
tributions

Many of the methods described above such as inversion and acceptance-rejection
for generating continuous distributions work as well for discrete random vari-
ables. Suppose for example X is a discrete distribution taking values on the
integers with probability function P [X = x] = f (x), for x = 0, 1, 2, ... Suppose
we can ļ¬nd a continuous random variable Y which has exactly the same value
of its cumulative distribution function at these integers so that FY (j) = FX (j)
for all j = 1, 2, .... Then we may generate the continuous random variable Y,
say by inversion or acceptance-rejection and then set X = bY c the integer part
GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS167

of Y . Clearly X takes integer values and since P [X Ā· j] = P [Y Ā· j] = FX (j)
for all j = 0, 1, ..., then X has the desired distribution. The continuous ex-
ponential distribution and the geometric distribution are linked in this way. If
X has a geometric(p) distribution and Y has the exponential distribution with
parameter Ī» = ā’ ln(1 ā’ p), then X has the same distribution as dY e or bY c + 1.

Using the inverse transform method for generating discrete random vari-
ables is usually feasible but for random variables with a wide range of values
of reasonably high probability, it often requires some setup costs to achieve
reasonable eļ¬ciency. For example if X has cumulative distribution function
F (x), x = 0, 1, ...inversion requires that we output an integer X = F ā’1 (U ),an
integer X satisfying F (X ā’ 1) < U Ā· F (X). The most obvious technique for
ļ¬nding such a value of X is to search sequentially through the potential values
x = 0, 1, 2, .... Figure 3.22 is the search tree for inversion for the distribution on
the integers 0, . . . 4 given by
0 1 2 3 4
x
0.11 0.30 0.25 0.21 0.13
f (x)
We generate an integer by repeatedly comparing a uniform [0,1] variate
U with the value at each node, taking the right branch if it is greater than
this threshold value, the left if it is smaller. If X takes positive integer values
{1, 2, ..., N },the number of values searched will average to E(X) which for many
discrete distributions can be unacceptably large.

An easy alternative is to begin the search at a value m which is near the
median (or mode or mean) of the distribution. For example we choose m = 2
and search to the left or right depending on the value of U in Figure 3.23.

If we assume for example that we root the tree at m then this results in
searching roughly an average of E[|X ā’ m + 1|] before obtaining the generated
variable. This is often substantially smaller than E(X) especially when E(X)
is large but still unacceptably large when the distribution has large variance.
168 CHAPTER 3. BASIC MONTE CARLO METHODS

.11

0
.41

1
.66

2
.87

3 4

distribution: .11 .30 .25 .21 .13

Figure 3.22: Sequential Search tree for Inverse Transform with root at x = 0

.66

.87
.41

4
2 3
.11

0 1

Figure 3.23: Search tree rooted near the median
GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS169

distribution .11 .30 .25 .21 .13

.45

.55
.24

2
.11 3
1

4
0

Figure 3.24: Optimal Binary Search Tree

An optimal binary search tree for this distribution is graphed in Figure
3.24. This tree has been constructed from the bottom up as follows. We begin
by joining the two smallest probabilities f (4) and f (0) to form a new node
with weight f (0) + f (4) = 0.24. Since we take the left path (towards X =
0 rather than towards X = 4) if U is smaller than the value .11 labelling
the node at the intersection of these two branches. We now regard this pair
of values as a unit and continue to work up the tree from the leaves to the
root. The next smallest pair of probabilities are {0, 1} and {3} which have
probabilities 0.24 and 0.21 respectively so these are the next to be joined hence
working from the leaves to the root of the tree. This optimal binary search
tree provides the minimum expected number of comparisons and is equivalent
to sorting the values in order of largest to smallest probability, in this case
1, 2, 3, 4, 0 , relabelling them or coding them {0, 1, 2, 3, 4} and then applying the
inverse transform method starting at 0.
The leaves of the tree are the individual probabilities f (j) and the internal
170 CHAPTER 3. BASIC MONTE CARLO METHODS

nodes are sums of the weights or probabilities of the āchildrenā, the values f (j)
for j on paths below this node. Let Di represents the depth of the i0 th leaf so for
example the depth of leaf 0 in Figure 3.24 is D0 = 3. Then the average number
of comparisons to generate a single random variable Xstarting at the root is
P
i f (i)Di . The procedure for constructing the last tree provides an optimal

algorithm in the sense that this quantity is minimized. It is possible to show that
an optimal binary search tree will reduce the average number of comparisons
from E(X) for ordinary inversion to less than 1 + 4 [log2 (1 + E(X))].
Another general method for producing variates from a discrete distribution
was suggested by Walker (1974, 1977) and is called the alias method. This is
based on the fact that every discrete distribution is a uniform mixture of two-
point distributions. Apart from the time required to set up an initial table of
aliases and aliasing probabilities, the time required to generate values from a dis-
crete distribution with K supporting points is bounded in K, whereas methods
such as inverse transform have computational time which increase proportion-
ally with E(X).
Consider a discrete distribution of the form with probability function f (j)
on K integers j = 1, 2, ...K. We seek a table of values of A(i) and associated
āaliasā probabilities q(i) so that the desired discrete random variable can be
generated in two steps, ļ¬rst generate one of the integers {1, 2, ..., K} at random
and uniformly, then if we generated the value I, say, replace it by an āaliasā value
A(I) with alias probability q(I). These values A(I) and q(I) are determined
below. The algorithm is:

GENERATE I UNIFORM ON {1, ...K}.

WITH PROBABILITY q(I), OUTPUT X = I, OTHERWISE, X = A(I).

An algorithm for producing these values of (A(i), q(i)), i = 1, ..., K} is sug-
gested by Walker(1977) and proceeds by reducing the number of non-zero prob-
abilities one at a time.
GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS171

1. Put q(i) = Kf (i) for all i = 1, ..., K.

2. LET m be the index so that q(m) = min{q(i); q(i) > 0} and let q(M ) =
max{q(i); q(i) > 0}.

3. SET A(m) = M and ļ¬x q(m) ( it is no longer is subject to change).

4. Replace q(M ) by q(M ) ā’ (1 ā’ q(m))

5. Replace (q(1), ...q(K))by (q(1), ..., q(m ā’ 1), q(m + 1), .q(M ) (so the com-
ponent with index m is removed).

6. Return to 2 unless all remaining qi = 1 or the vector of qi ā™s is empty.

Note that on each iteration of the steps above, we ļ¬x one of components
q(m) and remove it from the vector and adjust one other, namely q(M ). Since
we always ļ¬x the smallest q(m) and since the average q(i) is one, we always
obtain a probability, i.e. ļ¬x a value 0 < q(m) Ā· 1. Figure 3.25 shows the way
in which this algorithm proceeds for the distribution
1 2 3 4
x=
.1 .2 .3 .4
f (x) =
We begin with q(i) = 4 Ć— f (i) = .4, .8, 1.2, 1.6 for i = 1, 2, 3, 4. Then since
m = 1 and M = 4 these are the ļ¬rst to be adjusted. We assign A(1) = 4 and
q(1) = 0.4. Now since we have reassigned mass 1 ā’ q(1) to M = 4 we replace
q(4) by 1.6 ā’ (1 ā’ 0.4) = 1. We now ļ¬x and remove q(1) and continue with
q(i) = .8, 1.2, 1.0 for i = 2, 3, 4. The next step results in ļ¬xing q(2) = 0.8,
A(2) = 3 and changing q(3) to q(3) ā’ (1 ā’ q(2)) = 1. After this iteration,
the remaining q(3), q(4) are both equal to 1, so according to step 6 we may
terminate the algorithm. Notice that we terminated without assigning a value
to A(3) and A(4). This assignment is unnecessary since the probability the alias
A(i) is used is (1 ā’ q(i)) which is zero in these two cases. The algorithm
therefore results in aliases A(i) = 4, 3, i = 1, 2 and q(i) = .4, .8, 1, 1, respectively
for i = 1, 2, 3, 4. Geometrically, this method iteratively adjusts a probability
histogram to form a rectangle with base K as in Figure 3.25.
172 CHAPTER 3. BASIC MONTE CARLO METHODS

distribution aliased: .1 .2 .3 .4

1

A(2)
=3

A(1)=
4

1.6

1.6
1.2
.8
.4

4 3
1 2 3 1 2 4

Figure 3.25: The alias method for generating from the distribution 0.1 0.2 0.3
0.4

Suppose I now wish to generate random variables from this discrete distrib-
ution. We simply generate a random variable uniform on the set {1, 2, 3, 4} and
if 1 is selected, we replace it by A(1) = 4 with probability 1 ā’ q(1) = 0.6. If 2
is selected it is replaced by A(2) = 3 with probability 1 ā’ q(2) = 0.2.

Acceptance-Rejection for Discrete Random Variables

The acceptance-rejection algorithm can be used both for generating discrete and
continuous random variables and the geometric interpretation in both cases is
essentially the same. Suppose for example we wish to generate a discrete random
variable X having probability function f (x) using as a dominating function
a multiple of g(x) the probability density function of a continuous random
variable. Take for example the probability function
GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS173

0.9

0.8

0.7

0.6

Reject
0.5

0.4

0.3

0.2
Accept
0.1

0
1 2 3 4
X

Figure 3.26: Acceptance-Rejection with for Discrete Distribution with continu-
ous dominating function.

1 2 3 4
x=
f (x) = .1 .3 .4 .2
using the dominating function 2g(x) = 0.1 + 0.2(x ā’ 0.5) for 0.5 < x < 4.5. It
is easy to generate a continuous random variable from the probability density
function g(x) by inverse transform. Suppose we generate the value X. Then if
this value is under the probability histogram graphed in Figure 3.26 we accept
the value (after rounding it to the nearest integer to conform the discreteness
of the output distribution) and otherwise we reject and repeat.
We may also dominate a discrete distribution with another discrete distrib-
ution in which case the algorithm proceeds as in the continuous case but with
the probability density functions replaced by probability functions.

The Poisson Distribution.

Consider the probability function for a Poisson distribution with parameter Ī»

Ī»x eā’Ī»
(3.25)
f (x) = , x = 0, 1, ...
x!

The simplest generator is to use the Poisson process. Recall that a Poisson
process with rate 1 on the real line can be described in two equivalent ways:
174 CHAPTER 3. BASIC MONTE CARLO METHODS

1. Points are distributed on the line in such a way that the spacings be-
tween consecutive points are independent exponential(Ī») random vari-
ables. Then the resulting process is a Poisson process with rate Ī».

2. The number of points in an interval of length h has a Poisson (Ī»h) distri-
bution. Moreover the numbers of points in non-overlapping intervals are
independent random variables.

The simplest generator stems from this equivalence. Suppose we use the
ļ¬rst speciļ¬cation to construct a Poisson process with rate parameter 1 and
then examine X = the number of points occurring in the interval [0, Ī»]. This
is the number of partial sums of exponential(1) random variables that are less
than or equal to Ī»
n+1
X
X = inf{n; (ā’lnUi ) > Ī»}
i=1

or equivalently
n+1
Y
Ui < eā’Ī» } (3.26)
X = inf{n;
i=1

This generator requires CPU time which grows linearly with Ī» since the
number of exponential random variables generated and summed grows linearly
with Ī» and so an alternative for large Ī» is required. Various possibilities of
acceptance-rejection algorithms have been suggested including dominating the
Poisson probability function with multiples of the logistic probability density
function (Atkinson (1979)), the normal density with exponential right tail (cf.
Devroye, lemma 3.8, page 509). A simple all-purpose dominating function is the
so-called table-mountain function (cf. Stadlober (1989)), essentially a function
with a ļ¬‚at top and tails that decrease as 1/x2 . Another simple alternative for
generating Poisson variates that is less eļ¬cient but simpler to implement is to
use the Lorentzian, or truncated Cauchy distribution with probability density
function
c0
(3.27)
g(x|a, b) = ,x > 0
b2 + (x ā’ a)2
GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS175

where c0 is the normalizing constant. A random variable is generated from this
distribution using the inverse transform method; X = a + b tan(ĻU ),, where
U ā¼ U [0, 1]. Provided that we match the modes of the distribution a = Ī» and
ā
put b = 2Ī», this function may be used to dominate the Poisson distribution
and provide a simple rejection generator. The Matlab Poisson random number
generator is poissrnd (Ī», m, n) which generates an m Ć— n matrix of Poisson(Ī»)
variables. This uses the simple generator (3.26) and is not computationally
eļ¬cient for large values of Ī».In R the command rpois(n,Ī») generates a vector
of n Poisson variates.

The Binomial Distribution

For the Binomial distribution, we may use any one of the following alternatives:

Pn
I(Ui < p), Ui ā¼ independent uniform[0, 1]
(1) X = i=1

Px+1
Gi > n}, where Gi ā¼independent Geometric(p)
(2) X = inf{x; i=1

Px+1 Ei
> ā’log(1ā’p)}, where Ei ā¼independent Exponential(1).
(3) X = inf{x; i=1 nā’i+1

Method (1) obtains from the deļ¬nition of the sum of independent Bernoulli
random variables since each of the random variables I(Ui < p) are independent,
have values 0 and 1 with probabilities 1 ā’ p and p respectively. The event
(Ui < p) having probability p is typically referred to as a āsuccessā. Obviously
this method will be slow if n is large. For method (2), recall that the number of
trials necessary to obtain the ļ¬rst success, G1 , say, has a geometric distribution.
Similarly, G2 represents the number of additional trials to obtain the second
success. So if X = j, the number of trials required to obtain j + 1 successes
was greater than n and to obtain j successes, less than or equal to n. In other
words there were exactly j successes in the ļ¬rst n trials. When n is large but np
fairly small, method (2) is more eļ¬cient since it is proportional to the number
os successes rather than the total number of trials. Of course for large n and
176 CHAPTER 3. BASIC MONTE CARLO METHODS

np suļ¬ciently small (e.g. <1), we can also replace the Binomial distribution
by its Poisson (Ī» = np) approximation. Method (3) is clearly more eļ¬cient if
ā’log(1 ā’ p) is not too large so that p is not too close to 1, because in this case
we need to add fewer exponential random variables.

For large mean np and small n(1 ā’ p) we can simply reverse the role of
successes and failures and use method (2) or (3) above. But if both np and
n(1 ā’ p) are large, a rejection method is required. Again we may use rejection
p
beginning with a Lorentzian distribution, choosing a = np, and b = 2np(1 ā’ p)
in the case p < 1/2. When p > 1/2, we simply reverse the roles of āfailuresā
and āsuccessesā. Alternatively, a dominating table-mountain function may be
used (Stadlober (1989)). The binomial generator in Matlab is the function
binornd(n,p,j,k) which generates an n Ć— k matrix of binomial(n, p) random
variables. This uses the simplest form (1) of the binomial generator and is not
computationally eļ¬cient for large n. In R, rbinom(m,n,p) will generate a vector
of length m of Binomial(n, p) variates.

Random Samples Associated with Markov Chains

Consider a ļ¬nite state Markov Chain, a sequence of (discrete) random variables
X1 , X2 , . . .each of which takes integer values 1, 2, . . . N (called states). The
number of states of a Markov chain may be large or even inļ¬nite and it is not
always convenient to label them with the positive integers and so it is common
to deļ¬ne the state space as the set of all possible states of a Markov chain, but
we will give some examples of this later. For the present we restrict attention to
the case of a ļ¬nite state space. The transition probability matrix is a matrix P
describing the conditional probability of moving between possible states of the
RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 177

chain, so that

P [Xn+1 = j|Xn = i] = Pij , i = 1, . . . N, j = 1, . . . N.

P
where Pij ā„ 0 for all i, j and for all i. A limiting distribution
Pij = 1
j

of a Markov chain is a vector (Ļ say) of long run probabilities of the individual
states with the property that

Ļi = limtā’ā P [Xt = i].

A stationary distribution of a Markov chain is the column vector (Ļ say) of
probabilities of the individual states such that

Ļ0 P = Ļ0 . (3.28)

Ļ 0 P = Ļ0 . For a Markov chain, every limiting distribution is in fact a station-
ary distribution. For the basic theory of Markov Chains, see the Appendix.
Roughly, a Markov chain which eventually āforgetsā the states that were occu-
pied in the distant path, in other words for which the probability of the current
states does not vary much as we condition on diļ¬erent states in the distant
past, is called ergodic. A Markov chain which simply cycles through three
states 1 ā’ 2 ā’ 3 ā’ 1 ā’ ... is an example of a periodic chain, and is not
ergodic.
It is often the case that we wish to simulate from a ļ¬nite ergodic Markov
chain when it has reached equilibrium or stationarity, which is equivalent to
sampling from the distribution of Xn assuming that the distribution of X0
is given by the stationary distribution Ļ. In a few cases, we can obtain this
stationary distribution directly from (3.28) but when N is large this system of
equations is usually not feasible to solve and we need to ļ¬nd another way to
sample from the probability vector Ļ. Of course we can always begin the Markov
chain in some arbitrary initial state and run it waiting for Hele to freeze over (it
does happen since Helle is in Devon) until we are quite sure that the chain has
178 CHAPTER 3. BASIC MONTE CARLO METHODS

essentially reachedequilibrium, and then use a subsequent portion of this chain,
discarding this initial period, sometimes referred to as the āinitial transientā.
Clearly this is often not a very eļ¬cient method, particularly in cases in
which the chain mixes or forgets its past very slowly for in this case the required
initial transient is long. On the other hand if we shortened it, we run the risk of
introducing bias into our simulations because the distribution generated is too
far from the equilibrium distribution Ļ. There are a number of solutions to this
problem proposed in a burgeoning literature. Here we limit ourselves to a few
of the simpler methods.

Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm is a method for generating random variables
from a distribution Ļ that applies even in the case of an inļ¬nite number of
states or a continuous distribution Ļ. It is assumed that Ļ is known up to some
multiplicative constant. Roughly, the method consists of using a convenient
āproposalā Markov chain with transition matrix Q to generate transitions, but
then only āacceptā the move to these new states with probability that depends
on the distribution Ļ. The idea resembles that behind importance sampling.
The basic result on which the Metropolis-Hastings algorithm is pinned is the
following theorem.

Theorem 31 Suppose Qij is the transition matrix of a Markov chain. Assume
PN
that g is a vector of non-negative values such that i=1 gi = G and
gj
| | Ā· K < ā for all i, j
Qij
for some ļ¬nite value K. Deļ¬ne
gj Qji
Ļij = min(1, )
gi Qij
Then the Markov Chain with transition probability matrix

Pij = Qij Ļij , for i 6= j (3.29)
RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 179

gi
has stationary distribution Ļi = G.

Proof. The proof consists of showing that the so-called ādetailed balance
gi
conditionā is satisļ¬ed, i.e. with Ļi = that
G,

Ļi Pij = Ļj Pji , for all i, j. (3.30)

This condition implies that when the chain is operating in equilibrium,

P [Xn = i, Xn+1 = j] = P [Xn = j, Xn+1 = i]

reļ¬‚ecting a cavalier attitude to the direction in which time ļ¬‚ows or reversibility
of the chain. Of course (3.30) is true automatically if i = j and for i 6= j,

gi gj Qji
Ļi Pij = Qij min(1, )
G gi Qij
1
= min(gi Qij , gj Qji )
G
= Ļj Pji

1
by the symmetry of the function min(gi Qij , gj Qji ). Now the detailed balance
G

condition (3.30) implies that Ļ is a stationary distribution for this Markov chain
since
N N
X X
Ļi Pij = Ļj Pji
i=1 i=1
N
X
= Ļj Pji
i=1

= Ļj for each j = 1, ..., N.

Provided that we are able to generate transitions for the Markov Chain with
transition matrix Q, it is easy to generate a chain with transition matrix P in
(3.29). If we are currently in state i, generate the next state with probability
Qij . If j = i then we stay in state i. If j 6= i, then we āacceptā the move to state
j with probability Ļij , otherwise we stay in state i. Notice that the Markov
180 CHAPTER 3. BASIC MONTE CARLO METHODS

Chain with transition matrix P tends to favour moves which increase the value
of Ļ. For example if the proposal chain is as likely to jump from i to j as it
is to jump back so that Qij = Qji , then if Ļj > Ļi the move to j is always
Ļj
accepted whereas if Ļj < Ļi the move is only accepted with probability The
Ļi .

assumption Qij = Qji is a common and natural one, since in applications of
the Metropolis-Hastings algorithm, it is common to choose j āat randomā (i.e.
uniformly distributed) from a suitable neighborhood of i.
The above proof only provides that Ļ is a stationary distribution of the
Markov Chain associated with P, not that it is necessarily the limiting distrib-
ution of this Markov chain. For this to follow we need to know that the chain
is ergodic. Various conditions for ergodicity are given in the literature. See for
example Robert and Casella (1999, Chapter 6) for more detail.

Gibbs Sampling

There is one simple special case of the Metropolis-Hastings algorithm that is
particularly simple, common and compelling. To keep the discussion simple,
suppose the possible states of our Markov Chain are points in two-dimensional
space (x, y). We may assume both components are discrete or continuous. Sup-
pose we wish to generate observations from a stationary distribution which is
proportional to g(x, y) so
g(x, y)
Ļ(x, y) = P P (3.31)
y g(x, y)
x

deļ¬ned on this space but that the form of the distribution is such that directly
generating from this distribution is diļ¬cult, perhaps because it is diļ¬cult to
obtain the denominator of (3.31). However there are many circumstances where
it is much easier to obtain the value of the conditional distributions
Ļ(x, y)
Ļ(x|y) = P and
z Ļ(z, y)
Ļ(x, y)
Ļ(y|x) = P
z Ļ(x, z)
RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 181

Now consider the following algorithm: begin with an arbitrary value of y0
and generated x1 from the distribution Ļ(x|y0 ) followed by generating y1 from
the distribution Ļ(y|x1 ). It is hard to imagine a universe in which iteratively
generating values xn+1 from the distribution Ļ(x|yn ) and then yn+1 from the
distribution Ļ(y|xn+1 ) does not, at least asymptotically as n ā’ ā, eventually
lead to a draw from the joint distribution Ļ(x, y). Indeed that is the case since
the transition probabilities for this chain are given by

P (xn+1 , yn+1 |xn , yn ) = Ļ(xn+1 |yn )Ļ(yn+1 |xn+1 )

and it is easy to show directly from these transition probabilities that

X
P (x1 , y1 |x, y)Ļ(x, y)
(x,y)
X X
= Ļ(y1 |x1 ) Ļ(x1 |y) Ļ(x, y)
y x
X
= Ļ(y1 |x1 ) Ļ(x1 , y)
y

= Ļ(x1 , y1 ).

Of course the real power of Gibbs Sampling is achieved in problems that are
not two-dimensional such as the example above, but have dimension suļ¬ciently
high that calculating the sums or integrals in the denominator of expressions
like (3.31) is not computationally feasible.

Coupling From the Past: Sampling from the stationary dis-
tribution of a Markov Chain

All of the above methods assume that we generate from the stationary distri-
bution of a Markov chain by the āuntil Hele freezes overā method, i.e. wait
until run the chain from an arbitrary starting value and then delete the initial
transient. An alternative elegant method that is feasible at least for some ļ¬nite
182 CHAPTER 3. BASIC MONTE CARLO METHODS

state Markov chains is the method of ācoupling from the pastā due to Propp
and Wilson (1996).
We assume that we are able to generate transitions in the Markov Chain. In
other words if the chain is presently in state i at time n we are able to generate
a random variable Xn+1 from the distribution proportional to Pij , j = 1, ...K.
Suppose F (x|i) is the cumulative distribution function P (Xn+1 Ā· x|Xn = i)
and let us denote its inverse by F ā’1 (y|i). So if we wish to generate a random
variable Xn+1 conditional on Xn , we can use the inverse transform Xn+1 =
F ā’1 (Un+1 |Xn ) applied to the Uniform[0,1] random variable Un+1 . Notice that
a starting value say Xā’100 together with the sequence of uniform[0,1] variables
(Uā’99 , ..., U0 ) determines the chain completely over the period ā’100 Ā· t Ā· 0.
If we wish to generated the value of Xt given Xs , s < t, then we can work this
expression backwards

Xt = F ā’1 (Utā’1 |Xtā’1 )

= F ā’1 (Utā’1 |F ā’1 (Utā’2 |Xtā’2 ))

= F ā’1 (Utā’1 |F ā’1 (Utā’2 |...F ā’1 (Utā’1 |F ā’1 (Us |i))))
t
= Fs (Xs ), say.

Now imagine an inļ¬nite sequence {Ut , t = ..., ā’3, ā’2, ā’1} of independent uni-
form[0,1] random variables that was used to generate the state X0 of a chain
at time 0. Let us imagine for the moment that there is a value of M such that
0
Fā’M (i) is a constant function of i. This means that for this particular draw
of uniform random numbers, whatever the state i of the system at time ā’M,
0
the same state X0 = Fā’M (i) is generated to time 0. All chains, possibly with
diļ¬erent behaviour prior to time ā’M are ācoupledā at time ā’M and identical
from then on. In this case we say that coalescence has occurred in the interval
[ā’M, 0]. No matter where we start the chain at time ā’M it ends up in the
same state at time 0, so it is quite unnecessary to simulate the chain over the
whole inļ¬nite time interval ā’ā < t Ā· 0. No matter what state is occupied
RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 183

at time t = ā’M, the chain ends up in the same state at time t = 0. When
coalescence has occurred, we can safely consider the common value of the chain
at time 0 to be generated from the stationary distribution since it is exactly
the same value as if we had run the chain from t = ā’ā.

There is sometimes an easy way to check whether coalescence has occurred
in an interval, if the state space of the Markov chain is suitably ordered. For
example suppose the states are numbered 1, 2, ..., N. Then it is sometimes
possible to relabel the states so that the conditional distribution functions F (x|i)
are stochastically ordered, or equivalently that F ā’1 (U |i) is monotonic (say
monotonically increasing) in i for each value of U. This is the case for example
Pj
provided that the partial sums l=1 Pil are increasing functions of i for each
0
j = 1, 2, ..., N. If follows that the functions Fā’M (i) are all monotonic functions
of i and so

0 0 0
Fā’M (1) Ā· Fā’M (2) Ā· ...Fā’M (N ).

0 0 0
Therefore, if Fā’M (1) = Fā’M (N ), then Fā’M (i) must be a constant function.
Notice also that if there is any time in an interval [s, t] at which coalescence
t
occurs so that Fs (i) is a constant function of i, then for any interval [S, T ]
T
containing it [S, T ] ā [s, t], FS (i) is also a constant function of i.

It is easy to prove that coalescence occurs in the interval [ā’M, 0] for suf-
ļ¬ciently large M. For an ergodic ļ¬nite Markov chain, there is some step size
Ļ„ such that every transition has positive probability P [Xt+Ļ„ = j|Xt = i] > Ā²
for all i, j. Consider two independent chains, one beginning in state i and the
other in state i0 at time t = 0. Then the probability that they occupy the same
state j at time t = Ļ„ is at least Ā²2 . It is easy to see that if we use inverse
transform to generate the transitions and if they are driven by common random
numbers then this can only increase the probability of being in the same state,
so the probability these two chains are coupled at time Ļ„ is at least Ā²2 . Similarly
for N possible states, the probability of coalescence in an interval of length Ļ„
184 CHAPTER 3. BASIC MONTE CARLO METHODS

is at least ĪµN > 0. Since there are inļ¬nitely many intervals disjoint of length
Ļ„ in [ā’ā, 0] and the events that there is a coalescence in each interval are
independent, the probability that coalescence occurs somewhere in [ā’ā, 0] is
1.
We now detail the Propp Wilson algorithm

1. Set M = 1, XU = N, XL = 1

2. Generate Uā’M ....Uā’M/2+1 all independent U nif orm[0, 1].

3. For t = ā’M to ā’1 repeat

(a) obtain XL = F ā’1 (Utā’1 |XL ) and XU = F ā’1 (Utā’1 |XU ).

(b) If XL = XU stop and output X(0) = XL

4. Otherwise, set M = 2M and go to step 2.

This algorithm tests for coalescence repeatedly by starting on the intervals

[ā’1, 0], [ā’2, ā’1], [ā’4, ā’2], [ā’8, ā’4].

We are assured that with probability one, the process will terminate with co-
alescence after a ļ¬nite number of steps. Moreover, in this algorithm that the
random variable Ut once generated is NOT generated again on a subsequent
pass when M is doubled. The generated Ut is reused at each pass until coales-
cence occurs. If Ut were regenerated on subseuqent passes, this would lead to
bias in the algorithm.
It may well be that this algorithm needs to run for a very long time before
achieving coalescence and an impatient observer who interrupts the algorithm
prior to coalescence and starts over will bias the results. Varous modiļ¬cations
have been made to speed up the algorithm (e.g. Fill, 1998).
RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 185

Sampling from the Stationary Distribution of a Diļ¬usion
Process

A basic Ito process of the form

dXt = a(Xt )dt + Ļ(Xt )dWt

is perhaps the simplest extension of a Markov chain to continuous time, contin-
uous state-space. It is well-known that under fairly simple conditions, there is
a unique (strong) solution to this equation and that the limiting distribution of
XT as T ā’ ā has stationary distribution with probability density function
Z x
1 a(z)
f (x) = c 2 exp{2 dz}
Ļ 2 (z)
Ļ (x) 0

where the constant c is chosen so that the integral of the density is 1. To be
able to do this we need to assume that
Z Z
ā x
1 a(z)
dz}dx < ā. (3.32)
exp{2
Ļ 2 (x) Ļ 2 (z)
ā’ā 0

In order to generate from this stationary distribution, we can now start the
process at some arbitrary value X0 and run it for a very long time T , hoping
that this is suļ¬ciently long that the process is essentially in its stationary
state, or try to generate X0 more directly from (3.32) in which case the process
is beginning (and subsequently running) with its stationary distribution.

1/2
dXt = k(b ā’ Xt )dt + ĻXt (3.33)
dWt .

In this case

a(x) = k(b ā’ x), for x > 0,

Ļ 2 (x) = Ļ2 x, for x > 0.
186 CHAPTER 3. BASIC MONTE CARLO METHODS

Notice that
Z x
k(b ā’ z)
1 1 2kb k
dz} = 2 xā’1 exp{ 2 ln(x/Īµ) ā’ 2 (x ā’ Īµ)}
exp{2
Ļ2x 2z
Ļ Ļ Ļ Ļ
Īµ

is proportional to
2
x2kb/Ļ ā’1
exp{ā’kx/Ļ 2 }

and the integral of this function, a Gamma function, will fail to converge unless
2kb/Ļ 2 ā’ 1 > ā’1 or 2kb > Ļ 2 . Under this condition the stationary distribution
2
of the CIR process is Gamma(2kb/Ļ2 , Ļ ). If this condition fails and 2kb < Ļ 2 ,
k

then the process Xt is absorbed at 0. If we wished to simulate a CIR process in
equilibrium, we should generate starting values of X0 from the Gamma distrib-
ution. More generally for a CEV process satisfying

Ī³/2
dXt = k(b ā’ Xt )dt + ĻXt (3.34)
dWt

a similar calculation shows that the stationary density is proportional to

2kb 1 k
xā’Ī³ exp{ā’ ā’ 2 xĪ³ }, for Ī³ > 1.
Ļ 2 xĪ³ā’1 (Ī³ ā’ 1) Ļ Ī³

Simulating Stochastic Partial Diļ¬erential Equa-
tions.

Consider a derivative product whose underlying asset has price Xt which follows
some model. Suppose the derivative pays an amount V0 (XT ) on the maturity
date T. Suppose that the value of the derivative depends only on the current time
t and the current value of the asset S, then its current value is the discounted
future payoļ¬, an expectation of the form
Z T
(3.35)
V (S, t) = E[V0 (XT )exp{ā’ r(Xv , v)dv}|Xt = S]
t

where r(Xt , t) is the current spot interest rate at time t. In most cases, this ex-
pectation is impossible to evaluate analytically and so we need to resort to
SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.187

numerical methods. If the spot interest rate is function of both arguments
(Xv , v) and not just a function of time, then this integral is over the whole
joint distribution of the process Xv , 0 < v < T and simple one-dimensional
methods of numerical integration do not suļ¬ce. In such cases, we will usu-
ally resort to a Monte-Carlo method. The simplest version requires simulating
a number of sample paths for the process Xv starting at Xt = S, evaluating
RT
V0 (XT )exp{ā’ t r(Xv , v)dv} and averaging the results over all simulations. We
begin by discussing the simulation of the process Xv required for integrations
such as this.
Many of the stochastic models in ļ¬nance reduce to simple diļ¬usion equation
(which may have more than one factor or dimension). Most of the models
in ļ¬nance are Markovian in the sense that at any point t in time, the future
evolution of the process depends only on the current state Xt and not on the
past behaviour of the process Xs , s < t. Consequently we restrict to a āMarkov
diļ¬usion modelā of the form

(3.36)
dXt = a(Xt , t)dt + Ļ(Xt , t)dWt

with some initial value X0 for Xt at t = 0. Here Wt is a driving standard Brown-
ian motion process. Solving deterministic diļ¬erential equations can sometimes
provide a solution to a speciļ¬c problem such as ļ¬nding the arbitrage-free price of
a derivative. In general, for more complex features of the derivative such as the
distribution of return, important for considerations such as the Value at Risk,
we need to obtain a solution {Xt , 0 < t < T }to an equation of the above form
which is a stochastic process. Typically this can only be done by simulation.
One of the simplest methods of simulating such a process is motivated through
a crude interpretation of the above equation in terms of discrete time steps, that
is that a small increment Xt+h ā’ Xt in the process is approximately normally
distributed with mean given by a(Xt , t)hand variance given by Ļ 2 (Xt , t)h. We
generate these increments sequentially, beginning with an assumed value for
188 CHAPTER 3. BASIC MONTE CARLO METHODS

X0 , and then adding to obtain an approximation to the value of the process
at discrete times t = 0, h, 2h, 3h, . . .. Between these discrete points, we can
linearly interpolate the values. Approximating the process by assuming that
the conditional distribution of Xt+h ā’ Xt is N (a(Xt , t)h, Ļ 2 (Xt , t)h) is called
Eulerā™s method by analogy to a simple method by the same name for solving or-
dinary diļ¬erential equations. Given simulations of the process satisfying (3.36)
together with some initial conditions, we might average the returns on a given
derivative for many such simulations, (provided the process is expressed with
respect to the risk-neutral distribution), to arrive at an arbitrage-free return for
the derivative.
In this section we will discuss the numerical solution, or simulation of the
solution to stochastic diļ¬erential equations.
Letting ti = iāx, Equation (3.36) in integral form implies
Z ti+1 Z ti+1
(3.37)
Xti+1 = Xti + a(Xs , s)ds + Ļ(Xs , s)dWs
ti ti

For the following lemma we need to introduce Op or āorder in probabilityā,
notation common in mathematics and probability. A sequence indexed by āt,
say Yāt = Op (āt)k means that when we divide this term by (āt)k and then
let āt ā’ 0, the resulting sequence is bounded in probability or that for each Īµ
there exists K < ā so that
Yāt
| > K] < Īµ
P [|
ātk
whenever |āt| < Īµ. As an example, if W is a Brownian motion, then āWt =
W (t+āt)ā’W (t) has a Normal distribution with mean 0 and standard deviation
ā
āt and is therefore Op (āt)1/2 . Similarly Then we have two very common
and useful approximations to a diļ¬usion given by the following lemma.

Lemma 32 If Xt satisļ¬es a diļ¬usion equation of the form (3.36) then

Xti+1 = Xti + a(Xti , ti )āt + Ļ(Xti , ti )āWt + Op (āt) (Euler approximation)
ā‚
Ļ(Xti , ti ) ā‚x Ļ(Xti , ti )
[(āWt )2 ā’ āt] + Op (āt)3/2 (Milstei
Xti+1 = Xti + a(Xti , ti )āt + Ļ(Xti , ti )āWt +
2
SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.189

Proof. Itoā™s lemma can be written in terms of two operators on functions
f for which the derivatives below exist;

df (Xt , t) = Lo f dt + L1 f dWt where

1 2 ā‚2
ā‚ ā‚
0
+ , and
L =a +Ļ 2
ā‚x 2 ā‚x ā‚t
ā‚
L1 = Ļ .
ā‚x

Integrating, this and applying to twice diļ¬erentiable functions a and Ļ and
s > ti ,
Z Z
s s
0
L1 a(Xu , u)dWu
a(Xs , s) = a(Xti , ti ) + L a(Xu , u)du +
t t
Z is Z is
L0 Ļ(Xu , u)du + L1 Ļ(Xu , u)dWu .
Ļ(Xs , s) = Ļ(Xti , ti ) +
ti ti

By substituting in each of the integrands in 3.37 using the above identity and
iterating this process we arrive at the Ito-Taylor expansions (e.g. Kloeden and
Platen, 1992). For example,
Z ti+1 Z ti+1 Z Z
s s
0
L1 a(Xu , u)dWu }ds
{a(Xti , ti ) +
a(Xs , s)ds = L a(Xu , u)du +
ti ti ti ti
Z ti+1 Z Z Z
s ti+1 s
0 1
ā a(Xti , ti )āt + L a(Xti , ti ) duds + L a(Xti , ti ) dWu ds
ti ti ti ti

The ļ¬rst term a(Xti , ti )āt, is an initial approximation to the desired inte-
gral and the rest is a lower order correction that we may regard as an er-
ror term for the moment. For example it is easy to see that the second term
Rt Rs Rt Rs
L0 a(Xti , ti ) tii+1 ti duds is Op (āt)2 because the integral tii+1 ti duds = (āt)2 /2
Rt Rs
and L0 a(Xti , ti ) is bounded in probability. The third term L1 a(Xti , ti ) tii+1 ti dWu ds
Rt Rs Rt
is Op (āt)3/2 since tii+1 ti dWu ds = tii+1 (ti+1 ā’ u)dWu and this is a normal
Rt
random variable with mean 0 and variance tii+1 (ti+1 ā’ u)2 du = (āt)3 /3. We
can write such a normal random variable as 3ā’1/2 (āt)3/2 Z for Z a standard
190 CHAPTER 3. BASIC MONTE CARLO METHODS

normal random variable and so this is obviously Op (āt)3/2 . Thus the simplest
Euler approximation to the distribution of the increment assumes that āX has
conditional mean a(Xti , ti )āt. Similarly
Z ti+1 Z ti+1 Z Z
s s
0
L1 Ļ(Xu , u)dWu }dWs
{Ļ(Xti , ti ) +
Ļ(Xs , s)dWs = L Ļ(Xu , u)du +
ti ti ti ti
Z ti+1 Z Z Z
s ti+1 s
0 1
ā Ļ(Xti , ti )āWt + L Ļ(Xti , ti ) dudWs + L Ļ(Xti , ti ) dWu dWs
ti ti ti ti
ā‚
Ļ(Xti , ti ) ā‚x Ļ(Xti , ti )
[(āWt )2 ā’ āt] + Op (āt)3/2
= Ļ(Xti , ti )āWt +
2
R ti+1 R s
dWu dWs = 1 [(āWt )2 ā’ āt], L0 Ļ(Xu , u) = Ļ(Xti , ti )+ Op (āt)1/2 ,
since 2
ti ti
Rt Rs
ā‚
L1 Ļ(Xu , u) = Ļ(Xu , u) ā‚x Ļ(Xu , u) and tii+1 ti dudWs = Op (āt)3/2 . Putting
these terms together, we arrive at an approximation to the increment of the
form
ā‚
Ļ(Xti , ti ) ā‚x Ļ(Xti , ti )
[(āWt )2 ā’āt]+Op (āt)3/2
āXt = a(Xti , ti )āt+Ļ(Xti , ti )āWt +
2
(3.38)
which allow an explicit representation of the increment in the process X in terms
of the increment of a Brownian motion process āWt ā¼ N (0, āt).
The approximation (3.38) is called the Milstein approximation, a reļ¬nement
of the ļ¬rst, the Euler approximation. It is the second Ito-Taylor approximation
to a diļ¬usion process. Obviously, the increments of the process are quadratic
functions of a normal random variable and are no longer normal. The error
approaches 0 at the rate Op (āt)3/2 in probability only. This does not mean
that the trajectory is approximated to this order but that the diļ¬erence between
the Milstein approximation to a diļ¬usion and the diļ¬usion itself is bounded in
probability when divided by (āt)3/2 and as we let āt ā’ 0. Higher order Taylor
approximations are also possible, although they grow excessively complicated
very quickly. See the book by Kloeden and Platten(1992) for details.
There remains the question of how much diļ¬erence it makes which of these
approximations we employ for a particular diļ¬usion. Certainly there is no dif-
ference at all between the two approximations in the case that the diļ¬usion
SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.191

coeļ¬cient Ļ(Xt , t) does not depend at all on Xt . In general, the diļ¬erence is
hard to assess but in particular cases we can at least compare the performance of
the two methods. The approximations turn out to be very close in most simple
cases. For example consider the stock price path in Figure 3.27. The dashed
line corresponds to a Milstein approximation whereas the piecewise continuous
line corresponds to the Euler approximation. In this case the Milstein appears
to be a little better, but if I run a number of simulations and compare the sum
of the squared errors (i.e. squared diļ¬erences between the approximate value
of Xt and the true value of Xt ) we ļ¬nd that the improvement is only about
two percent of the diļ¬erence. The same is true even if I change the value of
āt from 1/12 (i.e. one month) to 1/52 (i.e. one week). Unlike the behaviour
of higher order approximations to deterministic functions, there appears to be
little advantage in using a higher order approximation, at least in the case of
diļ¬usions with smooth drift and diļ¬usion coeļ¬cients.

We can compare using Milstein approximation on the original process and
using Eulerā™s approximation on a transformation of the process in the case that
the diļ¬usion term depends only on the state of the process (not time). In other
words, suppose we have an Ito process of the form

(3.39)
dXt = a(Xt , t)dt + Ļ(Xt )dWt

where Wt is an ordinary Wiener measure. A simple transformation reduces this
to a problem with constant diļ¬usion term. Suppose Ļ(x) > 0 for all x and let
Z x
1
dz, for x ā„ 0
s(x) =
Ļ(z)
0
Z 0
1
s(x) = ā’ dz for x < 0
Ļ(z)
x

where we assume these integrals are well deļ¬ned. Let g be the inverse function of
s. This inverse exists since the function is continuous monotonically increasing.
192 CHAPTER 3. BASIC MONTE CARLO METHODS

Figure 3.27: Comparison of Milstein and Euler approximation to stock with
āt = 1/12 year.
SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.193

Suppose we apply Itoā™s lemma to the transformed process Yt = s(Xt ). We obtain
1
dYt = {a(Xt , t)s0 (Xt ) + Ļ 2 (Xt )s00 (Xt )}dt + Ļ(Xt )s0 (Xt )dWt
2
Ļ0 (Xt )
a(Xt , t) 1 2
={ ā’ Ļ (Xt ) 2 }dt + dWt
Ļ(Xt ) 2 Ļ (Xt )
= Āµ(Yt , t)dt + dWt

where
a(g(Yt ), t) 1 0
ā’ Ļ (g(Yt )).
Āµ(Yt , t) =
Ļ(g(Yt )) 2
In other words, Y t satisļ¬es an Ito equation with constant diļ¬usion term.
Suppose we generate an increment in Yt using Eulerā™s method and then solve
for the corresponding increment in Xt .Then using the ļ¬rst two terms in the
Taylor series expansion of g,
1
āXt = g 0 (Yt )āYt + g 00 (Yt )(āYt )2
2
1
= g 0 (Yt )(Āµ(Yti , ti )āt + āWt ) + Ļ 0 (g(Yt ))Ļ(g(Yt ))(āYt )2
2
1 1
= {a(g(Yt ), t) ā’ Ļ(g(Yt ))Ļ 0 (g(Yt ))}āt + Ļ(g(Yt ))āWt + Ļ 0 (g(Yt ))Ļ(g(Yt ))(āYt )2
2 2
since
1
g 0 (Yt ) = = Ļ(g(Yt )) and
s0 (g(Y t ))

g 00 (Yt ) = Ļ0 (g(Yt ))Ļ(g(Yt )).

But since (āYt )2 = (āWt )2 + o(āt) it follows that
1 1
āXt = {a(Xt , t)ā’ Ļ(Xt )Ļ 0 (Xt )}āt+Ļ(Xt )āWt + Ļ 0 (Xt )Ļ(Xt )(āWt )2 +o(āt)
2 2
and so the approximation to this increment is identical, up to the order con-
sidered, to the Milstein approximation. For most processes, it is preferable
to apply a diļ¬usion stabilizing transformation as we have here, prior to dis-
cretizing the process. For the geometric Brownian motion process, for example,
the diļ¬usion-stabilizing transformation is a multiple of the logarithm, and this
transforms to a Brownian motion, for which the Euler approximation gives the
exact distribution.
194 CHAPTER 3. BASIC MONTE CARLO METHODS

Example: Down-and-out-Call.

Consider an asset whose price under the risk-neutral measure Q follows a con-
stant elasticity of variance (CEV) process

Ī³
(3.40)
dSt = rSt dt + ĻSt dWt

for a standard Brownian motion process Wt . A down-and-out call option with
exercise price K provides the usual payment (ST ā’ K)+ of a European call
option on maturity T if the asset never falls below a given out barrier b. The
parameter Ī³ > 0 governs the change in the diļ¬usion term as the asset price
changes. We wish to use simulation to price such an option with current asset
price S0 , time to maturity T , out barrier b < S0 and constant interest rate r
and compare with the Black-Scholes formula as b ā’ 0.
A geometric Brownian motion is most easily simulated by taking logarithms.
For example if St satisļ¬es the risk-neutral speciļ¬cation

(3.41)
dSt = rSt dt + ĻSt dWt

then Yt = log(St ) satisļ¬es

dYt = (r ā’ Ļ2 /2)dt + ĻdWt . (3.42)

This is a Brownian motion and is simulated with a normal random walk. In-
dependent normal increments are generated āYt ā¼ N ((r ā’ Ļ2 /2)āt, Ļ2 āt) and
their partial sums used to simulate the process Yt . The return for those options
that are in the money is the average of the values of (eYT ā’ E)+ over those
paths for which min{Ys ; t < s < T } ā„ ln(b). Similarly the transformation of
the CEV process which provides a constant diļ¬usion term is determined by
Zx
1
s(x) = dz
0 Ļ(z)
ā§
Zx āØ x1ā’Ī³ if Ī³ 6= 1
1ā’Ī³
z ā’Ī³ dz =
= .
ā© ln(x) if Ī³ = 1
0
SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.195

Assuming Ī³ 6= 1, the inverse function is

g(y) = cy 1/(1ā’Ī³)

1ā’Ī³
for constant c and the process Yt = (1 ā’ Ī³)ā’1 St satisļ¬es an Ito equation
with constant diļ¬usion coeļ¬cient;
r 1ā’Ī³ 1 Ī³ā’1
dYt = { St ā’ Ī³ĻSt }dt + dWt
Ļ 2
r Ī³Ļ
dYt = { (1 ā’ Ī³)Yt ā’ }dt + dWt . (3.43)
2(1 ā’ Ī³)Yt
Ļ
After simulating the process Yt we invert the relation to obtain St = ((1 ā’
Ī³)Yt )1/(1ā’Ī³) . There is one ļ¬ne point related to simulating the process (3.43)
that we implemented in the code below. The equation (3.40) is a model for
a non-negative asset price St but when we simulate the values Yt from (3.43)
there is nothing to prevent the process from going negative. Generally if Ī³ ā„ 1/2
and if we increment time in suļ¬ciently small steps āt, then it is unlikely that
a negative value of Yt will obtain, but when it does, we assume absorption at 0
(analogous to default or bankruptcy). The following Matlab function was used
to simulate sample paths from the CEV process over the interval [0, T ].
 << ńņš. 3(āńåćī 4)ŃĪÄÅŠĘĄĶČÅ >>