ńņš. 3 |

0

p

p K (Ī± Ī“ 2 + (x ā’ Āµ)2 )

Ī±Ī“ 2 ā’ Ī² 2 + Ī²(x ā’ Āµ)) 1 p

= exp(Ī“ Ī± .

Ļ Ī“ 2 + (x ā’ Āµ)2

162 CHAPTER 3. BASIC MONTE CARLO METHODS

The modiļ¬ed Bessel function of the second kind KĪ» (x) is given in MATLAB

by besselk( Ī½, x) and in R by besselK(x,Ī½,expon.scaled=FALSE). The distri-

bution with probability density function given by (3.22) is called the normal

inverse Gaussian distribution with real-valued parameters x, Āµ, 0 Ā· Ī“ and

Ī± ā„ |Ī²|. The tails of the normal inverse Gaussian density are substantially

heavier than those of the normal distribution. In fact up to a constant

f (x; Ī±, Ī², Ī“, Āµ) ā¼ |x| ā’3/2 exp((ā“Ī± + Ī²)x) as x ā’ Ā±ā.

The moments of this distribution can be obtained from the moment gener-

ating function

Āø1/4

Ī±2 ā’ (Ī² + s)2

M (s) = eĀµs exp{Ī“(Ī±2 ā’Ī² 2 )1/2 ā’Ī“(Ī±2 ā’(s+Ī²)2 )1/2 } for |Ī²+s| < Ī±.

Ī±2 ā’ Ī² 2

(3.23)

These moments are:

E(X) = Āµ + Ī“Ī²(Ī±2 ā’ Ī² 2 )ā’1/2

var(X) = Ī“Ī±2 (Ī±2 ā’ Ī² 2 )ā’3/2

and the skewness and kurtosis:

skew = 3Ī²Ī±ā’1 Ī“ ā’1/2 (Ī±2 ā’ Ī² 2 )ā’1/4

kurtosis = 3Ī“ ā’1 Ī±ā’2 (Ī±2 + 4Ī² 2 )(Ī±2 ā’ Ī² 2 )ā’1/2 .

One of the particularly attractive features of this family of distributions,

shared by the normal and the stable family of distributions, is that it is closed

under convolutions. This is apparent from the moment generating function

(3.23) since

M N (s)

gives a moment generating function of the same form but with Āµ replaced by

ĀµN and Ī“ by Ī“N. In Figure 3.19 we plot the probability density function of a

member of this family of distributions.

GENERATING RANDOM NUMBERS FROM NON-UNIFORM CONTINUOUS DISTRIBUTIONS163

Figure 3.19: Normal Inverse Gaussian probability density function with Ī± =

Ī“ = 1, Ī² = 1 , Āµ = 0

2

164 CHAPTER 3. BASIC MONTE CARLO METHODS

Note the similarity to the normal density but with a modest amount of

skewness and increased weight in the tails. We can generate random variables

from this distribution as follows:

Sample T from an inverse Gaussian distribution (3.21)

Return X = Āµ + Ī²T + N (0, T )

where N (0, T ) is a normal random variable with mean 0 and variance T.

We sample from the inverse Gaussian by using a property of the distribution

that if T has density of the form (3.21) then

(T ā’ Īø)2

(3.24)

c2 T

has a chi-squared distribution with one degree of freedom (easily generated

as the square of a standard normal random variable). The algorithm is (see

Michael, Shucany, Hass (1976));

1. For

1

c= p , and Īø = Ī“c,

Ī±2 ā’ Ī² 2

generate G1 from the Gamma( 1 , Ī“ ) distribution. Deļ¬ne

c

2

r

2

Y1 = 1 + G1 (1 ā’ 1 + )}.

G1

1

2. Generate U2 ā¼ U [0, 1]. If U2 Ā· then output T = ĪøY1

1+Y1

3. Otherwise output T = ĪøY1ā’1 .

The two values ĪøY1 , and ĪøY1ā’1 are the two roots of the equation obtained by

setting (3.24) equal to a chi-squared variate with one degree of freedom and the

1

relative values of the probability density function at these two roots are 1+Y1

1

and 1 ā’ 1+Y1 .

Finally to generate from the normal inverse Gaussian distribution (3.22) we

generate an inverse gamma random variable above and then set X = Āµ + Ī²T +

GENERATING RANDOM NUMBERS FROM NON-UNIFORM CONTINUOUS DISTRIBUTIONS165

Figure 3.20: The Normal and the Normal inverse Gaussian ļ¬t to the S&P500

Returns

N (0, T ). Prause (1999) provides a statistical evidence that the Normal Inverse

Gaussian provides a better ļ¬t than does the normal itself. For example we ļ¬t the

normal inverse gamma distribution to the S&P500 index returns over the period

Jan 1, 1997-Sept 27, 2002. There were a total of 1442 values over this period.

Figure 3.20 shows a histogram of the daily returns together with the normal

and the NIG ļ¬t to the data. The mean return over this period is 8 Ć— 10ā’5 and

the standard deviation of returns 0.013. If we ļ¬t the normal inverse Gaussian

distribution to these returns we obtain parameter estimates

Ī± = 95.23, Ī² = ā’4.72, Ī“ = 0.016, Āµ = 0.0009

and the Q-Q plots in Figure 3.21 . Both

166 CHAPTER 3. BASIC MONTE CARLO METHODS

Figure 3.21: QQ plots showing the Normal Inverse Gaussian and the Normal ļ¬t

to S&P 500 data, 1997-2002

indicate that the normal approximation fails to properly ļ¬t the tails of the

distribution but that the NIG distribution is a much better ļ¬t. This is similar to

the conclusion in Prause using observations on the Dow Jones Industrial Index.

Generating Random Numbers from Discrete Dis-

tributions

Many of the methods described above such as inversion and acceptance-rejection

for generating continuous distributions work as well for discrete random vari-

ables. Suppose for example X is a discrete distribution taking values on the

integers with probability function P [X = x] = f (x), for x = 0, 1, 2, ... Suppose

we can ļ¬nd a continuous random variable Y which has exactly the same value

of its cumulative distribution function at these integers so that FY (j) = FX (j)

for all j = 1, 2, .... Then we may generate the continuous random variable Y,

say by inversion or acceptance-rejection and then set X = bY c the integer part

GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS167

of Y . Clearly X takes integer values and since P [X Ā· j] = P [Y Ā· j] = FX (j)

for all j = 0, 1, ..., then X has the desired distribution. The continuous ex-

ponential distribution and the geometric distribution are linked in this way. If

X has a geometric(p) distribution and Y has the exponential distribution with

parameter Ī» = ā’ ln(1 ā’ p), then X has the same distribution as dY e or bY c + 1.

Using the inverse transform method for generating discrete random vari-

ables is usually feasible but for random variables with a wide range of values

of reasonably high probability, it often requires some setup costs to achieve

reasonable eļ¬ciency. For example if X has cumulative distribution function

F (x), x = 0, 1, ...inversion requires that we output an integer X = F ā’1 (U ),an

integer X satisfying F (X ā’ 1) < U Ā· F (X). The most obvious technique for

ļ¬nding such a value of X is to search sequentially through the potential values

x = 0, 1, 2, .... Figure 3.22 is the search tree for inversion for the distribution on

the integers 0, . . . 4 given by

0 1 2 3 4

x

0.11 0.30 0.25 0.21 0.13

f (x)

We generate an integer by repeatedly comparing a uniform [0,1] variate

U with the value at each node, taking the right branch if it is greater than

this threshold value, the left if it is smaller. If X takes positive integer values

{1, 2, ..., N },the number of values searched will average to E(X) which for many

discrete distributions can be unacceptably large.

An easy alternative is to begin the search at a value m which is near the

median (or mode or mean) of the distribution. For example we choose m = 2

and search to the left or right depending on the value of U in Figure 3.23.

If we assume for example that we root the tree at m then this results in

searching roughly an average of E[|X ā’ m + 1|] before obtaining the generated

variable. This is often substantially smaller than E(X) especially when E(X)

is large but still unacceptably large when the distribution has large variance.

168 CHAPTER 3. BASIC MONTE CARLO METHODS

.11

0

.41

1

.66

2

.87

3 4

distribution: .11 .30 .25 .21 .13

Figure 3.22: Sequential Search tree for Inverse Transform with root at x = 0

.66

.87

.41

4

2 3

.11

0 1

Figure 3.23: Search tree rooted near the median

GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS169

distribution .11 .30 .25 .21 .13

.45

.55

.24

2

.11 3

1

4

0

Figure 3.24: Optimal Binary Search Tree

An optimal binary search tree for this distribution is graphed in Figure

3.24. This tree has been constructed from the bottom up as follows. We begin

by joining the two smallest probabilities f (4) and f (0) to form a new node

with weight f (0) + f (4) = 0.24. Since we take the left path (towards X =

0 rather than towards X = 4) if U is smaller than the value .11 labelling

the node at the intersection of these two branches. We now regard this pair

of values as a unit and continue to work up the tree from the leaves to the

root. The next smallest pair of probabilities are {0, 1} and {3} which have

probabilities 0.24 and 0.21 respectively so these are the next to be joined hence

working from the leaves to the root of the tree. This optimal binary search

tree provides the minimum expected number of comparisons and is equivalent

to sorting the values in order of largest to smallest probability, in this case

1, 2, 3, 4, 0 , relabelling them or coding them {0, 1, 2, 3, 4} and then applying the

inverse transform method starting at 0.

The leaves of the tree are the individual probabilities f (j) and the internal

170 CHAPTER 3. BASIC MONTE CARLO METHODS

nodes are sums of the weights or probabilities of the āchildrenā, the values f (j)

for j on paths below this node. Let Di represents the depth of the i0 th leaf so for

example the depth of leaf 0 in Figure 3.24 is D0 = 3. Then the average number

of comparisons to generate a single random variable Xstarting at the root is

P

i f (i)Di . The procedure for constructing the last tree provides an optimal

algorithm in the sense that this quantity is minimized. It is possible to show that

an optimal binary search tree will reduce the average number of comparisons

from E(X) for ordinary inversion to less than 1 + 4 [log2 (1 + E(X))].

Another general method for producing variates from a discrete distribution

was suggested by Walker (1974, 1977) and is called the alias method. This is

based on the fact that every discrete distribution is a uniform mixture of two-

point distributions. Apart from the time required to set up an initial table of

aliases and aliasing probabilities, the time required to generate values from a dis-

crete distribution with K supporting points is bounded in K, whereas methods

such as inverse transform have computational time which increase proportion-

ally with E(X).

Consider a discrete distribution of the form with probability function f (j)

on K integers j = 1, 2, ...K. We seek a table of values of A(i) and associated

āaliasā probabilities q(i) so that the desired discrete random variable can be

generated in two steps, ļ¬rst generate one of the integers {1, 2, ..., K} at random

and uniformly, then if we generated the value I, say, replace it by an āaliasā value

A(I) with alias probability q(I). These values A(I) and q(I) are determined

below. The algorithm is:

GENERATE I UNIFORM ON {1, ...K}.

WITH PROBABILITY q(I), OUTPUT X = I, OTHERWISE, X = A(I).

An algorithm for producing these values of (A(i), q(i)), i = 1, ..., K} is sug-

gested by Walker(1977) and proceeds by reducing the number of non-zero prob-

abilities one at a time.

GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS171

1. Put q(i) = Kf (i) for all i = 1, ..., K.

2. LET m be the index so that q(m) = min{q(i); q(i) > 0} and let q(M ) =

max{q(i); q(i) > 0}.

3. SET A(m) = M and ļ¬x q(m) ( it is no longer is subject to change).

4. Replace q(M ) by q(M ) ā’ (1 ā’ q(m))

5. Replace (q(1), ...q(K))by (q(1), ..., q(m ā’ 1), q(m + 1), .q(M ) (so the com-

ponent with index m is removed).

6. Return to 2 unless all remaining qi = 1 or the vector of qi ā™s is empty.

Note that on each iteration of the steps above, we ļ¬x one of components

q(m) and remove it from the vector and adjust one other, namely q(M ). Since

we always ļ¬x the smallest q(m) and since the average q(i) is one, we always

obtain a probability, i.e. ļ¬x a value 0 < q(m) Ā· 1. Figure 3.25 shows the way

in which this algorithm proceeds for the distribution

1 2 3 4

x=

.1 .2 .3 .4

f (x) =

We begin with q(i) = 4 Ć— f (i) = .4, .8, 1.2, 1.6 for i = 1, 2, 3, 4. Then since

m = 1 and M = 4 these are the ļ¬rst to be adjusted. We assign A(1) = 4 and

q(1) = 0.4. Now since we have reassigned mass 1 ā’ q(1) to M = 4 we replace

q(4) by 1.6 ā’ (1 ā’ 0.4) = 1. We now ļ¬x and remove q(1) and continue with

q(i) = .8, 1.2, 1.0 for i = 2, 3, 4. The next step results in ļ¬xing q(2) = 0.8,

A(2) = 3 and changing q(3) to q(3) ā’ (1 ā’ q(2)) = 1. After this iteration,

the remaining q(3), q(4) are both equal to 1, so according to step 6 we may

terminate the algorithm. Notice that we terminated without assigning a value

to A(3) and A(4). This assignment is unnecessary since the probability the alias

A(i) is used is (1 ā’ q(i)) which is zero in these two cases. The algorithm

therefore results in aliases A(i) = 4, 3, i = 1, 2 and q(i) = .4, .8, 1, 1, respectively

for i = 1, 2, 3, 4. Geometrically, this method iteratively adjusts a probability

histogram to form a rectangle with base K as in Figure 3.25.

172 CHAPTER 3. BASIC MONTE CARLO METHODS

distribution aliased: .1 .2 .3 .4

1

A(2)

=3

A(1)=

4

1.6

1.6

1.2

.8

.4

4 3

1 2 3 1 2 4

Figure 3.25: The alias method for generating from the distribution 0.1 0.2 0.3

0.4

Suppose I now wish to generate random variables from this discrete distrib-

ution. We simply generate a random variable uniform on the set {1, 2, 3, 4} and

if 1 is selected, we replace it by A(1) = 4 with probability 1 ā’ q(1) = 0.6. If 2

is selected it is replaced by A(2) = 3 with probability 1 ā’ q(2) = 0.2.

Acceptance-Rejection for Discrete Random Variables

The acceptance-rejection algorithm can be used both for generating discrete and

continuous random variables and the geometric interpretation in both cases is

essentially the same. Suppose for example we wish to generate a discrete random

variable X having probability function f (x) using as a dominating function

a multiple of g(x) the probability density function of a continuous random

variable. Take for example the probability function

GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS173

0.9

0.8

0.7

0.6

Reject

0.5

0.4

0.3

0.2

Accept

0.1

0

1 2 3 4

X

Figure 3.26: Acceptance-Rejection with for Discrete Distribution with continu-

ous dominating function.

1 2 3 4

x=

f (x) = .1 .3 .4 .2

using the dominating function 2g(x) = 0.1 + 0.2(x ā’ 0.5) for 0.5 < x < 4.5. It

is easy to generate a continuous random variable from the probability density

function g(x) by inverse transform. Suppose we generate the value X. Then if

this value is under the probability histogram graphed in Figure 3.26 we accept

the value (after rounding it to the nearest integer to conform the discreteness

of the output distribution) and otherwise we reject and repeat.

We may also dominate a discrete distribution with another discrete distrib-

ution in which case the algorithm proceeds as in the continuous case but with

the probability density functions replaced by probability functions.

The Poisson Distribution.

Consider the probability function for a Poisson distribution with parameter Ī»

Ī»x eā’Ī»

(3.25)

f (x) = , x = 0, 1, ...

x!

The simplest generator is to use the Poisson process. Recall that a Poisson

process with rate 1 on the real line can be described in two equivalent ways:

174 CHAPTER 3. BASIC MONTE CARLO METHODS

1. Points are distributed on the line in such a way that the spacings be-

tween consecutive points are independent exponential(Ī») random vari-

ables. Then the resulting process is a Poisson process with rate Ī».

2. The number of points in an interval of length h has a Poisson (Ī»h) distri-

bution. Moreover the numbers of points in non-overlapping intervals are

independent random variables.

The simplest generator stems from this equivalence. Suppose we use the

ļ¬rst speciļ¬cation to construct a Poisson process with rate parameter 1 and

then examine X = the number of points occurring in the interval [0, Ī»]. This

is the number of partial sums of exponential(1) random variables that are less

than or equal to Ī»

n+1

X

X = inf{n; (ā’lnUi ) > Ī»}

i=1

or equivalently

n+1

Y

Ui < eā’Ī» } (3.26)

X = inf{n;

i=1

This generator requires CPU time which grows linearly with Ī» since the

number of exponential random variables generated and summed grows linearly

with Ī» and so an alternative for large Ī» is required. Various possibilities of

acceptance-rejection algorithms have been suggested including dominating the

Poisson probability function with multiples of the logistic probability density

function (Atkinson (1979)), the normal density with exponential right tail (cf.

Devroye, lemma 3.8, page 509). A simple all-purpose dominating function is the

so-called table-mountain function (cf. Stadlober (1989)), essentially a function

with a ļ¬‚at top and tails that decrease as 1/x2 . Another simple alternative for

generating Poisson variates that is less eļ¬cient but simpler to implement is to

use the Lorentzian, or truncated Cauchy distribution with probability density

function

c0

(3.27)

g(x|a, b) = ,x > 0

b2 + (x ā’ a)2

GENERATING RANDOM NUMBERS FROM DISCRETE DISTRIBUTIONS175

where c0 is the normalizing constant. A random variable is generated from this

distribution using the inverse transform method; X = a + b tan(ĻU ),, where

U ā¼ U [0, 1]. Provided that we match the modes of the distribution a = Ī» and

ā

put b = 2Ī», this function may be used to dominate the Poisson distribution

and provide a simple rejection generator. The Matlab Poisson random number

generator is poissrnd (Ī», m, n) which generates an m Ć— n matrix of Poisson(Ī»)

variables. This uses the simple generator (3.26) and is not computationally

eļ¬cient for large values of Ī».In R the command rpois(n,Ī») generates a vector

of n Poisson variates.

The Binomial Distribution

For the Binomial distribution, we may use any one of the following alternatives:

Pn

I(Ui < p), Ui ā¼ independent uniform[0, 1]

(1) X = i=1

Px+1

Gi > n}, where Gi ā¼independent Geometric(p)

(2) X = inf{x; i=1

Px+1 Ei

> ā’log(1ā’p)}, where Ei ā¼independent Exponential(1).

(3) X = inf{x; i=1 nā’i+1

Method (1) obtains from the deļ¬nition of the sum of independent Bernoulli

random variables since each of the random variables I(Ui < p) are independent,

have values 0 and 1 with probabilities 1 ā’ p and p respectively. The event

(Ui < p) having probability p is typically referred to as a āsuccessā. Obviously

this method will be slow if n is large. For method (2), recall that the number of

trials necessary to obtain the ļ¬rst success, G1 , say, has a geometric distribution.

Similarly, G2 represents the number of additional trials to obtain the second

success. So if X = j, the number of trials required to obtain j + 1 successes

was greater than n and to obtain j successes, less than or equal to n. In other

words there were exactly j successes in the ļ¬rst n trials. When n is large but np

fairly small, method (2) is more eļ¬cient since it is proportional to the number

os successes rather than the total number of trials. Of course for large n and

176 CHAPTER 3. BASIC MONTE CARLO METHODS

np suļ¬ciently small (e.g. <1), we can also replace the Binomial distribution

by its Poisson (Ī» = np) approximation. Method (3) is clearly more eļ¬cient if

ā’log(1 ā’ p) is not too large so that p is not too close to 1, because in this case

we need to add fewer exponential random variables.

For large mean np and small n(1 ā’ p) we can simply reverse the role of

successes and failures and use method (2) or (3) above. But if both np and

n(1 ā’ p) are large, a rejection method is required. Again we may use rejection

p

beginning with a Lorentzian distribution, choosing a = np, and b = 2np(1 ā’ p)

in the case p < 1/2. When p > 1/2, we simply reverse the roles of āfailuresā

and āsuccessesā. Alternatively, a dominating table-mountain function may be

used (Stadlober (1989)). The binomial generator in Matlab is the function

binornd(n,p,j,k) which generates an n Ć— k matrix of binomial(n, p) random

variables. This uses the simplest form (1) of the binomial generator and is not

computationally eļ¬cient for large n. In R, rbinom(m,n,p) will generate a vector

of length m of Binomial(n, p) variates.

Random Samples Associated with Markov Chains

Consider a ļ¬nite state Markov Chain, a sequence of (discrete) random variables

X1 , X2 , . . .each of which takes integer values 1, 2, . . . N (called states). The

number of states of a Markov chain may be large or even inļ¬nite and it is not

always convenient to label them with the positive integers and so it is common

to deļ¬ne the state space as the set of all possible states of a Markov chain, but

we will give some examples of this later. For the present we restrict attention to

the case of a ļ¬nite state space. The transition probability matrix is a matrix P

describing the conditional probability of moving between possible states of the

RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 177

chain, so that

P [Xn+1 = j|Xn = i] = Pij , i = 1, . . . N, j = 1, . . . N.

P

where Pij ā„ 0 for all i, j and for all i. A limiting distribution

Pij = 1

j

of a Markov chain is a vector (Ļ say) of long run probabilities of the individual

states with the property that

Ļi = limtā’ā P [Xt = i].

A stationary distribution of a Markov chain is the column vector (Ļ say) of

probabilities of the individual states such that

Ļ0 P = Ļ0 . (3.28)

Ļ 0 P = Ļ0 . For a Markov chain, every limiting distribution is in fact a station-

ary distribution. For the basic theory of Markov Chains, see the Appendix.

Roughly, a Markov chain which eventually āforgetsā the states that were occu-

pied in the distant path, in other words for which the probability of the current

states does not vary much as we condition on diļ¬erent states in the distant

past, is called ergodic. A Markov chain which simply cycles through three

states 1 ā’ 2 ā’ 3 ā’ 1 ā’ ... is an example of a periodic chain, and is not

ergodic.

It is often the case that we wish to simulate from a ļ¬nite ergodic Markov

chain when it has reached equilibrium or stationarity, which is equivalent to

sampling from the distribution of Xn assuming that the distribution of X0

is given by the stationary distribution Ļ. In a few cases, we can obtain this

stationary distribution directly from (3.28) but when N is large this system of

equations is usually not feasible to solve and we need to ļ¬nd another way to

sample from the probability vector Ļ. Of course we can always begin the Markov

chain in some arbitrary initial state and run it waiting for Hele to freeze over (it

does happen since Helle is in Devon) until we are quite sure that the chain has

178 CHAPTER 3. BASIC MONTE CARLO METHODS

essentially reachedequilibrium, and then use a subsequent portion of this chain,

discarding this initial period, sometimes referred to as the āinitial transientā.

Clearly this is often not a very eļ¬cient method, particularly in cases in

which the chain mixes or forgets its past very slowly for in this case the required

initial transient is long. On the other hand if we shortened it, we run the risk of

introducing bias into our simulations because the distribution generated is too

far from the equilibrium distribution Ļ. There are a number of solutions to this

problem proposed in a burgeoning literature. Here we limit ourselves to a few

of the simpler methods.

Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm is a method for generating random variables

from a distribution Ļ that applies even in the case of an inļ¬nite number of

states or a continuous distribution Ļ. It is assumed that Ļ is known up to some

multiplicative constant. Roughly, the method consists of using a convenient

āproposalā Markov chain with transition matrix Q to generate transitions, but

then only āacceptā the move to these new states with probability that depends

on the distribution Ļ. The idea resembles that behind importance sampling.

The basic result on which the Metropolis-Hastings algorithm is pinned is the

following theorem.

Theorem 31 Suppose Qij is the transition matrix of a Markov chain. Assume

PN

that g is a vector of non-negative values such that i=1 gi = G and

gj

| | Ā· K < ā for all i, j

Qij

for some ļ¬nite value K. Deļ¬ne

gj Qji

Ļij = min(1, )

gi Qij

Then the Markov Chain with transition probability matrix

Pij = Qij Ļij , for i 6= j (3.29)

RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 179

gi

has stationary distribution Ļi = G.

Proof. The proof consists of showing that the so-called ādetailed balance

gi

conditionā is satisļ¬ed, i.e. with Ļi = that

G,

Ļi Pij = Ļj Pji , for all i, j. (3.30)

This condition implies that when the chain is operating in equilibrium,

P [Xn = i, Xn+1 = j] = P [Xn = j, Xn+1 = i]

reļ¬‚ecting a cavalier attitude to the direction in which time ļ¬‚ows or reversibility

of the chain. Of course (3.30) is true automatically if i = j and for i 6= j,

gi gj Qji

Ļi Pij = Qij min(1, )

G gi Qij

1

= min(gi Qij , gj Qji )

G

= Ļj Pji

1

by the symmetry of the function min(gi Qij , gj Qji ). Now the detailed balance

G

condition (3.30) implies that Ļ is a stationary distribution for this Markov chain

since

N N

X X

Ļi Pij = Ļj Pji

i=1 i=1

N

X

= Ļj Pji

i=1

= Ļj for each j = 1, ..., N.

Provided that we are able to generate transitions for the Markov Chain with

transition matrix Q, it is easy to generate a chain with transition matrix P in

(3.29). If we are currently in state i, generate the next state with probability

Qij . If j = i then we stay in state i. If j 6= i, then we āacceptā the move to state

j with probability Ļij , otherwise we stay in state i. Notice that the Markov

180 CHAPTER 3. BASIC MONTE CARLO METHODS

Chain with transition matrix P tends to favour moves which increase the value

of Ļ. For example if the proposal chain is as likely to jump from i to j as it

is to jump back so that Qij = Qji , then if Ļj > Ļi the move to j is always

Ļj

accepted whereas if Ļj < Ļi the move is only accepted with probability The

Ļi .

assumption Qij = Qji is a common and natural one, since in applications of

the Metropolis-Hastings algorithm, it is common to choose j āat randomā (i.e.

uniformly distributed) from a suitable neighborhood of i.

The above proof only provides that Ļ is a stationary distribution of the

Markov Chain associated with P, not that it is necessarily the limiting distrib-

ution of this Markov chain. For this to follow we need to know that the chain

is ergodic. Various conditions for ergodicity are given in the literature. See for

example Robert and Casella (1999, Chapter 6) for more detail.

Gibbs Sampling

There is one simple special case of the Metropolis-Hastings algorithm that is

particularly simple, common and compelling. To keep the discussion simple,

suppose the possible states of our Markov Chain are points in two-dimensional

space (x, y). We may assume both components are discrete or continuous. Sup-

pose we wish to generate observations from a stationary distribution which is

proportional to g(x, y) so

g(x, y)

Ļ(x, y) = P P (3.31)

y g(x, y)

x

deļ¬ned on this space but that the form of the distribution is such that directly

generating from this distribution is diļ¬cult, perhaps because it is diļ¬cult to

obtain the denominator of (3.31). However there are many circumstances where

it is much easier to obtain the value of the conditional distributions

Ļ(x, y)

Ļ(x|y) = P and

z Ļ(z, y)

Ļ(x, y)

Ļ(y|x) = P

z Ļ(x, z)

RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 181

Now consider the following algorithm: begin with an arbitrary value of y0

and generated x1 from the distribution Ļ(x|y0 ) followed by generating y1 from

the distribution Ļ(y|x1 ). It is hard to imagine a universe in which iteratively

generating values xn+1 from the distribution Ļ(x|yn ) and then yn+1 from the

distribution Ļ(y|xn+1 ) does not, at least asymptotically as n ā’ ā, eventually

lead to a draw from the joint distribution Ļ(x, y). Indeed that is the case since

the transition probabilities for this chain are given by

P (xn+1 , yn+1 |xn , yn ) = Ļ(xn+1 |yn )Ļ(yn+1 |xn+1 )

and it is easy to show directly from these transition probabilities that

X

P (x1 , y1 |x, y)Ļ(x, y)

(x,y)

X X

= Ļ(y1 |x1 ) Ļ(x1 |y) Ļ(x, y)

y x

X

= Ļ(y1 |x1 ) Ļ(x1 , y)

y

= Ļ(x1 , y1 ).

Of course the real power of Gibbs Sampling is achieved in problems that are

not two-dimensional such as the example above, but have dimension suļ¬ciently

high that calculating the sums or integrals in the denominator of expressions

like (3.31) is not computationally feasible.

Coupling From the Past: Sampling from the stationary dis-

tribution of a Markov Chain

All of the above methods assume that we generate from the stationary distri-

bution of a Markov chain by the āuntil Hele freezes overā method, i.e. wait

until run the chain from an arbitrary starting value and then delete the initial

transient. An alternative elegant method that is feasible at least for some ļ¬nite

182 CHAPTER 3. BASIC MONTE CARLO METHODS

state Markov chains is the method of ācoupling from the pastā due to Propp

and Wilson (1996).

We assume that we are able to generate transitions in the Markov Chain. In

other words if the chain is presently in state i at time n we are able to generate

a random variable Xn+1 from the distribution proportional to Pij , j = 1, ...K.

Suppose F (x|i) is the cumulative distribution function P (Xn+1 Ā· x|Xn = i)

and let us denote its inverse by F ā’1 (y|i). So if we wish to generate a random

variable Xn+1 conditional on Xn , we can use the inverse transform Xn+1 =

F ā’1 (Un+1 |Xn ) applied to the Uniform[0,1] random variable Un+1 . Notice that

a starting value say Xā’100 together with the sequence of uniform[0,1] variables

(Uā’99 , ..., U0 ) determines the chain completely over the period ā’100 Ā· t Ā· 0.

If we wish to generated the value of Xt given Xs , s < t, then we can work this

expression backwards

Xt = F ā’1 (Utā’1 |Xtā’1 )

= F ā’1 (Utā’1 |F ā’1 (Utā’2 |Xtā’2 ))

= F ā’1 (Utā’1 |F ā’1 (Utā’2 |...F ā’1 (Utā’1 |F ā’1 (Us |i))))

t

= Fs (Xs ), say.

Now imagine an inļ¬nite sequence {Ut , t = ..., ā’3, ā’2, ā’1} of independent uni-

form[0,1] random variables that was used to generate the state X0 of a chain

at time 0. Let us imagine for the moment that there is a value of M such that

0

Fā’M (i) is a constant function of i. This means that for this particular draw

of uniform random numbers, whatever the state i of the system at time ā’M,

0

the same state X0 = Fā’M (i) is generated to time 0. All chains, possibly with

diļ¬erent behaviour prior to time ā’M are ācoupledā at time ā’M and identical

from then on. In this case we say that coalescence has occurred in the interval

[ā’M, 0]. No matter where we start the chain at time ā’M it ends up in the

same state at time 0, so it is quite unnecessary to simulate the chain over the

whole inļ¬nite time interval ā’ā < t Ā· 0. No matter what state is occupied

RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 183

at time t = ā’M, the chain ends up in the same state at time t = 0. When

coalescence has occurred, we can safely consider the common value of the chain

at time 0 to be generated from the stationary distribution since it is exactly

the same value as if we had run the chain from t = ā’ā.

There is sometimes an easy way to check whether coalescence has occurred

in an interval, if the state space of the Markov chain is suitably ordered. For

example suppose the states are numbered 1, 2, ..., N. Then it is sometimes

possible to relabel the states so that the conditional distribution functions F (x|i)

are stochastically ordered, or equivalently that F ā’1 (U |i) is monotonic (say

monotonically increasing) in i for each value of U. This is the case for example

Pj

provided that the partial sums l=1 Pil are increasing functions of i for each

0

j = 1, 2, ..., N. If follows that the functions Fā’M (i) are all monotonic functions

of i and so

0 0 0

Fā’M (1) Ā· Fā’M (2) Ā· ...Fā’M (N ).

0 0 0

Therefore, if Fā’M (1) = Fā’M (N ), then Fā’M (i) must be a constant function.

Notice also that if there is any time in an interval [s, t] at which coalescence

t

occurs so that Fs (i) is a constant function of i, then for any interval [S, T ]

T

containing it [S, T ] ā [s, t], FS (i) is also a constant function of i.

It is easy to prove that coalescence occurs in the interval [ā’M, 0] for suf-

ļ¬ciently large M. For an ergodic ļ¬nite Markov chain, there is some step size

Ļ„ such that every transition has positive probability P [Xt+Ļ„ = j|Xt = i] > Ā²

for all i, j. Consider two independent chains, one beginning in state i and the

other in state i0 at time t = 0. Then the probability that they occupy the same

state j at time t = Ļ„ is at least Ā²2 . It is easy to see that if we use inverse

transform to generate the transitions and if they are driven by common random

numbers then this can only increase the probability of being in the same state,

so the probability these two chains are coupled at time Ļ„ is at least Ā²2 . Similarly

for N possible states, the probability of coalescence in an interval of length Ļ„

184 CHAPTER 3. BASIC MONTE CARLO METHODS

is at least ĪµN > 0. Since there are inļ¬nitely many intervals disjoint of length

Ļ„ in [ā’ā, 0] and the events that there is a coalescence in each interval are

independent, the probability that coalescence occurs somewhere in [ā’ā, 0] is

1.

We now detail the Propp Wilson algorithm

1. Set M = 1, XU = N, XL = 1

2. Generate Uā’M ....Uā’M/2+1 all independent U nif orm[0, 1].

3. For t = ā’M to ā’1 repeat

(a) obtain XL = F ā’1 (Utā’1 |XL ) and XU = F ā’1 (Utā’1 |XU ).

(b) If XL = XU stop and output X(0) = XL

4. Otherwise, set M = 2M and go to step 2.

This algorithm tests for coalescence repeatedly by starting on the intervals

[ā’1, 0], [ā’2, ā’1], [ā’4, ā’2], [ā’8, ā’4].

We are assured that with probability one, the process will terminate with co-

alescence after a ļ¬nite number of steps. Moreover, in this algorithm that the

random variable Ut once generated is NOT generated again on a subsequent

pass when M is doubled. The generated Ut is reused at each pass until coales-

cence occurs. If Ut were regenerated on subseuqent passes, this would lead to

bias in the algorithm.

It may well be that this algorithm needs to run for a very long time before

achieving coalescence and an impatient observer who interrupts the algorithm

prior to coalescence and starts over will bias the results. Varous modiļ¬cations

have been made to speed up the algorithm (e.g. Fill, 1998).

RANDOM SAMPLES ASSOCIATED WITH MARKOV CHAINS 185

Sampling from the Stationary Distribution of a Diļ¬usion

Process

A basic Ito process of the form

dXt = a(Xt )dt + Ļ(Xt )dWt

is perhaps the simplest extension of a Markov chain to continuous time, contin-

uous state-space. It is well-known that under fairly simple conditions, there is

a unique (strong) solution to this equation and that the limiting distribution of

XT as T ā’ ā has stationary distribution with probability density function

Z x

1 a(z)

f (x) = c 2 exp{2 dz}

Ļ 2 (z)

Ļ (x) 0

where the constant c is chosen so that the integral of the density is 1. To be

able to do this we need to assume that

Z Z

ā x

1 a(z)

dz}dx < ā. (3.32)

exp{2

Ļ 2 (x) Ļ 2 (z)

ā’ā 0

In order to generate from this stationary distribution, we can now start the

process at some arbitrary value X0 and run it for a very long time T , hoping

that this is suļ¬ciently long that the process is essentially in its stationary

state, or try to generate X0 more directly from (3.32) in which case the process

is beginning (and subsequently running) with its stationary distribution.

For an example, let us return to the CIR process

1/2

dXt = k(b ā’ Xt )dt + ĻXt (3.33)

dWt .

In this case

a(x) = k(b ā’ x), for x > 0,

Ļ 2 (x) = Ļ2 x, for x > 0.

186 CHAPTER 3. BASIC MONTE CARLO METHODS

Notice that

Z x

k(b ā’ z)

1 1 2kb k

dz} = 2 xā’1 exp{ 2 ln(x/Īµ) ā’ 2 (x ā’ Īµ)}

exp{2

Ļ2x 2z

Ļ Ļ Ļ Ļ

Īµ

is proportional to

2

x2kb/Ļ ā’1

exp{ā’kx/Ļ 2 }

and the integral of this function, a Gamma function, will fail to converge unless

2kb/Ļ 2 ā’ 1 > ā’1 or 2kb > Ļ 2 . Under this condition the stationary distribution

2

of the CIR process is Gamma(2kb/Ļ2 , Ļ ). If this condition fails and 2kb < Ļ 2 ,

k

then the process Xt is absorbed at 0. If we wished to simulate a CIR process in

equilibrium, we should generate starting values of X0 from the Gamma distrib-

ution. More generally for a CEV process satisfying

Ī³/2

dXt = k(b ā’ Xt )dt + ĻXt (3.34)

dWt

a similar calculation shows that the stationary density is proportional to

2kb 1 k

xā’Ī³ exp{ā’ ā’ 2 xĪ³ }, for Ī³ > 1.

Ļ 2 xĪ³ā’1 (Ī³ ā’ 1) Ļ Ī³

Simulating Stochastic Partial Diļ¬erential Equa-

tions.

Consider a derivative product whose underlying asset has price Xt which follows

some model. Suppose the derivative pays an amount V0 (XT ) on the maturity

date T. Suppose that the value of the derivative depends only on the current time

t and the current value of the asset S, then its current value is the discounted

future payoļ¬, an expectation of the form

Z T

(3.35)

V (S, t) = E[V0 (XT )exp{ā’ r(Xv , v)dv}|Xt = S]

t

where r(Xt , t) is the current spot interest rate at time t. In most cases, this ex-

pectation is impossible to evaluate analytically and so we need to resort to

SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.187

numerical methods. If the spot interest rate is function of both arguments

(Xv , v) and not just a function of time, then this integral is over the whole

joint distribution of the process Xv , 0 < v < T and simple one-dimensional

methods of numerical integration do not suļ¬ce. In such cases, we will usu-

ally resort to a Monte-Carlo method. The simplest version requires simulating

a number of sample paths for the process Xv starting at Xt = S, evaluating

RT

V0 (XT )exp{ā’ t r(Xv , v)dv} and averaging the results over all simulations. We

begin by discussing the simulation of the process Xv required for integrations

such as this.

Many of the stochastic models in ļ¬nance reduce to simple diļ¬usion equation

(which may have more than one factor or dimension). Most of the models

in ļ¬nance are Markovian in the sense that at any point t in time, the future

evolution of the process depends only on the current state Xt and not on the

past behaviour of the process Xs , s < t. Consequently we restrict to a āMarkov

diļ¬usion modelā of the form

(3.36)

dXt = a(Xt , t)dt + Ļ(Xt , t)dWt

with some initial value X0 for Xt at t = 0. Here Wt is a driving standard Brown-

ian motion process. Solving deterministic diļ¬erential equations can sometimes

provide a solution to a speciļ¬c problem such as ļ¬nding the arbitrage-free price of

a derivative. In general, for more complex features of the derivative such as the

distribution of return, important for considerations such as the Value at Risk,

we need to obtain a solution {Xt , 0 < t < T }to an equation of the above form

which is a stochastic process. Typically this can only be done by simulation.

One of the simplest methods of simulating such a process is motivated through

a crude interpretation of the above equation in terms of discrete time steps, that

is that a small increment Xt+h ā’ Xt in the process is approximately normally

distributed with mean given by a(Xt , t)hand variance given by Ļ 2 (Xt , t)h. We

generate these increments sequentially, beginning with an assumed value for

188 CHAPTER 3. BASIC MONTE CARLO METHODS

X0 , and then adding to obtain an approximation to the value of the process

at discrete times t = 0, h, 2h, 3h, . . .. Between these discrete points, we can

linearly interpolate the values. Approximating the process by assuming that

the conditional distribution of Xt+h ā’ Xt is N (a(Xt , t)h, Ļ 2 (Xt , t)h) is called

Eulerā™s method by analogy to a simple method by the same name for solving or-

dinary diļ¬erential equations. Given simulations of the process satisfying (3.36)

together with some initial conditions, we might average the returns on a given

derivative for many such simulations, (provided the process is expressed with

respect to the risk-neutral distribution), to arrive at an arbitrage-free return for

the derivative.

In this section we will discuss the numerical solution, or simulation of the

solution to stochastic diļ¬erential equations.

Letting ti = iāx, Equation (3.36) in integral form implies

Z ti+1 Z ti+1

(3.37)

Xti+1 = Xti + a(Xs , s)ds + Ļ(Xs , s)dWs

ti ti

For the following lemma we need to introduce Op or āorder in probabilityā,

notation common in mathematics and probability. A sequence indexed by āt,

say Yāt = Op (āt)k means that when we divide this term by (āt)k and then

let āt ā’ 0, the resulting sequence is bounded in probability or that for each Īµ

there exists K < ā so that

Yāt

| > K] < Īµ

P [|

ātk

whenever |āt| < Īµ. As an example, if W is a Brownian motion, then āWt =

W (t+āt)ā’W (t) has a Normal distribution with mean 0 and standard deviation

ā

āt and is therefore Op (āt)1/2 . Similarly Then we have two very common

and useful approximations to a diļ¬usion given by the following lemma.

Lemma 32 If Xt satisļ¬es a diļ¬usion equation of the form (3.36) then

Xti+1 = Xti + a(Xti , ti )āt + Ļ(Xti , ti )āWt + Op (āt) (Euler approximation)

ā‚

Ļ(Xti , ti ) ā‚x Ļ(Xti , ti )

[(āWt )2 ā’ āt] + Op (āt)3/2 (Milstei

Xti+1 = Xti + a(Xti , ti )āt + Ļ(Xti , ti )āWt +

2

SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.189

Proof. Itoā™s lemma can be written in terms of two operators on functions

f for which the derivatives below exist;

df (Xt , t) = Lo f dt + L1 f dWt where

1 2 ā‚2

ā‚ ā‚

0

+ , and

L =a +Ļ 2

ā‚x 2 ā‚x ā‚t

ā‚

L1 = Ļ .

ā‚x

Integrating, this and applying to twice diļ¬erentiable functions a and Ļ and

s > ti ,

Z Z

s s

0

L1 a(Xu , u)dWu

a(Xs , s) = a(Xti , ti ) + L a(Xu , u)du +

t t

Z is Z is

L0 Ļ(Xu , u)du + L1 Ļ(Xu , u)dWu .

Ļ(Xs , s) = Ļ(Xti , ti ) +

ti ti

By substituting in each of the integrands in 3.37 using the above identity and

iterating this process we arrive at the Ito-Taylor expansions (e.g. Kloeden and

Platen, 1992). For example,

Z ti+1 Z ti+1 Z Z

s s

0

L1 a(Xu , u)dWu }ds

{a(Xti , ti ) +

a(Xs , s)ds = L a(Xu , u)du +

ti ti ti ti

Z ti+1 Z Z Z

s ti+1 s

0 1

ā a(Xti , ti )āt + L a(Xti , ti ) duds + L a(Xti , ti ) dWu ds

ti ti ti ti

The ļ¬rst term a(Xti , ti )āt, is an initial approximation to the desired inte-

gral and the rest is a lower order correction that we may regard as an er-

ror term for the moment. For example it is easy to see that the second term

Rt Rs Rt Rs

L0 a(Xti , ti ) tii+1 ti duds is Op (āt)2 because the integral tii+1 ti duds = (āt)2 /2

Rt Rs

and L0 a(Xti , ti ) is bounded in probability. The third term L1 a(Xti , ti ) tii+1 ti dWu ds

Rt Rs Rt

is Op (āt)3/2 since tii+1 ti dWu ds = tii+1 (ti+1 ā’ u)dWu and this is a normal

Rt

random variable with mean 0 and variance tii+1 (ti+1 ā’ u)2 du = (āt)3 /3. We

can write such a normal random variable as 3ā’1/2 (āt)3/2 Z for Z a standard

190 CHAPTER 3. BASIC MONTE CARLO METHODS

normal random variable and so this is obviously Op (āt)3/2 . Thus the simplest

Euler approximation to the distribution of the increment assumes that āX has

conditional mean a(Xti , ti )āt. Similarly

Z ti+1 Z ti+1 Z Z

s s

0

L1 Ļ(Xu , u)dWu }dWs

{Ļ(Xti , ti ) +

Ļ(Xs , s)dWs = L Ļ(Xu , u)du +

ti ti ti ti

Z ti+1 Z Z Z

s ti+1 s

0 1

ā Ļ(Xti , ti )āWt + L Ļ(Xti , ti ) dudWs + L Ļ(Xti , ti ) dWu dWs

ti ti ti ti

ā‚

Ļ(Xti , ti ) ā‚x Ļ(Xti , ti )

[(āWt )2 ā’ āt] + Op (āt)3/2

= Ļ(Xti , ti )āWt +

2

R ti+1 R s

dWu dWs = 1 [(āWt )2 ā’ āt], L0 Ļ(Xu , u) = Ļ(Xti , ti )+ Op (āt)1/2 ,

since 2

ti ti

Rt Rs

ā‚

L1 Ļ(Xu , u) = Ļ(Xu , u) ā‚x Ļ(Xu , u) and tii+1 ti dudWs = Op (āt)3/2 . Putting

these terms together, we arrive at an approximation to the increment of the

form

ā‚

Ļ(Xti , ti ) ā‚x Ļ(Xti , ti )

[(āWt )2 ā’āt]+Op (āt)3/2

āXt = a(Xti , ti )āt+Ļ(Xti , ti )āWt +

2

(3.38)

which allow an explicit representation of the increment in the process X in terms

of the increment of a Brownian motion process āWt ā¼ N (0, āt).

The approximation (3.38) is called the Milstein approximation, a reļ¬nement

of the ļ¬rst, the Euler approximation. It is the second Ito-Taylor approximation

to a diļ¬usion process. Obviously, the increments of the process are quadratic

functions of a normal random variable and are no longer normal. The error

approaches 0 at the rate Op (āt)3/2 in probability only. This does not mean

that the trajectory is approximated to this order but that the diļ¬erence between

the Milstein approximation to a diļ¬usion and the diļ¬usion itself is bounded in

probability when divided by (āt)3/2 and as we let āt ā’ 0. Higher order Taylor

approximations are also possible, although they grow excessively complicated

very quickly. See the book by Kloeden and Platten(1992) for details.

There remains the question of how much diļ¬erence it makes which of these

approximations we employ for a particular diļ¬usion. Certainly there is no dif-

ference at all between the two approximations in the case that the diļ¬usion

SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.191

coeļ¬cient Ļ(Xt , t) does not depend at all on Xt . In general, the diļ¬erence is

hard to assess but in particular cases we can at least compare the performance of

the two methods. The approximations turn out to be very close in most simple

cases. For example consider the stock price path in Figure 3.27. The dashed

line corresponds to a Milstein approximation whereas the piecewise continuous

line corresponds to the Euler approximation. In this case the Milstein appears

to be a little better, but if I run a number of simulations and compare the sum

of the squared errors (i.e. squared diļ¬erences between the approximate value

of Xt and the true value of Xt ) we ļ¬nd that the improvement is only about

two percent of the diļ¬erence. The same is true even if I change the value of

āt from 1/12 (i.e. one month) to 1/52 (i.e. one week). Unlike the behaviour

of higher order approximations to deterministic functions, there appears to be

little advantage in using a higher order approximation, at least in the case of

diļ¬usions with smooth drift and diļ¬usion coeļ¬cients.

We can compare using Milstein approximation on the original process and

using Eulerā™s approximation on a transformation of the process in the case that

the diļ¬usion term depends only on the state of the process (not time). In other

words, suppose we have an Ito process of the form

(3.39)

dXt = a(Xt , t)dt + Ļ(Xt )dWt

where Wt is an ordinary Wiener measure. A simple transformation reduces this

to a problem with constant diļ¬usion term. Suppose Ļ(x) > 0 for all x and let

Z x

1

dz, for x ā„ 0

s(x) =

Ļ(z)

0

Z 0

1

s(x) = ā’ dz for x < 0

Ļ(z)

x

where we assume these integrals are well deļ¬ned. Let g be the inverse function of

s. This inverse exists since the function is continuous monotonically increasing.

192 CHAPTER 3. BASIC MONTE CARLO METHODS

Figure 3.27: Comparison of Milstein and Euler approximation to stock with

āt = 1/12 year.

SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.193

Suppose we apply Itoā™s lemma to the transformed process Yt = s(Xt ). We obtain

1

dYt = {a(Xt , t)s0 (Xt ) + Ļ 2 (Xt )s00 (Xt )}dt + Ļ(Xt )s0 (Xt )dWt

2

Ļ0 (Xt )

a(Xt , t) 1 2

={ ā’ Ļ (Xt ) 2 }dt + dWt

Ļ(Xt ) 2 Ļ (Xt )

= Āµ(Yt , t)dt + dWt

where

a(g(Yt ), t) 1 0

ā’ Ļ (g(Yt )).

Āµ(Yt , t) =

Ļ(g(Yt )) 2

In other words, Y t satisļ¬es an Ito equation with constant diļ¬usion term.

Suppose we generate an increment in Yt using Eulerā™s method and then solve

for the corresponding increment in Xt .Then using the ļ¬rst two terms in the

Taylor series expansion of g,

1

āXt = g 0 (Yt )āYt + g 00 (Yt )(āYt )2

2

1

= g 0 (Yt )(Āµ(Yti , ti )āt + āWt ) + Ļ 0 (g(Yt ))Ļ(g(Yt ))(āYt )2

2

1 1

= {a(g(Yt ), t) ā’ Ļ(g(Yt ))Ļ 0 (g(Yt ))}āt + Ļ(g(Yt ))āWt + Ļ 0 (g(Yt ))Ļ(g(Yt ))(āYt )2

2 2

since

1

g 0 (Yt ) = = Ļ(g(Yt )) and

s0 (g(Y t ))

g 00 (Yt ) = Ļ0 (g(Yt ))Ļ(g(Yt )).

But since (āYt )2 = (āWt )2 + o(āt) it follows that

1 1

āXt = {a(Xt , t)ā’ Ļ(Xt )Ļ 0 (Xt )}āt+Ļ(Xt )āWt + Ļ 0 (Xt )Ļ(Xt )(āWt )2 +o(āt)

2 2

and so the approximation to this increment is identical, up to the order con-

sidered, to the Milstein approximation. For most processes, it is preferable

to apply a diļ¬usion stabilizing transformation as we have here, prior to dis-

cretizing the process. For the geometric Brownian motion process, for example,

the diļ¬usion-stabilizing transformation is a multiple of the logarithm, and this

transforms to a Brownian motion, for which the Euler approximation gives the

exact distribution.

194 CHAPTER 3. BASIC MONTE CARLO METHODS

Example: Down-and-out-Call.

Consider an asset whose price under the risk-neutral measure Q follows a con-

stant elasticity of variance (CEV) process

Ī³

(3.40)

dSt = rSt dt + ĻSt dWt

for a standard Brownian motion process Wt . A down-and-out call option with

exercise price K provides the usual payment (ST ā’ K)+ of a European call

option on maturity T if the asset never falls below a given out barrier b. The

parameter Ī³ > 0 governs the change in the diļ¬usion term as the asset price

changes. We wish to use simulation to price such an option with current asset

price S0 , time to maturity T , out barrier b < S0 and constant interest rate r

and compare with the Black-Scholes formula as b ā’ 0.

A geometric Brownian motion is most easily simulated by taking logarithms.

For example if St satisļ¬es the risk-neutral speciļ¬cation

(3.41)

dSt = rSt dt + ĻSt dWt

then Yt = log(St ) satisļ¬es

dYt = (r ā’ Ļ2 /2)dt + ĻdWt . (3.42)

This is a Brownian motion and is simulated with a normal random walk. In-

dependent normal increments are generated āYt ā¼ N ((r ā’ Ļ2 /2)āt, Ļ2 āt) and

their partial sums used to simulate the process Yt . The return for those options

that are in the money is the average of the values of (eYT ā’ E)+ over those

paths for which min{Ys ; t < s < T } ā„ ln(b). Similarly the transformation of

the CEV process which provides a constant diļ¬usion term is determined by

Zx

1

s(x) = dz

0 Ļ(z)

ā§

Zx āØ x1ā’Ī³ if Ī³ 6= 1

1ā’Ī³

z ā’Ī³ dz =

= .

ā© ln(x) if Ī³ = 1

0

SIMULATING STOCHASTIC PARTIAL DIFFERENTIAL EQUATIONS.195

Assuming Ī³ 6= 1, the inverse function is

g(y) = cy 1/(1ā’Ī³)

1ā’Ī³

for constant c and the process Yt = (1 ā’ Ī³)ā’1 St satisļ¬es an Ito equation

with constant diļ¬usion coeļ¬cient;

r 1ā’Ī³ 1 Ī³ā’1

dYt = { St ā’ Ī³ĻSt }dt + dWt

Ļ 2

r Ī³Ļ

dYt = { (1 ā’ Ī³)Yt ā’ }dt + dWt . (3.43)

2(1 ā’ Ī³)Yt

Ļ

After simulating the process Yt we invert the relation to obtain St = ((1 ā’

Ī³)Yt )1/(1ā’Ī³) . There is one ļ¬ne point related to simulating the process (3.43)

that we implemented in the code below. The equation (3.40) is a model for

a non-negative asset price St but when we simulate the values Yt from (3.43)

there is nothing to prevent the process from going negative. Generally if Ī³ ā„ 1/2

and if we increment time in suļ¬ciently small steps āt, then it is unlikely that

a negative value of Yt will obtain, but when it does, we assume absorption at 0

(analogous to default or bankruptcy). The following Matlab function was used

to simulate sample paths from the CEV process over the interval [0, T ].

ńņš. 3 |