
f(x; α, β, δ, µ) = (αδ/π) exp{δ√(α² − β²) + β(x − µ)} K₁(α√(δ² + (x − µ)²)) / √(δ² + (x − µ)²).  (3.22)

The modified Bessel function of the second kind Kλ(x) is given in MATLAB
by besselk(ν, x) and in R by besselK(x, ν, expon.scaled=FALSE). The distri-
bution with probability density function given by (3.22) is called the normal
inverse Gaussian distribution, with real-valued parameter µ, δ > 0 and
α ≥ |β|. The tails of the normal inverse Gaussian density are substantially
heavier than those of the normal distribution. In fact, up to a constant,

f(x; α, β, δ, µ) ~ |x|^{-3/2} exp((∓α + β)x) as x → ±∞.

The moments of this distribution can be obtained from the moment gener-
ating function

M(s) = e^{µs} exp{δ√(α² − β²) − δ√(α² − (β + s)²)} for |β + s| < α.  (3.23)
These moments are:

E(X) = µ + δβ(α² − β²)^{-1/2},

var(X) = δα²(α² − β²)^{-3/2},

and the skewness and kurtosis:

skew = 3βα^{-1}δ^{-1/2}(α² − β²)^{-1/4},

kurtosis = 3δ^{-1}α^{-2}(α² + 4β²)(α² − β²)^{-1/2}.




One of the particularly attractive features of this family of distributions,
shared by the normal and the stable families of distributions, is that it is closed
under convolutions. This is apparent from the moment generating function
(3.23), since M^N(s) is again a moment generating function of the same form,
but with µ replaced by Nµ and δ by Nδ. In Figure 3.19 we plot the probability
density function of a member of this family of distributions.




Figure 3.19: Normal Inverse Gaussian probability density function with α = δ = 1, β = 1/2, µ = 0

Note the similarity to the normal density, but with a modest amount of
skewness and increased weight in the tails. We can generate random variables
from this distribution as follows:

Sample T from the inverse Gaussian distribution (3.21).

Return X = µ + βT + N(0, T),

where N(0, T) denotes a normal random variable with mean 0 and variance T.

We sample from the inverse Gaussian by using a property of the distribution:
if T has density of the form (3.21), then

(T − θ)² / (c²T)  (3.24)

has a chi-squared distribution with one degree of freedom (easily generated
as the square of a standard normal random variable). The algorithm is (see
Michael, Schucany and Haas (1976)):

1. For

c = 1/√(α² − β²) and θ = δc,

generate G1 from the Gamma(1/2, c/δ) distribution (scale parameterization; equivalently G1 = (c/2δ)Z² for Z standard normal). Define

Y1 = 1 + G1(1 − √(1 + 2/G1)).

2. Generate U2 ~ U[0, 1]. If U2 ≤ 1/(1 + Y1), then output T = θY1.

3. Otherwise output T = θY1^{-1}.

The two values θY1 and θY1^{-1} are the two roots of the equation obtained by
setting (3.24) equal to a chi-squared variate with one degree of freedom, and the
relative values of the probability density function at these two roots are
1/(1 + Y1) and 1 − 1/(1 + Y1).

Finally, to generate from the normal inverse Gaussian distribution (3.22), we
generate an inverse Gaussian random variable T as above and then set
X = µ + βT + N(0, T).
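A minimal MATLAB sketch of the complete generator (the function name and arguments are our own):

function X = rnig(alpha, beta, delta, mu)
% minimal sketch: one normal inverse Gaussian variate via the
% Michael-Schucany-Haas inverse Gaussian step followed by normal mixing
c = 1/sqrt(alpha^2 - beta^2); theta = delta*c;
G = (c/(2*delta))*randn^2;          % Gamma(1/2, c/delta), a scaled chi-square(1)
Y = 1 + G*(1 - sqrt(1 + 2/G));      % smaller root of (3.24)
if rand <= 1/(1 + Y)
    T = theta*Y;                    % step 2
else
    T = theta/Y;                    % step 3
end
X = mu + beta*T + sqrt(T)*randn;    % mixing step: X = mu + beta*T + N(0,T)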




Figure 3.20: The Normal and the Normal inverse Gaussian fit to the S&P500
Returns



Prause (1999) provides statistical evidence that the normal inverse
Gaussian distribution provides a better fit than does the normal itself. For example, we fit the
normal inverse Gaussian distribution to the S&P500 index returns over the period
January 1, 1997 to September 27, 2002, a total of 1442 values over this period.
Figure 3.20 shows a histogram of the daily returns together with the normal
and the NIG fits to the data. The mean return over this period is 8 × 10^{-5} and
the standard deviation of returns is 0.013. If we fit the normal inverse Gaussian
distribution to these returns we obtain parameter estimates


α = 95.23, β = −4.72, δ = 0.016, µ = 0.0009


and the Q-Q plots in Figure 3.21.




Figure 3.21: QQ plots showing the Normal Inverse Gaussian and the Normal fit
to S&P 500 data, 1997-2002



Both plots indicate that the normal approximation fails to properly fit the tails of the
distribution but that the NIG distribution is a much better fit. This is similar to
the conclusion in Prause using observations on the Dow Jones Industrial Index.




Generating Random Numbers from Discrete Distributions

Many of the methods described above such as inversion and acceptance-rejection
for generating continuous distributions work as well for discrete random vari-
ables. Suppose for example X is a discrete random variable taking values on the
integers with probability function P[X = x] = f(x), for x = 0, 1, 2, .... Suppose
we can find a continuous random variable Y which has exactly the same value
of its cumulative distribution function at these integers, so that FY(j) = FX(j)
for all j = 0, 1, 2, .... Then we may generate the continuous random variable Y,
say by inversion or acceptance-rejection, and then set X = ⌊Y⌋, the integer part

of Y. Clearly X takes integer values, and since P[X ≤ j] = P[Y ≤ j] = FX(j)
for all j = 0, 1, ..., X has the desired distribution. The continuous ex-
ponential distribution and the geometric distribution are linked in this way: if
X has a geometric(p) distribution and Y has the exponential distribution with
parameter λ = −ln(1 − p), then X has the same distribution as ⌈Y⌉ or ⌊Y⌋ + 1,
as in the sketch below.
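A minimal MATLAB sketch of this link (the value of p is illustrative):

p = 0.3;                       % success probability, 0 < p < 1
lambda = -log(1 - p);          % matching exponential rate
Y = -log(rand)/lambda;         % exponential(lambda) by inversion
X = ceil(Y);                   % geometric(p) variate on {1, 2, ...}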

Using the inverse transform method for generating discrete random vari-
ables is usually feasible but for random variables with a wide range of values
of reasonably high probability, it often requires some setup costs to achieve
reasonable efficiency. For example if X has cumulative distribution function
F(x), x = 0, 1, ..., inversion requires that we output the integer X = F^{-1}(U)
satisfying F(X − 1) < U ≤ F(X). The most obvious technique for
finding such a value of X is to search sequentially through the potential values
x = 0, 1, 2, .... Figure 3.22 is the search tree for inversion for the distribution on
the integers 0, ..., 4 given by

x      0     1     2     3     4
f(x)   0.11  0.30  0.25  0.21  0.13
We generate an integer by repeatedly comparing a uniform [0,1] variate
U with the value at each node, taking the right branch if U is greater than
this threshold value and the left if it is smaller. If X takes positive integer values
{1, 2, ..., N}, the number of values searched will average E(X), which for many
discrete distributions can be unacceptably large.
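A minimal MATLAB sketch of sequential inversion for the tabled distribution:

f = [0.11 0.30 0.25 0.21 0.13];   % P[X = x] for x = 0,...,4
F = cumsum(f);                    % cumulative distribution function
U = rand;
X = find(U <= F, 1) - 1;          % smallest x with U <= F(x)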

An easy alternative is to begin the search at a value m which is near the
median (or mode or mean) of the distribution. For example we choose m = 2
and search to the left or right depending on the value of U in Figure 3.23.


If we root the tree at m, then this results in searching roughly an average of
E[|X − m| + 1] values before obtaining the generated variable. This is often
substantially smaller than E(X), especially when E(X) is large, but it is still
unacceptably large when the distribution has large variance.



Figure 3.22: Sequential Search tree for Inverse Transform with root at x = 0. (The node thresholds .11, .41, .66 and .87 are the cumulative probabilities of the distribution .11, .30, .25, .21, .13.)




Figure 3.23: Search tree rooted near the median (root threshold .66; lower thresholds .41, .87 and .11)


Figure 3.24: Optimal Binary Search Tree for the distribution .11, .30, .25, .21, .13 (internal node weights .45, .55, .24 and .11)



An optimal binary search tree for this distribution is graphed in Figure
3.24. This tree has been constructed from the bottom up as follows. We begin
by joining the two smallest probabilities, f(4) and f(0), to form a new node
with weight f(0) + f(4) = 0.24; we take the left path (towards X = 0 rather
than towards X = 4) if U is smaller than the value .11 labelling
the node at the intersection of these two branches. We now regard this pair
of values as a unit and continue to work up the tree from the leaves to the
root. The next smallest pair of probabilities are those of {0, 4} and {3}, namely
0.24 and 0.21 respectively, so these are the next to be joined, and we continue
working in this way from the leaves to the root of the tree. This optimal binary search
tree provides the minimum expected number of comparisons and is equivalent
to sorting the values in order of largest to smallest probability, in this case
1, 2, 3, 4, 0, relabelling them or coding them {0, 1, 2, 3, 4}, and then applying the
inverse transform method starting at 0.
The leaves of the tree are the individual probabilities f (j) and the internal

nodes are sums of the weights or probabilities of the "children", the values f(j)
for j on paths below this node. Let Di represent the depth of the i'th leaf, so for
example the depth of leaf 0 in Figure 3.24 is D0 = 3. Then the average number
of comparisons to generate a single random variable X starting at the root is
Σᵢ f(i)Di. The procedure for constructing the tree above provides an optimal
algorithm in the sense that this quantity is minimized. It is possible to show that
an optimal binary search tree will reduce the average number of comparisons
from E(X) for ordinary inversion to less than 1 + 4[log₂(1 + E(X))].
Another general method for producing variates from a discrete distribution
was suggested by Walker (1974, 1977) and is called the alias method. This is
based on the fact that every discrete distribution is a uniform mixture of two-
point distributions. Apart from the time required to set up an initial table of
aliases and aliasing probabilities, the time required to generate values from a dis-
crete distribution with K supporting points is bounded in K, whereas methods
such as the inverse transform have computational time which increases proportion-
ally with E(X).
Consider a discrete distribution with probability function f(j)
on the K integers j = 1, 2, ..., K. We seek a table of values A(i) and associated
"alias" probabilities q(i) so that the desired discrete random variable can be
generated in two steps: first generate one of the integers {1, 2, ..., K} at random
and uniformly; then, if the value I, say, was generated, keep it with probability
q(I) and otherwise replace it by the "alias" value A(I). These values A(I) and
q(I) are determined below. The algorithm is:

GENERATE I UNIFORM ON {1, ...K}.

WITH PROBABILITY q(I), OUTPUT X = I, OTHERWISE, X = A(I).

An algorithm for producing these values (A(i), q(i)), i = 1, ..., K, is sug-
gested by Walker (1977) and proceeds by reducing the number of non-zero prob-
abilities one at a time.

1. Put q(i) = Kf(i) for all i = 1, ..., K.

2. Let m be the index such that q(m) = min{q(i); q(i) > 0} and M the index
such that q(M) = max{q(i); q(i) > 0}.

3. Set A(m) = M and fix q(m) (it is no longer subject to change).

4. Replace q(M) by q(M) − (1 − q(m)).

5. Replace (q(1), ..., q(K)) by (q(1), ..., q(m − 1), q(m + 1), ..., q(K)) (so the com-
ponent with index m is removed).

6. Return to step 2 unless all remaining q(i) = 1 or the vector of q(i)'s is empty.


Note that on each iteration of the steps above, we fix one of the components
q(m), remove it from the vector, and adjust one other, namely q(M). Since
we always fix the smallest q(m) and since the average of the q(i) is one, we always
obtain a probability, i.e. we fix a value 0 < q(m) ≤ 1. Figure 3.25 shows the way
in which this algorithm proceeds for the distribution

x      1   2   3   4
f(x)   .1  .2  .3  .4
We begin with q(i) = 4 × f(i) = .4, .8, 1.2, 1.6 for i = 1, 2, 3, 4. Then since
m = 1 and M = 4 these are the first to be adjusted. We assign A(1) = 4 and
q(1) = 0.4. Now since we have reassigned mass 1 − q(1) to M = 4 we replace
q(4) by 1.6 − (1 − 0.4) = 1. We now fix and remove q(1) and continue with
q(i) = .8, 1.2, 1.0 for i = 2, 3, 4. The next step results in fixing q(2) = 0.8,
A(2) = 3 and changing q(3) to q(3) − (1 − q(2)) = 1. After this iteration,
the remaining q(3), q(4) are both equal to 1, so according to step 6 we may
terminate the algorithm. Notice that we terminated without assigning a value
to A(3) and A(4). This assignment is unnecessary since the probability the alias
A(i) is used is (1 − q(i)) which is zero in these two cases. The algorithm
therefore results in aliases A(i) = 4, 3, i = 1, 2 and q(i) = .4, .8, 1, 1, respectively
for i = 1, 2, 3, 4. Geometrically, this method iteratively adjusts a probability
histogram to form a rectangle with base K as in Figure 3.25.
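A minimal MATLAB sketch of the table construction above and of generation from it (variable names are our own; a small tolerance guards the floating-point comparison in step 6):

f = [0.1 0.2 0.3 0.4]; K = length(f);
q = K*f; A = 1:K;                  % step 1: q(i) = K f(i); alias defaults to self
active = 1:K;                      % indices whose q(i) is not yet fixed
while numel(active) > 1
    [qm, im] = min(q(active)); m = active(im);   % step 2
    [qM, iM] = max(q(active)); M = active(iM);
    if qM <= 1 + 1e-12, break; end % step 6: all remaining q(i) equal 1
    A(m) = M;                      % step 3: alias the smallest cell to M
    q(M) = qM - (1 - qm);          % step 4: M absorbs the deficit of cell m
    active(im) = [];               % step 5: q(m) is fixed and removed
end
I = ceil(K*rand);                  % generation: I uniform on {1,...,K}
if rand <= q(I), X = I; else, X = A(I); end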

Figure 3.25: The alias method for generating from the distribution 0.1, 0.2, 0.3, 0.4: the histogram of q(i) = Kf(i) = .4, .8, 1.2, 1.6 is adjusted, using the aliases A(1) = 4 and A(2) = 3, into a rectangle with base K



Suppose I now wish to generate random variables from this discrete distrib-
ution. We simply generate a random variable uniform on the set {1, 2, 3, 4} and
if 1 is selected, we replace it by A(1) = 4 with probability 1 − q(1) = 0.6. If 2
is selected it is replaced by A(2) = 3 with probability 1 − q(2) = 0.2.




Acceptance-Rejection for Discrete Random Variables


The acceptance-rejection algorithm can be used both for generating discrete and
continuous random variables and the geometric interpretation in both cases is
essentially the same. Suppose for example we wish to generate a discrete random
variable X having probability function f(x), using as a dominating function
a multiple of g(x), the probability density function of a continuous random
variable. Take for example the probability function

Figure 3.26: Acceptance-rejection for a discrete distribution with a continuous dominating function: values falling under the probability histogram are accepted, and values between the histogram and the dominating function are rejected.



x      1   2   3   4
f(x)   .1  .3  .4  .2

using the dominating function 2g(x) = 0.1 + 0.2(x − 0.5) for 0.5 < x < 4.5. It
is easy to generate a continuous random variable from the probability density
function g(x) by inverse transform. Suppose we generate the value X. Then if
this value is under the probability histogram graphed in Figure 3.26, we accept
the value (after rounding it to the nearest integer to conform to the discreteness
of the output distribution); otherwise we reject and repeat, as in the sketch below.
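A minimal MATLAB sketch of this rejection step; the inverse of the cumulative distribution G of g is obtained by solving a quadratic:

f = [0.1 0.3 0.4 0.2];                  % P[X = x] for x = 1,...,4
while true
    U = rand;                           % G(x) = .05(x-.5) + .05(x-.5)^2
    X = 0.5 + (sqrt(1 + 80*U) - 1)/2;   % X = G^{-1}(U) on (0.5, 4.5)
    if rand*(0.1 + 0.2*(X - 0.5)) <= f(round(X))
        break                           % accept: point fell under histogram
    end
end
X = round(X);                           % round to the nearest integer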
We may also dominate a discrete distribution with another discrete distrib-
ution in which case the algorithm proceeds as in the continuous case but with
the probability density functions replaced by probability functions.


The Poisson Distribution.

Consider the probability function for a Poisson distribution with parameter λ:

f(x) = λ^x e^{-λ} / x!,  x = 0, 1, ....  (3.25)

The simplest generator uses the Poisson process. Recall that a Poisson
process with rate λ on the real line can be described in two equivalent ways:

1. Points are distributed on the line in such a way that the spacings be-
tween consecutive points are independent exponential(λ) random vari-
ables. Then the resulting process is a Poisson process with rate λ.

2. The number of points in an interval of length h has a Poisson (λh) distri-
bution. Moreover the numbers of points in non-overlapping intervals are
independent random variables.

The simplest generator stems from this equivalence. Suppose we use the
first specification to construct a Poisson process with rate parameter 1 and
then examine X = the number of points occurring in the interval [0, λ]. This
is the number of partial sums of exponential(1) random variables that are less
than or equal to λ:

X = inf{n; Σ_{i=1}^{n+1} (−ln Ui) > λ},

or equivalently

X = inf{n; Π_{i=1}^{n+1} Ui < e^{-λ}}.  (3.26)
This generator requires CPU time which grows linearly with λ, since the
number of exponential random variables generated and summed grows linearly
with λ, and so an alternative for large λ is required. Various
acceptance-rejection algorithms have been suggested, including dominating the
Poisson probability function with multiples of the logistic probability density
function (Atkinson (1979)) or of the normal density with exponential right tail (cf.
Devroye, Lemma 3.8, page 509). A simple all-purpose dominating function is the
so-called table-mountain function (cf. Stadlober (1989)), essentially a function
with a flat top and tails that decrease as 1/x². Another simple alternative for
generating Poisson variates that is less efficient but simpler to implement is to
use the Lorentzian, or truncated Cauchy, distribution with probability density
function

g(x|a, b) = c₀ / (b² + (x − a)²),  x > 0,  (3.27)

where c₀ is the normalizing constant. A random variable is generated from this
distribution using the inverse transform method: X = a + b tan(πU), where
U ~ U[0, 1]. Provided that we match the modes of the distributions, a = λ, and
put b = √(2λ), this function may be used to dominate the Poisson distribution
and provide a simple rejection generator. The Matlab Poisson random number
generator is poissrnd(λ, m, n), which generates an m × n matrix of Poisson(λ)
variables. This uses the simple generator (3.26) and is not computationally
efficient for large values of λ. In R the command rpois(n, λ) generates a vector
of n Poisson variates.


The Binomial Distribution

For the Binomial distribution, we may use any one of the following alternatives:

(1) X = Σ_{i=1}^n I(Ui < p), where Ui ~ independent Uniform[0, 1];

(2) X = inf{x; Σ_{i=1}^{x+1} Gi > n}, where Gi ~ independent Geometric(p);

(3) X = inf{x; Σ_{i=1}^{x+1} Ei/(n − i + 1) > −log(1 − p)}, where Ei ~ independent Exponential(1).
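A minimal MATLAB sketch of method (2), generating the geometric counts Gi by inversion (assumes 0 < p < 1):

function X = rbinom2(n, p)
% minimal sketch: count the successes whose geometric waiting times
% fit within the first n trials
X = 0;
total = ceil(log(rand)/log(1 - p));      % G_1, trials to the first success
while total <= n
    X = X + 1;
    total = total + ceil(log(rand)/log(1 - p));  % add the next G_i
end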


Method (1) obtains from the definition of the sum of independent Bernoulli
random variables, since the random variables I(Ui < p) are independent and
take the values 0 and 1 with probabilities 1 − p and p respectively. The event
(Ui < p), having probability p, is typically referred to as a "success". Obviously
this method will be slow if n is large. For method (2), recall that the number of
trials necessary to obtain the first success, G1 say, has a geometric distribution.
Similarly, G2 represents the number of additional trials to obtain the second
success. So if X = j, the number of trials required to obtain j + 1 successes
was greater than n, and the number required to obtain j successes was less than
or equal to n. In other words, there were exactly j successes in the first n trials.
When n is large but np fairly small, method (2) is more efficient, since its time
is proportional to the number of successes rather than the total number of
trials. Of course for large n and

np sufficiently small (e.g. <1), we can also replace the Binomial distribution
by its Poisson (λ = np) approximation. Method (3) is clearly more efficient if
−log(1 − p) is not too large so that p is not too close to 1, because in this case
we need to add fewer exponential random variables.

For large mean np and small n(1 − p) we can simply reverse the roles of
successes and failures and use method (2) or (3) above. But if both np and
n(1 − p) are large, a rejection method is required. Again we may use rejection
beginning with a Lorentzian distribution, choosing a = np and b = √(2np(1 − p))
in the case p < 1/2. When p > 1/2, we simply reverse the roles of "failures"
and "successes". Alternatively, a dominating table-mountain function may be
used (Stadlober (1989)). The binomial generator in Matlab is the function
binornd(n,p,j,k), which generates a j × k matrix of Binomial(n, p) random
variables. This uses the simplest form (1) of the binomial generator and is not
computationally efficient for large n. In R, rbinom(m,n,p) will generate a vector
of length m of Binomial(n, p) variates.




Random Samples Associated with Markov Chains



Consider a finite state Markov Chain, a sequence of (discrete) random variables
X1, X2, ..., each of which takes integer values 1, 2, ..., N (called states). The
number of states of a Markov chain may be large or even infinite, and it is not
always convenient to label them with the positive integers, so it is common
to define the state space more generally as the set of all possible states of the
chain; we will give some examples of this later. For the present we restrict attention to
the case of a finite state space. The transition probability matrix is a matrix P
describing the conditional probability of moving between possible states of the

chain, so that

P[Xn+1 = j | Xn = i] = Pij, i = 1, ..., N, j = 1, ..., N,

where Pij ≥ 0 for all i, j and Σj Pij = 1 for all i. A limiting distribution
of a Markov chain is a vector (π say) of long run probabilities of the individual
states with the property that

πi = lim_{t→∞} P[Xt = i].

A stationary distribution of a Markov chain is the column vector (π say) of
probabilities of the individual states such that

π′P = π′.  (3.28)

For a Markov chain, every limiting distribution is in fact a station-
ary distribution. For the basic theory of Markov Chains, see the Appendix.
Roughly, a Markov chain which eventually "forgets" the states that were occu-
pied in the distant past, in other words for which the probability of the current
state does not vary much as we condition on different states in the distant
past, is called ergodic. A Markov chain which simply cycles through three
states 1 → 2 → 3 → 1 → ... is an example of a periodic chain, and is not
ergodic.
It is often the case that we wish to simulate from a finite ergodic Markov
chain when it has reached equilibrium or stationarity, which is equivalent to
sampling from the distribution of Xn assuming that the distribution of X0
is given by the stationary distribution π. In a few cases, we can obtain this
stationary distribution directly from (3.28) but when N is large this system of
equations is usually not feasible to solve and we need to find another way to
sample from the probability vector π. Of course we can always begin the Markov
chain in some arbitrary initial state and run it, waiting for Hele to freeze over (it
does happen, since Hele is in Devon), until we are quite sure that the chain has
essentially reached equilibrium, and then use a subsequent portion of this chain,
discarding this initial period, sometimes referred to as the "initial transient".
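When N is small, on the other hand, (3.28) can be solved directly, for example via a left eigenvector of P; a minimal MATLAB sketch (the transition matrix shown is illustrative):

P = [0.9 0.1 0.0; 0.2 0.7 0.1; 0.1 0.3 0.6];  % rows sum to one
[V, D] = eig(P');                   % left eigenvectors of P
[~, k] = max(real(diag(D)));        % the eigenvalue 1 is the largest
piv = real(V(:,k));
piv = piv/sum(piv)                  % normalized stationary probabilities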
Clearly this is often not a very efficient method, particularly in cases in
which the chain mixes, or forgets its past, very slowly, for in this case the required
initial transient is long. On the other hand, if we shorten it, we run the risk of
introducing bias into our simulations because the distribution generated is too
far from the equilibrium distribution π. There are a number of solutions to this
problem proposed in a burgeoning literature. Here we limit ourselves to a few
of the simpler methods.


Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm is a method for generating random variables
from a distribution π that applies even in the case of an infinite number of
states or a continuous distribution π. It is assumed that π is known up to some
multiplicative constant. Roughly, the method consists of using a convenient
"proposal" Markov chain with transition matrix Q to generate transitions, and
then only "accepting" the move to these new states with a probability that depends
on the distribution π. The idea resembles that behind importance sampling.
The basic result on which the Metropolis-Hastings algorithm is pinned is the
following theorem.

Theorem 31 Suppose Qij is the transition matrix of a Markov chain. Assume
that g is a vector of non-negative values such that Σ_{i=1}^N gi = G and

|gj / Qij| ≤ K < ∞ for all i, j

for some finite value K. Define

ρij = min(1, gj Qji / (gi Qij)).

Then the Markov Chain with transition probability matrix

Pij = Qij ρij, for i ≠ j,  (3.29)

has stationary distribution πi = gi/G.


Proof. The proof consists of showing that the so-called "detailed balance
condition" is satisfied, i.e. with πi = gi/G, that

πi Pij = πj Pji, for all i, j.  (3.30)

This condition implies that when the chain is operating in equilibrium,

P[Xn = i, Xn+1 = j] = P[Xn = j, Xn+1 = i],

reflecting a cavalier attitude to the direction in which time flows, or reversibility
of the chain. Of course (3.30) is true automatically if i = j, and for i ≠ j,

πi Pij = (gi/G) Qij min(1, gj Qji / (gi Qij))
       = (1/G) min(gi Qij, gj Qji)
       = πj Pji

by the symmetry of the function (1/G) min(gi Qij, gj Qji). Now the detailed balance
condition (3.30) implies that π is a stationary distribution for this Markov chain,
since

Σ_{i=1}^N πi Pij = Σ_{i=1}^N πj Pji = πj Σ_{i=1}^N Pji = πj for each j = 1, ..., N.

Provided that we are able to generate transitions for the Markov Chain with
transition matrix Q, it is easy to generate a chain with transition matrix P in
(3.29). If we are currently in state i, generate a candidate next state j with probability
Qij. If j = i, then we stay in state i. If j ≠ i, then we "accept" the move to state
j with probability ρij; otherwise we stay in state i. Notice that the Markov
Chain with transition matrix P tends to favour moves which increase the value
of π. For example, if the proposal chain is as likely to jump from i to j as it
is to jump back, so that Qij = Qji, then if πj > πi the move to j is always
accepted, whereas if πj < πi the move is only accepted with probability πj/πi. The
assumption Qij = Qji is a common and natural one, since in applications of
the Metropolis-Hastings algorithm it is common to choose j "at random" (i.e.
uniformly distributed) from a suitable neighborhood of i.
The above proof only shows that π is a stationary distribution of the
Markov Chain associated with P, not that it is necessarily the limiting distribu-
tion of this Markov chain. For this to follow we need to know that the chain
is ergodic. Various conditions for ergodicity are given in the literature. See for
example Robert and Casella (1999, Chapter 6) for more detail.
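A minimal MATLAB sketch of the resulting sampler on a finite state space (the function and its arguments are our own notation):

function X = mhchain(g, Q, nsteps, x0)
% minimal Metropolis-Hastings sketch on states {1,...,K}; g is the
% vector of unnormalized target weights, Q the proposal transition matrix
X = zeros(nsteps, 1); x = x0; FQ = cumsum(Q, 2);
for n = 1:nsteps
    j = find(rand <= FQ(x, :), 1);       % propose j from Q(x,.)
    if j ~= x
        rho = min(1, g(j)*Q(j,x)/(g(x)*Q(x,j)));  % acceptance probability
        if rand < rho, x = j; end        % accept the move, otherwise stay
    end
    X(n) = x;
end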


Gibbs Sampling

There is one special case of the Metropolis-Hastings algorithm that is
particularly simple, common and compelling. To keep the discussion simple,
suppose the possible states of our Markov Chain are points in two-dimensional
space (x, y). We may assume both components are discrete or continuous. Sup-
pose we wish to generate observations from a stationary distribution which is
proportional to g(x, y) so
π(x, y) = g(x, y) / (Σₓ Σ_y g(x, y))  (3.31)

defined on this space but that the form of the distribution is such that directly
generating from this distribution is difficult, perhaps because it is difficult to
obtain the denominator of (3.31). However there are many circumstances where
it is much easier to obtain the value of the conditional distributions
π(x|y) = π(x, y) / Σ_z π(z, y), and

π(y|x) = π(x, y) / Σ_z π(x, z).

Now consider the following algorithm: begin with an arbitrary value y0
and generate x1 from the distribution π(x|y0), followed by y1 from
the distribution π(y|x1). It is hard to imagine a universe in which iteratively
generating values xn+1 from the distribution π(x|yn) and then yn+1 from the
distribution π(y|xn+1) does not, at least asymptotically as n → ∞, eventually
lead to a draw from the joint distribution π(x, y). Indeed that is the case, since
the transition probabilities for this chain are given by

P(xn+1, yn+1 | xn, yn) = π(xn+1|yn) π(yn+1|xn+1)

and it is easy to show directly from these transition probabilities that

Σ_{(x,y)} P(x1, y1 | x, y) π(x, y) = π(y1|x1) Σ_y π(x1|y) Σₓ π(x, y)
                                  = π(y1|x1) Σ_y π(x1, y)
                                  = π(x1, y1).

Of course the real power of Gibbs sampling is achieved in problems that are
not two-dimensional like the example above, but instead have dimension sufficiently
high that calculating the sums or integrals in the denominator of expressions
like (3.31) is not computationally feasible.
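As a concrete illustration, here is a minimal MATLAB sketch of the two-component Gibbs sampler; the bivariate normal target with correlation ρ is our own choice, for which π(x|y) = N(ρy, 1 − ρ²) and π(y|x) = N(ρx, 1 − ρ²):

rho = 0.8; n = 10000;
x = zeros(n,1); y = zeros(n,1);          % y(1) = 0 plays the role of y0
for i = 2:n
    x(i) = rho*y(i-1) + sqrt(1 - rho^2)*randn;   % draw x from pi(x|y)
    y(i) = rho*x(i)   + sqrt(1 - rho^2)*randn;   % draw y from pi(y|x)
end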



Coupling From the Past: Sampling from the stationary dis-
tribution of a Markov Chain

All of the above methods assume that we generate from the stationary distri-
bution of a Markov chain by the "until Hele freezes over" method, i.e. run
the chain from an arbitrary starting value for a long time and then delete the initial
transient. An alternative elegant method that is feasible at least for some finite

state Markov chains is the method of "coupling from the past" due to Propp
and Wilson (1996).
We assume that we are able to generate transitions in the Markov Chain. In
other words, if the chain is presently in state i at time n, we are able to generate
a random variable Xn+1 from the distribution with probabilities Pij, j = 1, ..., K.
Suppose F(x|i) is the cumulative distribution function P(Xn+1 ≤ x | Xn = i),
and let us denote its inverse by F^{-1}(y|i). So if we wish to generate a random
variable Xn+1 conditional on Xn, we can use the inverse transform Xn+1 =
F^{-1}(Un+1|Xn) applied to the Uniform[0,1] random variable Un+1. Notice that
a starting value, say X−100, together with the sequence of uniform[0,1] variables
(U−99, ..., U0), determines the chain completely over the period −100 ≤ t ≤ 0.
If we wish to generate the value of Xt given Xs, s < t, then we can work this
expression backwards:

Xt = F^{-1}(Ut | Xt−1)
   = F^{-1}(Ut | F^{-1}(Ut−1 | Xt−2))
   = F^{-1}(Ut | F^{-1}(Ut−1 | ... F^{-1}(Us+1 | Xs)))
   = F_s^t(Xs), say.

Now imagine an infinite sequence {Ut, t = ..., −2, −1, 0} of independent uni-
form[0,1] random variables used to generate the state X0 of a chain
at time 0. Let us imagine for the moment that there is a value of M such that
F^0_{−M}(i) is a constant function of i. This means that for this particular draw
of uniform random numbers, whatever the state i of the system at time −M,
the same state X0 = F^0_{−M}(i) is generated at time 0. All chains, possibly with
different behaviour prior to time −M, are "coupled" at time −M and identical
from then on. In this case we say that coalescence has occurred in the interval
[−M, 0]. No matter where we start the chain at time −M, it ends up in the
same state at time 0, so it is quite unnecessary to simulate the chain over the
whole infinite time interval −∞ < t ≤ 0. No matter what state is occupied

at time t = −M, the chain ends up in the same state at time t = 0. When
coalescence has occurred, we can safely consider the common value of the chain
at time 0 to be generated from the stationary distribution, since it is exactly
the same value as if we had run the chain from t = −∞.

There is sometimes an easy way to check whether coalescence has occurred
in an interval, if the state space of the Markov chain is suitably ordered. For
example, suppose the states are numbered 1, 2, ..., N. Then it is sometimes
possible to relabel the states so that the conditional distribution functions F(x|i)
are stochastically ordered, or equivalently that F^{-1}(U|i) is monotonic (say
monotonically increasing) in i for each value of U. This is the case, for example,
provided that the partial sums Σ_{l=1}^j Pil are decreasing functions of i for each
j = 1, 2, ..., N. It follows that the functions F^0_{−M}(i) are all monotonic functions
of i, and so

F^0_{−M}(1) ≤ F^0_{−M}(2) ≤ ... ≤ F^0_{−M}(N).

Therefore, if F^0_{−M}(1) = F^0_{−M}(N), then F^0_{−M}(i) must be a constant function.
Notice also that if there is any time in an interval [s, t] at which coalescence
occurs, so that F_s^t(i) is a constant function of i, then for any interval [S, T]
containing it, [S, T] ⊃ [s, t], F_S^T(i) is also a constant function of i.

It is easy to prove that coalescence occurs in the interval [−M, 0] for suf-
ficiently large M. For an ergodic finite Markov chain, there is some step size
τ such that every transition has positive probability, P[Xt+τ = j | Xt = i] > ε
for all i, j. Consider two independent chains, one beginning in state i and the
other in state i′ at time t = 0. Then the probability that they occupy the same
state j at time t = τ is at least ε². It is easy to see that if we use the inverse
transform to generate the transitions and they are driven by common random
numbers, then this can only increase the probability of being in the same state,
so the probability these two chains are coupled at time τ is at least ε². Similarly,
for N possible states, the probability of coalescence in an interval of length τ
is at least ε^N > 0. Since there are infinitely many disjoint intervals of length
τ in (−∞, 0] and the events that there is a coalescence in each interval are
independent, the probability that coalescence occurs somewhere in (−∞, 0] is
1.
We now detail the Propp-Wilson algorithm.

1. Set M = 1, XL = 1, XU = N, and generate U0 ~ Uniform[0,1].

2. For t = −M + 1 to 0, update XL = F^{-1}(Ut|XL) and XU = F^{-1}(Ut|XU).

3. If XL = XU, stop and output X(0) = XL.

4. Otherwise, generate U_{−2M+1}, ..., U_{−M}, all independent Uniform[0,1]
(retaining every previously generated Ut), set M = 2M, reset XL = 1,
XU = N, and go to step 2.



This algorithm tests for coalescence repeatedly by starting the bounding
chains at times −1, −2, −4, −8, ..., each time running them forward to time 0.
We are assured that with probability one, the process will terminate with co-
alescence after a finite number of steps. Moreover, in this algorithm the
random variable Ut, once generated, is NOT generated again on a subsequent
pass when M is doubled: the generated Ut is reused at each pass until coales-
cence occurs. If Ut were regenerated on subsequent passes, this would lead to
bias in the algorithm.
It may well be that this algorithm needs to run for a very long time before
achieving coalescence and an impatient observer who interrupts the algorithm
prior to coalescence and starts over will bias the results. Various modifications
have been made to speed up the algorithm (e.g. Fill, 1998).
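A minimal MATLAB sketch of the algorithm, assuming the rows of the transition matrix give stochastically ordered distributions as discussed above (names are our own):

function X0 = cftp(P)
% minimal coupling-from-the-past sketch (Propp-Wilson); U(k) is the
% uniform driving the transition from time -k to time -k+1
N = size(P,1); F = cumsum(P,2);
U = rand; M = 1;
while true
    XL = 1; XU = N;                       % bounding chains started at -M
    for t = M:-1:1                        % run forward from -M to 0
        XL = find(U(t) <= F(XL,:), 1);    % inverse transform, common U
        XU = find(U(t) <= F(XU,:), 1);
    end
    if XL == XU, X0 = XL; return; end     % coalescence: output X(0)
    U = [U, rand(1,M)];                   % new U's for times -2M+1,...,-M
    M = 2*M;                              % all previous U's are reused
end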




Sampling from the Stationary Distribution of a Diffusion
Process

A basic Ito process of the form

dXt = a(Xt )dt + σ(Xt )dWt

is perhaps the simplest extension of a Markov chain to continuous time and contin-
uous state-space. It is well-known that under fairly simple conditions there is
a unique (strong) solution to this equation, and that the distribution of
XT converges as T → ∞ to a stationary distribution with probability density function

f(x) = c (1/σ²(x)) exp{2 ∫₀ˣ a(z)/σ²(z) dz},

where the constant c is chosen so that the integral of the density is 1. To be
able to do this we need to assume that

∫_{−∞}^{∞} (1/σ²(x)) exp{2 ∫₀ˣ a(z)/σ²(z) dz} dx < ∞.  (3.32)

In order to generate from this stationary distribution, we can now start the
process at some arbitrary value X0 and run it for a very long time T , hoping
that this is sufficiently long that the process is essentially in its stationary
state, or try to generate X0 more directly from the stationary density in (3.32), in which case the process
begins (and subsequently runs) with its stationary distribution.
For an example, let us return to the CIR process

dXt = k(b − Xt)dt + σ Xt^{1/2} dWt.  (3.33)

In this case

a(x) = k(b − x) and σ²(x) = σ²x, for x > 0.

Notice that

(1/(σ²x)) exp{2 ∫_ε^x k(b − z)/(σ²z) dz} = (1/(σ²x)) exp{(2kb/σ²) ln(x/ε) − (2k/σ²)(x − ε)}

is proportional to

x^{2kb/σ² − 1} exp{−2kx/σ²},

and the integral of this function converges, yielding a Gamma density, provided
2kb/σ² > 0; the process avoids the boundary at 0 only under the stronger Feller
condition 2kb ≥ σ². Under this condition the stationary distribution
of the CIR process is Gamma(2kb/σ², σ²/(2k)) (shape and scale parameters
respectively). If the condition fails and 2kb < σ²,
then the process Xt is absorbed at 0. If we wished to simulate a CIR process in
equilibrium, we should generate starting values of X0 from this Gamma distribution,
as in the sketch below.
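A minimal MATLAB sketch (parameter values are illustrative; gamrnd is in the Statistics Toolbox):

k = 0.5; b = 0.04; sigma = 0.1;              % requires 2*k*b >= sigma^2
X0 = gamrnd(2*k*b/sigma^2, sigma^2/(2*k));   % stationary Gamma(shape, scale)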
More generally, for a CEV process satisfying

dXt = k(b − Xt)dt + σ Xt^{γ/2} dWt,  (3.34)

a similar calculation shows that the stationary density is proportional to

x^{−γ} exp{− 2kb/(σ²(γ − 1)x^{γ−1}) − (2k/σ²) x^{2−γ}/(2 − γ)}, for γ > 1, γ ≠ 2.


Simulating Stochastic Partial Differential Equations.

Consider a derivative product whose underlying asset has price Xt which follows
some model. Suppose the derivative pays an amount V0 (XT ) on the maturity
date T. Suppose that the value of the derivative depends only on the current time
t and the current value S of the asset; then its current value is the discounted
future payoff, an expectation of the form
V(S, t) = E[V0(XT) exp{−∫_t^T r(Xv, v)dv} | Xt = S],  (3.35)

where r(Xt, t) is the current spot interest rate at time t. In most cases this ex-
pectation is impossible to evaluate analytically, and so we need to resort to
numerical methods. If the spot interest rate is a function of both arguments
(Xv, v) and not just a function of time, then this integral is over the whole
joint distribution of the process Xv, 0 < v < T, and simple one-dimensional
methods of numerical integration do not suffice. In such cases we will usu-
ally resort to a Monte Carlo method. The simplest version requires simulating
a number of sample paths for the process Xv starting at Xt = S, evaluating
V0(XT) exp{−∫_t^T r(Xv, v)dv}, and averaging the results over all simulations. We
begin by discussing the simulation of the process Xv required for integrations
such as this.
Many of the stochastic models in finance reduce to a simple diffusion equation
(which may have more than one factor or dimension). Most of the models
in finance are Markovian in the sense that at any point t in time, the future
evolution of the process depends only on the current state Xt and not on the
past behaviour of the process Xs, s < t. Consequently we restrict attention to a "Markov
diffusion model" of the form

dXt = a(Xt, t)dt + σ(Xt, t)dWt,  (3.36)

with some initial value X0 for Xt at t = 0. Here Wt is a driving standard Brown-
ian motion process. Solving deterministic differential equations can sometimes
provide a solution to a specific problem such as finding the arbitrage-free price of
a derivative. In general, for more complex features of the derivative such as the
distribution of return, important for considerations such as the Value at Risk,
we need to obtain a solution {Xt, 0 < t < T} to an equation of the above form
which is a stochastic process. Typically this can only be done by simulation.
One of the simplest methods of simulating such a process is motivated through
a crude interpretation of the above equation in terms of discrete time steps, that
is that a small increment Xt+h − Xt in the process is approximately normally
distributed with mean given by a(Xt, t)h and variance given by σ²(Xt, t)h. We
generate these increments sequentially, beginning with an assumed value for

X0 , and then adding to obtain an approximation to the value of the process
at discrete times t = 0, h, 2h, 3h, . . .. Between these discrete points, we can
linearly interpolate the values. Approximating the process by assuming that
the conditional distribution of Xt+h − Xt is N (a(Xt , t)h, σ 2 (Xt , t)h) is called
Euler’s method by analogy to a simple method by the same name for solving or-
dinary differential equations. Given simulations of the process satisfying (3.36)
together with some initial conditions, we might average the returns on a given
derivative for many such simulations, (provided the process is expressed with
respect to the risk-neutral distribution), to arrive at an arbitrage-free return for
the derivative.
In this section we will discuss the numerical solution, or simulation of the
solution to stochastic differential equations.
Letting ti = iΔt, equation (3.36) in integral form implies

X_{t_{i+1}} = X_{t_i} + ∫_{t_i}^{t_{i+1}} a(Xs, s)ds + ∫_{t_i}^{t_{i+1}} σ(Xs, s)dWs.  (3.37)

For the following lemma we need to introduce the Op, or "order in probability",
notation common in mathematics and probability. For a sequence indexed by Δt,
say Y_Δt, the statement Y_Δt = Op((Δt)^k) means that when we divide this term by (Δt)^k and then
let Δt → 0, the resulting sequence is bounded in probability, i.e. for each ε > 0
there exists K < ∞ so that

P[|Y_Δt / (Δt)^k| > K] < ε

whenever |Δt| is sufficiently small. As an example, if W is a Brownian motion, then ΔWt =
W(t + Δt) − W(t) has a normal distribution with mean 0 and standard deviation
√Δt and is therefore Op((Δt)^{1/2}). We then have two very common
and useful approximations to a diffusion, given by the following lemma.

Lemma 32 If Xt satisfies a diffusion equation of the form (3.36), then

X_{t_{i+1}} = X_{t_i} + a(X_{t_i}, t_i)Δt + σ(X_{t_i}, t_i)ΔWt + Op(Δt)  (Euler approximation)

X_{t_{i+1}} = X_{t_i} + a(X_{t_i}, t_i)Δt + σ(X_{t_i}, t_i)ΔWt + ½ σ(X_{t_i}, t_i)(∂/∂x)σ(X_{t_i}, t_i)[(ΔWt)² − Δt] + Op((Δt)^{3/2})  (Milstein approximation)




Proof. Ito's lemma can be written in terms of two operators on functions
f for which the derivatives below exist:

df(Xt, t) = L⁰f dt + L¹f dWt, where

L⁰ = a (∂/∂x) + ½ σ² (∂²/∂x²) + ∂/∂t, and

L¹ = σ (∂/∂x).

Integrating this and applying it to the twice differentiable functions a and σ, for
s > t_i,

a(Xs, s) = a(X_{t_i}, t_i) + ∫_{t_i}^s L⁰a(Xu, u)du + ∫_{t_i}^s L¹a(Xu, u)dWu,

σ(Xs, s) = σ(X_{t_i}, t_i) + ∫_{t_i}^s L⁰σ(Xu, u)du + ∫_{t_i}^s L¹σ(Xu, u)dWu.

By substituting each of the integrands in (3.37) using the above identities and
iterating this process, we arrive at the Ito-Taylor expansions (e.g. Kloeden and
Platen, 1992). For example,

∫_{t_i}^{t_{i+1}} a(Xs, s)ds = ∫_{t_i}^{t_{i+1}} {a(X_{t_i}, t_i) + ∫_{t_i}^s L⁰a(Xu, u)du + ∫_{t_i}^s L¹a(Xu, u)dWu} ds
  ≈ a(X_{t_i}, t_i)Δt + L⁰a(X_{t_i}, t_i) ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s du ds + L¹a(X_{t_i}, t_i) ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s dWu ds.

The first term, a(X_{t_i}, t_i)Δt, is an initial approximation to the desired inte-
gral and the rest is a lower order correction that we may regard as an er-
ror term for the moment. For example, it is easy to see that the second term
L⁰a(X_{t_i}, t_i) ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s du ds is Op((Δt)²), because the integral ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s du ds = (Δt)²/2
and L⁰a(X_{t_i}, t_i) is bounded in probability. The third term L¹a(X_{t_i}, t_i) ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s dWu ds
is Op((Δt)^{3/2}), since ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s dWu ds = ∫_{t_i}^{t_{i+1}} (t_{i+1} − u)dWu and this is a normal
random variable with mean 0 and variance ∫_{t_i}^{t_{i+1}} (t_{i+1} − u)² du = (Δt)³/3. We
can write such a normal random variable as 3^{-1/2}(Δt)^{3/2}Z for Z a standard
can write such a normal random variable as 3−1/2 (āˆ†t)3/2 Z for Z a standard

normal random variable, and so this is obviously Op((Δt)^{3/2}). Thus the simplest
Euler approximation to the distribution of the increment assumes that ΔX has
conditional mean a(X_{t_i}, t_i)Δt. Similarly,

∫_{t_i}^{t_{i+1}} σ(Xs, s)dWs = ∫_{t_i}^{t_{i+1}} {σ(X_{t_i}, t_i) + ∫_{t_i}^s L⁰σ(Xu, u)du + ∫_{t_i}^s L¹σ(Xu, u)dWu} dWs
  ≈ σ(X_{t_i}, t_i)ΔWt + L⁰σ(X_{t_i}, t_i) ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s du dWs + L¹σ(X_{t_i}, t_i) ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s dWu dWs
  = σ(X_{t_i}, t_i)ΔWt + ½ σ(X_{t_i}, t_i)(∂/∂x)σ(X_{t_i}, t_i)[(ΔWt)² − Δt] + Op((Δt)^{3/2}),

since ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s dWu dWs = ½[(ΔWt)² − Δt], L¹σ(Xu, u) = σ(Xu, u)(∂/∂x)σ(Xu, u) =
L¹σ(X_{t_i}, t_i) + Op((Δt)^{1/2}), and ∫_{t_i}^{t_{i+1}} ∫_{t_i}^s du dWs = Op((Δt)^{3/2}). Putting
these terms together, we arrive at an approximation to the increment of the
form

ΔXt = a(X_{t_i}, t_i)Δt + σ(X_{t_i}, t_i)ΔWt + ½ σ(X_{t_i}, t_i)(∂/∂x)σ(X_{t_i}, t_i)[(ΔWt)² − Δt] + Op((Δt)^{3/2})  (3.38)

which allows an explicit representation of the increment in the process X in terms
of the increment of a Brownian motion process, ΔWt ~ N(0, Δt).
The approximation (3.38) is called the Milstein approximation, a refinement
of the first, the Euler approximation. It is the second Ito-Taylor approximation
to a diffusion process. Obviously, the increments of the process are quadratic
functions of a normal random variable and are no longer normal. The error
approaches 0 at the rate Op((Δt)^{3/2}) in probability only. This does not mean
that the trajectory is approximated to this order, but that the difference between
the Milstein approximation to a diffusion and the diffusion itself is bounded in
probability when divided by (Δt)^{3/2} as we let Δt → 0. Higher order Taylor
approximations are also possible, although they grow excessively complicated
very quickly. See the book by Kloeden and Platen (1992) for details.
There remains the question of how much difference it makes which of these
approximations we employ for a particular diffusion. Certainly there is no dif-
ference at all between the two approximations in the case that the diffusion

coefficient σ(Xt , t) does not depend at all on Xt . In general, the difference is
hard to assess but in particular cases we can at least compare the performance of
the two methods. The approximations turn out to be very close in most simple
cases. For example consider the stock price path in Figure 3.27. The dashed
line corresponds to a Milstein approximation whereas the piecewise continuous
line corresponds to the Euler approximation. In this case the Milstein appears
to be a little better, but if I run a number of simulations and compare the sum
of the squared errors (i.e. squared differences between the approximate value
of Xt and the true value of Xt ) we find that the improvement is only about
two percent of the difference. The same is true even if I change the value of
Δt from 1/12 (i.e. one month) to 1/52 (i.e. one week). Unlike the behaviour
of higher order approximations to deterministic functions, there appears to be
little advantage in using a higher order approximation, at least in the case of
diffusions with smooth drift and diffusion coefficients.
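To make such an experiment concrete, here is a minimal MATLAB sketch comparing the two schemes on a single path driven by common normal increments; the CEV-type coefficients and all parameter values are our own illustrative choices:

r = 0.05; sigma = 0.25; gamma = 0.8;   % dS = r*S*dt + sigma*S^gamma*dW
T = 1; n = 52; dt = T/n;
Se = 1; Sm = 1;                        % Euler and Milstein states, S0 = 1
for i = 1:n
    dW = sqrt(dt)*randn;               % common Brownian increment
    Se = Se + r*Se*dt + sigma*Se^gamma*dW;
    Sm = Sm + r*Sm*dt + sigma*Sm^gamma*dW ...
       + 0.5*sigma^2*gamma*Sm^(2*gamma - 1)*(dW^2 - dt);
end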

We can compare using the Milstein approximation on the original process with
using Euler's approximation on a transformation of the process, in the case that
the diffusion term depends only on the state of the process (not time). In other
words, suppose we have an Ito process of the form

dXt = a(Xt, t)dt + σ(Xt)dWt,  (3.39)

where Wt is an ordinary Wiener process. A simple transformation reduces this
to a problem with constant diffusion term. Suppose σ(x) > 0 for all x, and let

s(x) = ∫₀ˣ 1/σ(z) dz, for x ≥ 0,

s(x) = −∫ₓ⁰ 1/σ(z) dz, for x < 0,

where we assume these integrals are well defined. Let g be the inverse function of
s. This inverse exists since the function is continuous and monotonically increasing.




Figure 3.27: Comparison of Milstein and Euler approximations to the stock price with Δt = 1/12 year.

Suppose we apply Ito's lemma to the transformed process Yt = s(Xt). We obtain

dYt = {a(Xt, t)s′(Xt) + ½ σ²(Xt)s″(Xt)}dt + σ(Xt)s′(Xt)dWt
    = {a(Xt, t)/σ(Xt) − ½ σ²(Xt) σ′(Xt)/σ²(Xt)}dt + dWt
    = µ(Yt, t)dt + dWt,

where

µ(Yt, t) = a(g(Yt), t)/σ(g(Yt)) − ½ σ′(g(Yt)).

In other words, Yt satisfies an Ito equation with constant diffusion term.
Suppose we generate an increment in Yt using Euler's method and then solve
for the corresponding increment in Xt. Then using the first two terms in the
Taylor series expansion of g,

ΔXt = g′(Yt)ΔYt + ½ g″(Yt)(ΔYt)²
    = g′(Yt)(µ(Yt, t)Δt + ΔWt) + ½ σ′(g(Yt))σ(g(Yt))(ΔYt)²
    = {a(g(Yt), t) − ½ σ(g(Yt))σ′(g(Yt))}Δt + σ(g(Yt))ΔWt + ½ σ′(g(Yt))σ(g(Yt))(ΔYt)²,

since

g′(Yt) = 1/s′(g(Yt)) = σ(g(Yt)) and g″(Yt) = σ′(g(Yt))σ(g(Yt)).

But since (ΔYt)² = (ΔWt)² + o(Δt), it follows that

ΔXt = {a(Xt, t) − ½ σ(Xt)σ′(Xt)}Δt + σ(Xt)ΔWt + ½ σ′(Xt)σ(Xt)(ΔWt)² + o(Δt),
and so the approximation to this increment is identical, up to the order con-
sidered, to the Milstein approximation. For most processes, it is preferable
to apply a diffusion stabilizing transformation as we have here, prior to dis-
cretizing the process. For the geometric Brownian motion process, for example,
the diffusion-stabilizing transformation is a multiple of the logarithm, and this
transforms to a Brownian motion, for which the Euler approximation gives the
exact distribution.

Example: Down-and-Out Call.

Consider an asset whose price under the risk-neutral measure Q follows a con-
stant elasticity of variance (CEV) process

dSt = rSt dt + σ St^γ dWt  (3.40)

for a standard Brownian motion process Wt. A down-and-out call option with
exercise price K provides the usual payment (ST − K)⁺ of a European call
option on maturity T if the asset never falls below a given "out" barrier b. The
parameter γ > 0 governs the change in the diffusion term as the asset price
changes. We wish to use simulation to price such an option with current asset
price S0, time to maturity T, out barrier b < S0 and constant interest rate r,
and to compare with the Black-Scholes formula as b → 0.
A geometric Brownian motion is most easily simulated by taking logarithms.
For example, if St satisfies the risk-neutral specification

dSt = rSt dt + σSt dWt  (3.41)

then Yt = log(St) satisfies

dYt = (r − σ²/2)dt + σdWt.  (3.42)

This is a Brownian motion and is simulated with a normal random walk. In-
dependent normal increments ΔYt ~ N((r − σ²/2)Δt, σ²Δt) are generated and
their partial sums used to simulate the process Yt. The return for those options
that are in the money is the average of the values of (e^{YT} − K)⁺ over those
paths for which min{Ys; t < s < T} ≥ ln(b). Similarly, the transformation of
the CEV process which provides a constant diffusion term is determined by
the CEV process which provides a constant diffusion term is determined by
s(x) = ∫₀ˣ z^{−γ} dz = x^{1−γ}/(1 − γ) if γ ≠ 1, and s(x) = ln(x) if γ = 1

(this is proportional to ∫₀ˣ dz/σ(z)). Assuming γ ≠ 1, the inverse function is

g(y) = ((1 − γ)y)^{1/(1−γ)}

and the process Yt = (1 − γ)^{-1} St^{1−γ} satisfies an Ito equation
with constant diffusion coefficient:

dYt = {r(1 − γ)Yt − γσ²/(2(1 − γ)Yt)}dt + σ dWt.  (3.43)
σ
After simulating the process Yt we invert the relation to obtain St = ((1 −
γ)Yt)^{1/(1−γ)}. There is one fine point related to simulating the process (3.43)
that we implement in the code below. The equation (3.40) is a model for
a non-negative asset price St, but when we simulate the values Yt from (3.43)
there is nothing to prevent the process from going negative. Generally, if γ ≥ 1/2
and we increment time in sufficiently small steps Δt, then a negative value of Yt
is unlikely, but when it does occur we assume absorption at 0
(analogous to default or bankruptcy). The following Matlab function simulates
sample paths from the CEV process over the interval [0, T].
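A minimal sketch of such a function follows (the original listing is not reproduced here, so the function name, argument order and use of a fixed step count are assumptions; it assumes 1/2 ≤ γ < 1):

function S = simcev(S0, r, sigma, gamma, T, n)
% minimal sketch: simulate one CEV path (3.40) on [0,T] with n steps by
% applying Euler to the transformed process Y of (3.43); absorption at 0
dt = T/n; S = zeros(1, n+1); S(1) = S0;
Y = S0^(1 - gamma)/(1 - gamma);          % Y = s(S), stabilizing transform
for i = 1:n
    drift = r*(1 - gamma)*Y - gamma*sigma^2/(2*(1 - gamma)*Y);
    Y = Y + drift*dt + sigma*sqrt(dt)*randn;
    if Y <= 0                            % absorb at 0 (default/bankruptcy)
        S(i+1:n+1) = 0; return
    end
    S(i+1) = ((1 - gamma)*Y)^(1/(1 - gamma));
end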
