exp(β(x − µ) + θ/c²) (θ/(2πc)) ∫_0^∞ t^{−2} exp(−(1/(2t))((x − µ)² + θ²/c²) − (t/2)(β² + 1/c²)) dt

    = (αδ/π) exp(δ√(α² − β²) + β(x − µ)) K_1(α√(δ² + (x − µ)²)) / √(δ² + (x − µ)²),     (3.22)

where α = √(β² + 1/c²) and δ = θ/c.

The modified Bessel function of the second kind K_ν(x) is given in MATLAB
by besselk(ν, x) and in R by besselK(x, ν, expon.scaled=FALSE). The distribution
with probability density function given by (3.22) is called the normal
inverse Gaussian distribution, with real-valued parameter µ, 0 ≤ δ, and
α ≥ |β|. The tails of the normal inverse Gaussian density are substantially
heavier than those of the normal distribution. In fact, up to a constant,

f(x; α, β, δ, µ) ∼ |x|^{−3/2} exp((∓α + β)x) as x → ±∞.

The moments of this distribution can be obtained from the moment generating
function

M(s) = e^{µs} exp{δ(α² − β²)^{1/2} − δ(α² − (s + β)²)^{1/2}}  for |β + s| < α.     (3.23)
162 CHAPTER 3. BASIC MONTE CARLO METHODS

These moments are:

E(X) = µ + δβ(α² − β²)^{−1/2}

var(X) = δα²(α² − β²)^{−3/2}

and the skewness and kurtosis:

skew = 3βα^{−1}δ^{−1/2}(α² − β²)^{−1/4}

kurtosis = 3δ^{−1}α^{−2}(α² + 4β²)(α² − β²)^{−1/2}.
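These formulas are easy to sanity-check numerically. The following Python sketch (scipy's k1 is the Bessel function K_1 that the text computes with besselk; the parameter values are illustrative) integrates the density (3.22) and compares its numerical mean and variance with the expressions above:

```python
import numpy as np
from scipy.special import k1
from scipy.integrate import quad

def nig_pdf(x, alpha, beta, delta, mu):
    # normal inverse Gaussian density (3.22)
    g = np.sqrt(alpha**2 - beta**2)
    s = np.sqrt(delta**2 + (x - mu)**2)
    return (alpha * delta / np.pi) * np.exp(delta * g + beta * (x - mu)) * k1(alpha * s) / s

# illustrative parameter values satisfying alpha >= |beta|, delta >= 0
alpha, beta, delta, mu = 2.0, 1.0, 1.0, 0.0
gamma = np.sqrt(alpha**2 - beta**2)

total = quad(nig_pdf, -40, 40, args=(alpha, beta, delta, mu))[0]
mean = quad(lambda x: x * nig_pdf(x, alpha, beta, delta, mu), -40, 40)[0]
var = quad(lambda x: (x - mean)**2 * nig_pdf(x, alpha, beta, delta, mu), -40, 40)[0]
# total ~ 1, mean ~ mu + delta*beta/gamma, var ~ delta*alpha**2/gamma**3
```

With these parameters the numerical mean and variance agree with µ + δβ(α² − β²)^{−1/2} and δα²(α² − β²)^{−3/2} to integration accuracy.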

One of the particularly attractive features of this family of distributions,
shared by the normal and the stable family of distributions, is that it is closed
under convolutions. This is apparent from the moment generating function
(3.23), since M^N(s) gives a moment generating function of the same form but
with µ replaced by Nµ and δ by Nδ. In Figure 3.19 we plot the probability
density function of a member of this family of distributions.
Note the similarity to the normal density but with a modest amount of
skewness and increased weight in the tails. We can generate random variables
from this distribution as follows:

Sample T from an inverse Gaussian distribution (3.21).

Return X = µ + βT + N(0, T),

where N(0, T) is a normal random variable with mean 0 and variance T.
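The two-step recipe can be sketched in Python, with scipy.stats.invgauss standing in for the inverse Gaussian step; the mapping of (α, β, δ, µ) to scipy's (mu, scale) arguments is our own reading (mean δ/√(α² − β²) and shape δ²) and should be checked against (3.21):

```python
import numpy as np
from scipy.stats import invgauss

def nig_sample(alpha, beta, delta, mu, n, rng):
    gamma = np.sqrt(alpha**2 - beta**2)
    # step 1: T inverse Gaussian with mean delta/gamma and shape delta**2;
    # in scipy's parameterization that is invgauss(mu=1/(delta*gamma), scale=delta**2)
    T = invgauss.rvs(1.0 / (delta * gamma), scale=delta**2, size=n, random_state=rng)
    # step 2: X = mu + beta*T + N(0, T)
    return mu + beta * T + np.sqrt(T) * rng.standard_normal(n)

rng = np.random.default_rng(0)
x = nig_sample(2.0, 1.0, 1.0, 0.0, 200_000, rng)
```

The sample mean and variance should be close to the formulas for E(X) and var(X) above.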

We sample from the inverse Gaussian by using a property of the distribution
that if T has density of the form (3.21) then

(T − θ)²/(c²T)     (3.24)
Figure 3.19: Normal inverse Gaussian probability density function with α = δ = 1, β = 1/2, µ = 0.

has a chi-squared distribution with one degree of freedom (easily generated
as the square of a standard normal random variable). The algorithm (see
Michael, Schucany and Haas (1976)) is:

1. For

   c = 1/√(α² − β²) and θ = δc,

   generate G_1 from the Gamma(1/2, δ/c) distribution. Define

   Y_1 = 1 + G_1(1 − √(1 + 2/G_1)).

2. Generate U_2 ∼ U[0, 1]. If U_2 ≤ 1/(1 + Y_1) then output T = θY_1.

3. Otherwise output T = θY_1^{−1}.

The two values θY_1 and θY_1^{−1} are the two roots of the equation obtained by
setting (3.24) equal to a chi-squared variate with one degree of freedom, and the
relative values of the probability density function at these two roots are
1/(1 + Y_1) and 1 − 1/(1 + Y_1).
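A vectorized sketch of steps 1 to 3 (numpy's gamma generator takes a shape and a scale, so the Gamma(1/2, δ/c) variable is drawn with scale c/δ; this reading of the text's notation is an assumption):

```python
import numpy as np

def inverse_gaussian(alpha, beta, delta, n, rng):
    c = 1.0 / np.sqrt(alpha**2 - beta**2)
    theta = delta * c
    # step 1: G1 ~ Gamma(1/2, delta/c); numpy's gamma takes a scale, so use c/delta
    G1 = rng.gamma(0.5, c / delta, size=n)
    Y1 = 1.0 + G1 * (1.0 - np.sqrt(1.0 + 2.0 / G1))
    # steps 2-3: take the smaller root theta*Y1 with probability 1/(1 + Y1)
    U2 = rng.random(n)
    return np.where(U2 <= 1.0 / (1.0 + Y1), theta * Y1, theta / Y1)

rng = np.random.default_rng(1)
T = inverse_gaussian(2.0, 1.0, 1.0, 200_000, rng)
```

A quick check is that (T − θ)²/(c²T) has sample mean near 1, as a chi-squared variate with one degree of freedom should.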

Finally, to generate from the normal inverse Gaussian distribution (3.22) we
generate an inverse Gaussian random variable T as above and then set X = µ + βT +
N(0, T). Prause (1999) provides statistical evidence that the normal inverse
Gaussian provides a better fit than does the normal itself. For example, we fit the
normal inverse Gaussian distribution to the S&P500 index returns over the period
Jan 1, 1997-Sept 27, 2002. There were a total of 1442 values over this period.
Figure 3.20 shows a histogram of the daily returns together with the normal
and the NIG fit to the data. The mean return over this period is 8 × 10^{−5} and
the standard deviation of returns 0.013. If we fit the normal inverse Gaussian
distribution to these returns we obtain parameter estimates

α = 95.23, β = −4.72, δ = 0.016, µ = 0.0009

and the Q-Q plots in Figure 3.21. Both
Figure 3.20: The normal and the normal inverse Gaussian fit to the S&P500 returns.

Figure 3.21: Q-Q plots showing the normal inverse Gaussian and the normal fit to S&P 500 data, 1997-2002.

indicate that the normal approximation fails to properly fit the tails of the
distribution but that the NIG distribution is a much better fit. This is similar to
the conclusion in Prause using observations on the Dow Jones Industrial Index.

Generating Random Numbers from Discrete Distributions

Many of the methods described above for generating continuous distributions,
such as inversion and acceptance-rejection, work as well for discrete random
variables. Suppose for example X is a discrete random variable taking values on the
integers with probability function P[X = x] = f(x), for x = 0, 1, 2, .... Suppose
we can find a continuous random variable Y which has exactly the same value
of its cumulative distribution function at these integers, so that F_Y(j) = F_X(j)
for all j = 0, 1, 2, .... Then we may generate the continuous random variable Y,
say by inversion or acceptance-rejection, and then set X = ⌊Y⌋, the integer part
of Y. Clearly X takes integer values, and since P[X ≤ j] = P[Y ≤ j] = F_X(j)
for all j = 0, 1, ..., X has the desired distribution. The continuous exponential
distribution and the geometric distribution are linked in this way. If
X has a Geometric(p) distribution and Y has the exponential distribution with
parameter λ = −ln(1 − p), then X has the same distribution as ⌈Y⌉ or ⌊Y⌋ + 1.
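This link is easy to verify by simulation; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.3
lam = -np.log(1.0 - p)                        # exponential rate matched to p
Y = rng.exponential(1.0 / lam, size=200_000)  # Y ~ Exponential(lambda)
X = np.floor(Y).astype(int) + 1               # equals ceil(Y) with probability one
# X should be Geometric(p): P[X = k] = p (1 - p)**(k - 1)
```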

Using the inverse transform method for generating discrete random variables
is usually feasible, but for random variables with a wide range of values
of reasonably high probability it often requires some setup cost to achieve
reasonable efficiency. For example, if X has cumulative distribution function
F(x), x = 0, 1, ..., inversion requires that we output X = F^{−1}(U), an
integer satisfying F(X − 1) < U ≤ F(X). The most obvious technique for
finding such a value of X is to search sequentially through the potential values
x = 0, 1, 2, .... Figure 3.22 is the search tree for inversion for the distribution on
Figure 3.22: Sequential search tree for inverse transform with root at x = 0 (internal node thresholds .11, .41, .66, .87).

the integers 0, . . . , 4 given by

x      0     1     2     3     4
f(x)   0.11  0.30  0.25  0.21  0.13
We generate an integer by repeatedly comparing a uniform [0,1] variate
U with the value at each node, taking the right branch if U is greater than
this threshold value and the left if it is smaller. If X takes positive integer values
{1, 2, ..., N}, the number of values searched will average E(X), which for many
discrete distributions can be unacceptably large.
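Sequential-search inversion for this five-point distribution can be sketched as:

```python
import numpy as np

f = np.array([0.11, 0.30, 0.25, 0.21, 0.13])  # P[X = x] for x = 0, ..., 4
F = np.cumsum(f)                              # thresholds .11 .41 .66 .87 1.0

def draw(u):
    # inverse transform: smallest x with F(x) >= u, by sequential search from x = 0
    x = 0
    while u > F[x]:
        x += 1
    return x

rng = np.random.default_rng(3)
sample = np.array([draw(u) for u in rng.random(100_000)])
```

On average the loop makes E(X) + 1 ≈ 2.95 comparisons per draw for this distribution.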

An easy alternative is to begin the search at a value m which is near the
median (or mode or mean) of the distribution. For example, we choose m = 2
and search to the left or right depending on the value of U in Figure 3.23.
If we root the tree at m, then this results in searching roughly an average of
E[|X − m| + 1] values before obtaining the generated variable. This is often
substantially smaller than E(X), especially when E(X)
Figure 3.23: Search tree rooted near the median (node thresholds .66, .41, .87, .11).

is large but still unacceptably large when the distribution has large variance.
An optimal binary search tree for this distribution is graphed in Figure
3.24. This tree has been constructed from the bottom up as follows. We begin
by joining the two smallest probabilities f(4) and f(0) to form a new node
with weight f(0) + f(4) = 0.24; we take the left path (towards X = 0 rather
than towards X = 4) if U is smaller than the value .11 labelling the node
at the intersection of these two branches. We now regard this pair of values
as a unit and continue to work up the tree from the leaves to the root. The
next smallest pair of probabilities are those of the groups {0, 4} and {3}, which
have probabilities 0.24 and 0.21 respectively, so these are the next to be joined.
This optimal binary search tree provides the minimum expected number of
comparisons and is equivalent to sorting the values in order of largest to smallest
probability, in this case 1, 2, 3, 4, 0, relabelling or coding them {0, 1, 2, 3, 4},
and then applying the inverse transform method starting at 0.
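The bottom-up construction is a Huffman-style merge, so the depths D_i and the resulting average number of comparisons Σ_i f(i)D_i can be computed mechanically. A sketch (a Huffman merge reproduces the text's construction for this example, though in general an ordered search tree can differ from an unordered Huffman tree):

```python
import heapq

f = {0: 0.11, 1: 0.30, 2: 0.25, 3: 0.21, 4: 0.13}

# bottom-up construction: repeatedly join the two least-probable groups;
# each join pushes its members one level deeper in the tree
heap = [(p, [x]) for x, p in sorted(f.items())]
heapq.heapify(heap)
depth = {x: 0 for x in f}
while len(heap) > 1:
    p1, m1 = heapq.heappop(heap)
    p2, m2 = heapq.heappop(heap)
    for x in m1 + m2:
        depth[x] += 1
    heapq.heappush(heap, (p1 + p2, m1 + m2))

avg_optimal = sum(f[x] * depth[x] for x in f)    # sum_i f(i) D_i = 2.24
avg_sequential = sum((x + 1) * f[x] for x in f)  # root at x = 0: E(X) + 1 = 2.95
```

For this distribution the tree needs 2.24 comparisons on average against 2.95 for sequential search from x = 0.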
Figure 3.24: Optimal binary search tree for the distribution .11, .30, .25, .21, .13.

The leaves of the tree are the individual probabilities f(j) and the internal
nodes are the sums of the weights or probabilities of their "children", the values
f(j) for j on paths below the node. Let D_i represent the depth of the ith leaf,
so for example the depth of leaf 0 in Figure 3.24 is D_0 = 3. Then the average
number of comparisons to generate a single random variable X starting at the
root is Σ_i f(i)D_i. The procedure for constructing the last tree provides an
optimal algorithm in the sense that this quantity is minimized. It is possible to
show that an optimal binary search tree will reduce the average number of
comparisons from E(X) for ordinary inversion to less than 1 + 4[log_2(1 + E(X))].
Another general method for producing variates from a discrete distribution
was suggested by Walker (1974, 1977) and is called the alias method. It is
based on the fact that every discrete distribution is a uniform mixture of
two-point distributions. Apart from the time required to set up an initial table of
aliases and aliasing probabilities, the time required to generate values from a
discrete distribution with K supporting points is bounded in K, whereas methods
such as inverse transform have computational time which increases
proportionally with E(X).

Consider a discrete distribution with probability function f(j) on the K
integers j = 1, 2, ..., K. We seek a table of values A(i) and associated
"alias" probabilities q(i) so that the desired discrete random variable can be
generated in two steps: first generate one of the integers {1, 2, ..., K} at random
and uniformly; then, if we generated the value I, say, output I with probability
q(I) and otherwise output the "alias" value A(I). These values A(I) and q(I)
are determined below. The algorithm is:

GENERATE I UNIFORM ON {1, ...K}.

WITH PROBABILITY q(I), OUTPUT X = I, OTHERWISE, X = A(I).

An algorithm for producing the values (A(i), q(i)), i = 1, ..., K, is suggested
by Walker (1977) and proceeds by reducing the number of non-zero
probabilities one at a time:

1. Put q(i) = Kf(i) for all i = 1, ..., K.

2. Let m be the index such that q(m) = min{q(i); q(i) > 0} and let q(M) =
max{q(i); q(i) > 0}.

3. Set A(m) = M and fix q(m) (it is no longer subject to change).

4. Replace q(M) by q(M) − (1 − q(m)).

5. Replace (q(1), ..., q(K)) by (q(1), ..., q(m − 1), q(m + 1), ..., q(K)) (so the
component with index m is removed).

6. Return to 2 unless all remaining q(i) = 1 or the vector of q(i)'s is empty.

Note that on each iteration of the steps above we fix one of the components
q(m), remove it from the vector, and adjust one other, namely q(M). Since
we always fix the smallest q(m), and since the average of the remaining q(i)'s
stays equal to one, we always obtain a probability, i.e. we fix a value
0 < q(m) ≤ 1. Figure 3.25 shows the way
in which this algorithm proceeds for the distribution

x      1   2   3   4
f(x)   .1  .2  .3  .4
We begin with q(i) = 4 × f(i) = .4, .8, 1.2, 1.6 for i = 1, 2, 3, 4. Then since
m = 1 and M = 4, these are the first to be adjusted. We assign A(1) = 4 and
q(1) = 0.4. Since we have reassigned mass 1 − q(1) to M = 4, we replace
q(4) by 1.6 − (1 − 0.4) = 1. We now fix and remove q(1) and continue with
q(i) = .8, 1.2, 1.0 for i = 2, 3, 4. The next step results in fixing q(2) = 0.8,
A(2) = 3 and changing q(3) to q(3) − (1 − q(2)) = 1. After this iteration
the remaining q(3), q(4) are both equal to 1, so according to step 6 we may
terminate the algorithm. Notice that we terminated without assigning values
to A(3) and A(4). This assignment is unnecessary, since the probability that
the alias A(i) is used is 1 − q(i), which is zero in these two cases. The algorithm
therefore results in aliases A(i) = 4, 3 for i = 1, 2 and q(i) = .4, .8, 1, 1 for
i = 1, 2, 3, 4. Geometrically, the method iteratively adjusts a probability
histogram to form a rectangle with base K, as in Figure 3.25.
Suppose we now wish to generate random variables from this discrete
distribution. We simply generate a random variable uniform on the set {1, 2, 3, 4};
if 1 is selected, we replace it by A(1) = 4 with probability 1 − q(1) = 0.6, and if 2
is selected it is replaced by A(2) = 3 with probability 1 − q(2) = 0.2.
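Walker's setup followed by the two-step generator can be sketched as follows for the worked example (array indices 0, ..., 3 stand for the values 1, ..., 4; the small tolerance in the stopping test guards against floating-point drift):

```python
import numpy as np

f = np.array([0.1, 0.2, 0.3, 0.4])        # P[X = i + 1]
K = len(f)
q = K * f                                  # initial q: .4 .8 1.2 1.6
A = np.arange(K)                           # alias table, self-alias by default
active = list(range(K))
while active:
    m = min(active, key=lambda i: q[i])    # smallest remaining q
    M = max(active, key=lambda i: q[i])    # largest remaining q
    if q[m] >= 1.0 - 1e-12:                # all remaining q's are 1: done
        break
    A[m] = M                               # fix q(m) and its alias
    q[M] -= 1.0 - q[m]                     # move the deficit to q(M)
    active.remove(m)

def sample(n, rng):
    I = rng.integers(0, K, size=n)         # uniform on {0, ..., K-1}
    U = rng.random(n)
    return np.where(U <= q[I], I, A[I]) + 1  # report the values 1, ..., K

rng = np.random.default_rng(4)
x = sample(200_000, rng)
```

The setup reproduces the q values .4, .8, 1, 1 and the aliases 4 and 3 for the first two cells, and the sampled frequencies match .1, .2, .3, .4.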

Acceptance-Rejection for Discrete Random Variables

The acceptance-rejection algorithm can be used for generating both discrete and
continuous random variables, and the geometric interpretation in both cases is
essentially the same. Suppose for example we wish to generate a discrete random
variable X having probability function f(x), using as a dominating function
a multiple of g(x), the probability density function of a continuous random
variable. Take for example the probability function
Figure 3.25: The alias method for generating from the distribution .1, .2, .3, .4.

x      1   2   3   4
f(x)   .1  .3  .4  .2
using the dominating function 2g(x) = 0.1 + 0.2(x − 0.5) for 0.5 < x < 4.5. It
is easy to generate a continuous random variable from the probability density
function g(x) by inverse transform. Suppose we generate the value X. Then if
this value is under the probability histogram graphed in Figure 3.26 we accept
the value (after rounding it to the nearest integer to conform to the discreteness
of the output distribution), and otherwise we reject and repeat.
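A sketch of this rejection sampler: since G(x) = 0.05(x² − 1/4) is the distribution function of g, the candidate is X = √(20U + 1/4), and it is accepted when a uniform point under the envelope 2g(x) = 0.2x falls below the histogram bar f(j):

```python
import numpy as np

f = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}
rng = np.random.default_rng(5)

def draw():
    while True:
        # candidate from g(x) = 0.1 x on (0.5, 4.5) by inverse transform
        x = np.sqrt(20.0 * rng.random() + 0.25)
        j = int(np.floor(x + 0.5))            # round to the nearest integer
        # accept if a uniform point under the envelope 2 g(x) = 0.2 x
        # falls below the histogram bar of height f(j)
        if 0.2 * x * rng.random() <= f[j]:
            return j

sample = np.array([draw() for _ in range(50_000)])
```

Since the envelope integrates to 2, about half of the candidates are accepted.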

We may also dominate a discrete distribution with another discrete
distribution, in which case the algorithm proceeds as in the continuous case but with
the probability density functions replaced by probability functions.
Figure 3.26: Acceptance-rejection for a discrete distribution with a continuous dominating function: points under the probability histogram are accepted, points between the histogram and 2g(x) are rejected.

The Poisson Distribution.

Consider the probability function for a Poisson distribution with parameter λ:

f(x) = λ^x e^{−λ}/x!,  x = 0, 1, ...     (3.25)

The simplest generator uses the Poisson process. Recall that a Poisson
process with rate λ on the real line can be described in two equivalent ways:

1. Points are distributed on the line in such a way that the spacings
between consecutive points are independent Exponential(λ) random variables.
Then the resulting process is a Poisson process with rate λ.

2. The number of points in an interval of length h has a Poisson(λh)
distribution. Moreover, the numbers of points in non-overlapping intervals are
independent random variables.

The simplest generator stems from this equivalence. Suppose we use the
first specification to construct a Poisson process with rate parameter 1 and
then examine X = the number of points occurring in the interval [0, λ]. This
is the number of partial sums of Exponential(1) random variables that are less
than or equal to λ:

X = inf{n; Σ_{i=1}^{n+1} (−ln U_i) > λ}

or equivalently

X = inf{n; Π_{i=1}^{n+1} U_i < e^{−λ}}.     (3.26)
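Generator (3.26) takes only a few lines; a sketch:

```python
import numpy as np

def poisson_product(lam, rng):
    # multiply uniforms until the product drops below e**(-lam);
    # X = (number of factors used) - 1, per (3.26)
    target = np.exp(-lam)
    n = 0
    prod = rng.random()
    while prod >= target:
        n += 1
        prod *= rng.random()
    return n

rng = np.random.default_rng(6)
x = np.array([poisson_product(4.0, rng) for _ in range(50_000)])
```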

This generator requires CPU time which grows linearly with λ, since the
number of exponential random variables generated and summed grows linearly
with λ, and so an alternative is required for large λ. Various
acceptance-rejection algorithms have been suggested, including dominating the
Poisson probability function with multiples of the logistic probability density
function (Atkinson (1979)) or with the normal density with an exponential right
tail (cf. Devroye, Lemma 3.8, page 509). A simple all-purpose dominating
function is the so-called table-mountain function (cf. Stadlober (1989)),
essentially a function with a flat top and tails that decrease as 1/x². Another
simple alternative for generating Poisson variates that is less efficient but
simpler to implement is to use the Lorentzian, or truncated Cauchy, distribution
with probability density function

g(x|a, b) = c_0/(b² + (x − a)²),  x > 0,     (3.27)

where c_0 is the normalizing constant. A random variable is generated from this
distribution using the inverse transform method: X = a + b tan(πU), where
U ∼ U[0, 1]. Provided that we match the modes of the distributions with a = λ
and put b = √(2λ), this function may be used to dominate the Poisson distribution
and provide a simple rejection generator. The MATLAB Poisson random number
generator is poissrnd(λ, m, n), which generates an m × n matrix of Poisson(λ)
variables. It uses the simple generator (3.26) and is not computationally
efficient for large values of λ. In R, the command rpois(n, λ) generates a vector
of n Poisson variates.

The Binomial Distribution

For the binomial distribution, we may use any one of the following alternatives:

(1) X = Σ_{i=1}^n I(U_i < p), U_i ∼ independent Uniform[0, 1]

(2) X = inf{x; Σ_{i=1}^{x+1} G_i > n}, where G_i ∼ independent Geometric(p)

(3) X = inf{x; Σ_{i=1}^{x+1} E_i/(n − i + 1) > −log(1 − p)}, where E_i ∼ independent Exponential(1).

Method (1) obtains from the definition of the sum of independent Bernoulli
random variables, since the random variables I(U_i < p) are independent and
take the values 0 and 1 with probabilities 1 − p and p respectively. The event
(U_i < p), having probability p, is typically referred to as a "success". Obviously
this method will be slow if n is large. For method (2), recall that the number of
trials necessary to obtain the first success, G_1 say, has a geometric distribution.
Similarly, G_2 represents the number of additional trials to obtain the second
success. So if X = j, the number of trials required to obtain j + 1 successes
was greater than n, and the number required to obtain j successes was less than
or equal to n; in other words, there were exactly j successes in the first n trials.
When n is large but np fairly small, method (2) is more efficient, since its cost is
proportional to the number of successes rather than the total number of trials.
Of course, for large n and np sufficiently small (e.g. < 1), we can also replace
the binomial distribution by its Poisson(λ = np) approximation. Method (3) is
clearly more efficient if −log(1 − p) is not too large, so that p is not too close to 1,
because in this case we need to add fewer exponential random variables.
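Method (2), accumulating geometric inter-success gaps until they pass n, can be sketched as:

```python
import numpy as np

def binomial_by_geometric(n, p, rng):
    # accumulate Geometric(p) gaps (trials up to and including the next success)
    # until trial n is passed; the number of complete gaps is Binomial(n, p)
    x = 0
    trials = rng.geometric(p)
    while trials <= n:
        x += 1
        trials += rng.geometric(p)
    return x

rng = np.random.default_rng(7)
x = np.array([binomial_by_geometric(50, 0.1, rng) for _ in range(50_000)])
```

The loop runs about np times rather than n times, which is the advertised saving when p is small.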
For large mean np and small n(1 − p) we can simply reverse the roles of
successes and failures and use method (2) or (3) above. But if both np and
n(1 − p) are large, a rejection method is required. Again we may use rejection
beginning with a Lorentzian distribution, choosing a = np and b = √(2np(1 − p))
in the case p < 1/2. When p > 1/2, we simply reverse the roles of "failures"
and "successes". Alternatively, a dominating table-mountain function may be
used (Stadlober (1989)). The binomial generator in MATLAB is the function
binornd(n,p,j,k), which generates a j × k matrix of Binomial(n, p) random
variables. It uses the simplest form (1) of the binomial generator and is not
computationally efficient for large n. In R, rbinom(m,n,p) will generate a vector
of length m of Binomial(n, p) variates.

Random Samples Associated with Markov Chains

Consider a finite state Markov chain, a sequence of (discrete) random variables
X_1, X_2, ..., each of which takes integer values 1, 2, ..., N (called states). The
number of states of a Markov chain may be large or even infinite, and it is not
always convenient to label them with the positive integers, so it is common
to define the state space as the set of all possible states of the chain; we will
give some examples of this later. For the present we restrict attention to
the case of a finite state space. The transition probability matrix is a matrix P
describing the conditional probabilities of moving between possible states of the
chain, so that

P[X_{n+1} = j | X_n = i] = P_{ij},  i = 1, ..., N, j = 1, ..., N,

where P_{ij} ≥ 0 for all i, j and Σ_j P_{ij} = 1 for all i. A limiting distribution
of a Markov chain is a vector π, say, of long-run probabilities of the individual
states with the property that

π_i = lim_{t→∞} P[X_t = i].

A stationary distribution of a Markov chain is a column vector π, say, of
probabilities of the individual states such that

π′P = π′.     (3.28)

For a Markov chain, every limiting distribution is in fact a stationary
distribution. For the basic theory of Markov chains, see the Appendix.
Roughly, a Markov chain which eventually "forgets" the states that were
occupied in the distant past (in other words, one for which the conditional
probability of the current state does not vary much as we condition on different
states in the distant past) is called ergodic. A Markov chain which simply cycles
through three states 1 → 2 → 3 → 1 → ... is an example of a periodic chain, and is not
ergodic.
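For a small chain, (3.28) can be solved directly as a left-eigenvector problem. A sketch with an illustrative transition matrix (not one from the text):

```python
import numpy as np

# an illustrative 3-state transition matrix (rows sum to one)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# pi solves pi' P = pi' with sum(pi) = 1: the left eigenvector of P
# for eigenvalue 1, renormalized to be a probability vector
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()
```

This direct approach is exactly what becomes infeasible when N is large, which motivates the sampling methods below.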

It is often the case that we wish to simulate from a finite ergodic Markov
chain when it has reached equilibrium or stationarity, which is equivalent to
sampling from the distribution of X_n assuming that the distribution of X_0
is given by the stationary distribution π. In a few cases we can obtain this
stationary distribution directly from (3.28), but when N is large this system of
equations is usually not feasible to solve and we need to find another way to
sample from the probability vector π. Of course we can always begin the Markov
chain in some arbitrary initial state and run it, waiting for Hele to freeze over (it
does happen, since Hele is in Devon), until we are quite sure that the chain has
essentially reached equilibrium, and then use a subsequent portion of this chain,
discarding this initial period, sometimes referred to as the "initial transient".

Clearly this is often not a very efficient method, particularly in cases in
which the chain mixes, or forgets its past, very slowly, for in this case the required
initial transient is long. On the other hand, if we shorten it, we run the risk of
introducing bias into our simulations because the distribution generated is too
far from the equilibrium distribution π. There are a number of solutions to this
problem proposed in a burgeoning literature. Here we limit ourselves to a few
of the simpler methods.

Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is a method for generating random variables
from a distribution π that applies even in the case of an infinite number of
states or a continuous distribution π. It is assumed that π is known up to some
multiplicative constant. Roughly, the method consists of using a convenient
"proposal" Markov chain with transition matrix Q to generate transitions, and
then only "accepting" the move to the new state with a probability that depends
on the distribution π. The idea resembles that behind importance sampling.
The basic result on which the Metropolis-Hastings algorithm is pinned is the
following theorem.

Theorem 31 Suppose Q_{ij} is the transition matrix of a Markov chain. Assume
that g is a vector of non-negative values such that Σ_{i=1}^N g_i = G and

|g_j/Q_{ij}| ≤ K < ∞ for all i, j

for some finite value K. Define

ρ_{ij} = min(1, (g_j Q_{ji})/(g_i Q_{ij})).

Then the Markov chain with transition probability matrix

P_{ij} = Q_{ij} ρ_{ij}, for i ≠ j,     (3.29)

has stationary distribution π_i = g_i/G.

Proof. The proof consists of showing that the so-called "detailed balance
condition" is satisfied, i.e. with π_i = g_i/G, that

π_i P_{ij} = π_j P_{ji}, for all i, j.     (3.30)

This condition implies that when the chain is operating in equilibrium,

P[X_n = i, X_{n+1} = j] = P[X_n = j, X_{n+1} = i],

reflecting a cavalier attitude toward the direction in which time flows, or
reversibility of the chain. Of course (3.30) is true automatically if i = j, and for i ≠ j,

π_i P_{ij} = (g_i/G) Q_{ij} min(1, (g_j Q_{ji})/(g_i Q_{ij}))
          = (1/G) min(g_i Q_{ij}, g_j Q_{ji})
          = π_j P_{ji}

by the symmetry of the function (1/G) min(g_i Q_{ij}, g_j Q_{ji}). Now the detailed
balance condition (3.30) implies that π is a stationary distribution for this
Markov chain, since

Σ_{i=1}^N π_i P_{ij} = Σ_{i=1}^N π_j P_{ji} = π_j Σ_{i=1}^N P_{ji} = π_j for each j = 1, ..., N.

Provided that we are able to generate transitions for the Markov chain with
transition matrix Q, it is easy to generate a chain with transition matrix P in
(3.29). If we are currently in state i, generate the next state j with probability
Q_{ij}. If j = i, we stay in state i. If j ≠ i, we "accept" the move to state
j with probability ρ_{ij}; otherwise we stay in state i. Notice that the Markov
chain with transition matrix P tends to favour moves which increase the value
of π. For example, if the proposal chain is as likely to jump from i to j as it
is to jump back, so that Q_{ij} = Q_{ji}, then if π_j > π_i the move to j is always
accepted, whereas if π_j < π_i the move is accepted only with probability π_j/π_i.
The assumption Q_{ij} = Q_{ji} is a common and natural one, since in applications of
the Metropolis-Hastings algorithm it is common to choose j "at random" (i.e.
uniformly distributed) from a suitable neighborhood of i.
The above proof only shows that π is a stationary distribution of the
Markov chain associated with P, not that it is necessarily the limiting
distribution of this Markov chain. For this to follow we need to know that the chain
is ergodic. Various conditions for ergodicity are given in the literature; see for
example Robert and Casella (1999, Chapter 6) for more detail.
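A minimal sketch for a small discrete target known only up to its normalizing constant, using a symmetric nearest-neighbour proposal so that the acceptance probability ρ reduces to min(1, g_j/g_i); the target vector g is illustrative:

```python
import numpy as np

g = np.array([1.0, 4.0, 9.0, 4.0, 1.0])  # target pi_i = g_i/G with G unknown to the sampler
K = len(g)
rng = np.random.default_rng(8)

x = 0
counts = np.zeros(K)
steps = rng.choice([-1, 1], size=200_000)  # symmetric nearest-neighbour proposal
us = rng.random(200_000)
for s, u in zip(steps, us):
    y = min(max(x + s, 0), K - 1)          # clip at the ends (Q stays symmetric off-diagonal)
    if u < min(1.0, g[y] / g[x]):          # rho_ij with Q_ij = Q_ji
        x = y
    counts[x] += 1
freq = counts / counts.sum()
```

The visit frequencies converge to g/G even though G = 19 is never computed.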

Gibbs Sampling

There is one special case of the Metropolis-Hastings algorithm that is
particularly simple, common and compelling. To keep the discussion simple,
suppose the possible states of our Markov chain are points in two-dimensional
space (x, y); both components may be discrete or continuous. Suppose we wish
to generate observations from a stationary distribution which is proportional
to g(x, y), so

π(x, y) = g(x, y) / Σ_x Σ_y g(x, y),     (3.31)

defined on this space, but that the form of the distribution is such that directly
generating from it is difficult, perhaps because it is difficult to obtain the
denominator of (3.31). However, there are many circumstances where it is much
easier to obtain the values of the conditional distributions

π(x|y) = π(x, y) / Σ_z π(z, y)  and  π(y|x) = π(x, y) / Σ_z π(x, z).

Now consider the following algorithm: begin with an arbitrary value of y_0
and generate x_1 from the distribution π(x|y_0), followed by generating y_1 from
the distribution π(y|x_1). It is hard to imagine a universe in which iteratively
generating values x_{n+1} from the distribution π(x|y_n) and then y_{n+1} from the
distribution π(y|x_{n+1}) does not, at least asymptotically as n → ∞, eventually
lead to a draw from the joint distribution π(x, y). Indeed that is the case, since
the transition probabilities for this chain are given by

P(x_{n+1}, y_{n+1}|x_n, y_n) = π(x_{n+1}|y_n)π(y_{n+1}|x_{n+1})

and it is easy to show directly from these transition probabilities that

Σ_{(x,y)} P(x_1, y_1|x, y)π(x, y) = π(y_1|x_1) Σ_y π(x_1|y) Σ_x π(x, y)
                                  = π(y_1|x_1) Σ_y π(x_1, y)
                                  = π(x_1, y_1).

Of course the real power of Gibbs sampling is achieved in problems that are
not two-dimensional, as in the example above, but have dimension sufficiently
high that calculating the sums or integrals in the denominator of expressions
like (3.31) is not computationally feasible.
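A toy sketch on a 2 × 2 state space, where the conditional distributions are just renormalized rows and columns of the joint (the joint used here is illustrative):

```python
import numpy as np

# an illustrative joint distribution pi(x, y) on {0,1} x {0,1}
pi = np.array([[0.1, 0.2],
               [0.3, 0.4]])
rng = np.random.default_rng(9)

x = y = 0
counts = np.zeros((2, 2))
for _ in range(20_000):
    px = pi[:, y] / pi[:, y].sum()   # pi(x | y), a renormalized column
    x = rng.choice(2, p=px)
    py = pi[x, :] / pi[x, :].sum()   # pi(y | x), a renormalized row
    y = rng.choice(2, p=py)
    counts[x, y] += 1
freq = counts / counts.sum()
```

Only the conditionals are ever sampled, yet the visit frequencies recover the joint.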

Coupling From the Past: Sampling from the stationary distribution of a Markov Chain

All of the above methods assume that we generate from the stationary
distribution of a Markov chain by the "until Hele freezes over" method, i.e. we
run the chain from an arbitrary starting value, wait, and then delete the initial
transient. An alternative, elegant method that is feasible at least for some finite
state Markov chains is the method of "coupling from the past" due to Propp
and Wilson (1996).
We assume that we are able to generate transitions in the Markov chain; in
other words, if the chain is presently in state i at time n, we are able to generate
a random variable X_{n+1} from the distribution with probabilities P_{ij}, j = 1, ..., K.
Suppose F(x|i) is the cumulative distribution function P(X_{n+1} ≤ x | X_n = i)
and let us denote its inverse by F^{−1}(y|i). So if we wish to generate a random
variable X_{n+1} conditional on X_n, we can use the inverse transform X_{n+1} =
F^{−1}(U_{n+1}|X_n) applied to the Uniform[0,1] random variable U_{n+1}. Notice that
a starting value, say X_{−100}, together with the sequence of Uniform[0,1] variables
(U_{−99}, ..., U_0), determines the chain completely over the period −100 ≤ t ≤ 0.
If we wish to generate the value of X_t given X_s, s < t, then we can work this
expression backwards:

X_t = F^{−1}(U_t|X_{t−1})
    = F^{−1}(U_t|F^{−1}(U_{t−1}|X_{t−2}))
    = F^{−1}(U_t|F^{−1}(U_{t−1}| ... F^{−1}(U_{s+1}|X_s)))
    = F_s^t(X_s), say.

Now imagine an infinite sequence {U_t, t = ..., −3, −2, −1, 0} of independent
Uniform[0,1] random variables that was used to generate the state X_0 of a chain
at time 0. Let us imagine for the moment that there is a value of M such that
F_{−M}^0(i) is a constant function of i. This means that for this particular draw
of uniform random numbers, whatever the state i of the system at time −M,
the same state X_0 = F_{−M}^0(i) is generated at time 0. All chains, possibly with
different behaviour prior to time −M, are "coupled" at time −M and identical
from then on. In this case we say that coalescence has occurred in the interval
[−M, 0]. No matter where we start the chain at time −M, it ends up in the
same state at time 0, so it is quite unnecessary to simulate the chain over the
whole infinite time interval −∞ < t ≤ 0. When coalescence has occurred, we
can safely consider the common value of the chain at time 0 to be generated
from the stationary distribution, since it is exactly the same value as if we had
run the chain from t = −∞.
There is sometimes an easy way to check whether coalescence has occurred
in an interval, if the state space of the Markov chain is suitably ordered. For
example, suppose the states are numbered 1, 2, ..., N. Then it is sometimes
possible to relabel the states so that the conditional distribution functions F(x|i)
are stochastically ordered, or equivalently that F^{−1}(U|i) is monotonic (say
monotonically increasing) in i for each value of U. This is the case, for example,
provided that the partial sums Σ_{l=1}^j P_{il} are nonincreasing functions of i for each
j = 1, 2, ..., N. It follows that the functions F_{−M}^0(i) are all monotonic functions
of i and so

F_{−M}^0(1) ≤ F_{−M}^0(2) ≤ ... ≤ F_{−M}^0(N).

Therefore, if F_{−M}^0(1) = F_{−M}^0(N), then F_{−M}^0(i) must be a constant function.
Notice also that if there is any time interval [s, t] in which coalescence
occurs, so that F_s^t(i) is a constant function of i, then for any interval [S, T]
containing it, [S, T] ⊃ [s, t], F_S^T(i) is also a constant function of i.
It is easy to prove that coalescence occurs in the interval $[-M, 0]$ for suf-
ficiently large $M$. For an ergodic finite Markov chain, there is some step size
$\tau$ such that every transition has positive probability, $P[X_{t+\tau} = j \mid X_t = i] > \varepsilon$
for all $i, j$. Consider two independent chains, one beginning in state $i$ and the
other in state $i'$ at time $t = 0$. Then the probability that they occupy the same
state $j$ at time $t = \tau$ is at least $\varepsilon^2$. It is easy to see that if we use inverse
transform to generate the transitions and they are driven by common random
numbers, then this can only increase the probability of being in the same state,
so the probability that these two chains are coupled at time $\tau$ is at least $\varepsilon^2$. Similarly,
for $N$ possible states, the probability of coalescence in an interval of length $\tau$
is at least $\varepsilon^N > 0$. Since there are infinitely many disjoint intervals of length
$\tau$ in $(-\infty, 0]$ and the events that a coalescence occurs in each interval are
independent, the probability that coalescence occurs somewhere in $(-\infty, 0]$ is
1.
We now detail the Propp-Wilson algorithm.

1. Set $M = 1$, $X_U = N$, $X_L = 1$.

2. Generate $U_{-M}, \ldots, U_{-M/2-1}$ (those $U_t$ not generated on a previous pass), all independent Uniform[0,1].

3. For $t = -M$ to $-1$ repeat:

(a) obtain $X_L = F^{-1}(U_t \mid X_L)$ and $X_U = F^{-1}(U_t \mid X_U)$;

(b) if $X_L = X_U$, stop and output $X(0) = X_L$.

4. Otherwise, set $M = 2M$ and go to step 2.

This algorithm tests for coalescence repeatedly by starting on the intervals

$$[-1, 0], [-2, -1], [-4, -2], [-8, -4], \ldots$$

We are assured that, with probability one, the process will terminate with co-
alescence after a finite number of steps. Moreover, note that in this algorithm the
random variable $U_t$, once generated, is NOT generated again on a subsequent
pass when $M$ is doubled. The generated $U_t$ is reused on each pass until coales-
cence occurs. If $U_t$ were regenerated on subsequent passes, this would lead to
bias in the algorithm.
It may well be that this algorithm needs to run for a very long time before
achieving coalescence, and an impatient observer who interrupts the algorithm
prior to coalescence and starts over will bias the results. Various modifications
have been made to speed up the algorithm (e.g. Fill, 1998).
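The doubling scheme above can be sketched in code. The following is a minimal illustration (not from the text) of coupling from the past for a small monotone chain: the transition matrix `P` is an invented example whose rows are stochastically ordered, so it suffices to track the bottom chain started in the lowest state and the top chain started in the highest.

```python
import numpy as np

# Illustrative 3-state chain; rows are stochastically ordered, and the matrix is
# symmetric and doubly stochastic, so its stationary distribution is uniform.
P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
CDF = np.cumsum(P, axis=1)              # conditional distribution functions F(.|i)

def inv_transform(u, i):
    """F^{-1}(u|i): smallest state j (0-based) with F(j|i) >= u."""
    return int(np.searchsorted(CDF[i], u))

def propp_wilson(rng):
    M, U = 1, {}                        # U[t] is generated once, then reused
    while True:
        for t in range(-M, 0):          # only newly added times get fresh uniforms
            if t not in U:
                U[t] = rng.random()
        lo, hi = 0, P.shape[0] - 1      # bottom and top chains started at time -M
        for t in range(-M, 0):
            lo = inv_transform(U[t], lo)
            hi = inv_transform(U[t], hi)
        if lo == hi:                    # coalescence: X(0) is an exact stationary draw
            return lo
        M *= 2                          # otherwise go twice as far into the past

rng = np.random.default_rng(1)
draws = [propp_wilson(rng) for _ in range(20000)]
print([draws.count(s) / len(draws) for s in range(3)])   # each near 1/3
```

Because this matrix is doubly stochastic, the empirical frequencies of the three states should each be close to 1/3; regenerating the $U_t$ on later passes, by contrast, would bias these frequencies.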

Sampling from the Stationary Distribution of a Diffusion Process

A basic Ito process of the form

$$dX_t = a(X_t)\,dt + \sigma(X_t)\,dW_t$$

is perhaps the simplest extension of a Markov chain to continuous time and contin-
uous state-space. It is well known that under fairly simple conditions there is
a unique (strong) solution to this equation, and that the limiting distribution of
$X_T$ as $T \to \infty$ is a stationary distribution with probability density function

$$f(x) = c\,\frac{1}{\sigma^2(x)} \exp\Big\{ 2 \int_0^x \frac{a(z)}{\sigma^2(z)}\,dz \Big\}$$

where the constant $c$ is chosen so that the integral of the density is 1. To be
able to do this we need to assume that

$$\int_{-\infty}^{\infty} \frac{1}{\sigma^2(x)} \exp\Big\{ 2 \int_0^x \frac{a(z)}{\sigma^2(z)}\,dz \Big\}\,dx < \infty. \qquad (3.32)$$

In order to generate from this stationary distribution, we can either start the
process at some arbitrary value $X_0$ and run it for a very long time $T$, hoping
that this is sufficiently long that the process is essentially in its stationary
state, or try to generate $X_0$ directly from the stationary density, in which case the process
begins (and subsequently runs) with its stationary distribution. Consider, for
example, the Cox-Ingersoll-Ross (CIR) process

$$dX_t = k(b - X_t)\,dt + \sigma X_t^{1/2}\,dW_t. \qquad (3.33)$$

In this case

$$a(x) = k(b - x), \quad \text{for } x > 0,$$
$$\sigma^2(x) = \sigma^2 x, \quad \text{for } x > 0.$$

Notice that

$$\frac{1}{\sigma^2 x} \exp\Big\{ 2 \int_\varepsilon^x \frac{k(b - z)}{\sigma^2 z}\,dz \Big\}
= \frac{1}{\sigma^2}\, x^{-1} \exp\Big\{ \frac{2kb}{\sigma^2} \ln(x/\varepsilon) - \frac{2k}{\sigma^2}(x - \varepsilon) \Big\}$$

is proportional to

$$x^{2kb/\sigma^2 - 1} \exp\{-2kx/\sigma^2\}$$

and the integral of this function, a Gamma function, will fail to converge unless
$2kb/\sigma^2 - 1 > -1$, or $2kb > \sigma^2$. Under this condition the stationary distribution
of the CIR process is Gamma$(2kb/\sigma^2,\ \sigma^2/2k)$. If this condition fails and $2kb < \sigma^2$,

then the process $X_t$ is absorbed at 0. If we wished to simulate a CIR process in
equilibrium, we should generate starting values of $X_0$ from this Gamma distri-
bution. More generally, for a CEV process satisfying

$$dX_t = k(b - X_t)\,dt + \sigma X_t^{\gamma/2}\,dW_t \qquad (3.34)$$

a similar calculation shows that the stationary density is proportional to

$$x^{-\gamma} \exp\Big\{ -\frac{2kb}{\sigma^2(\gamma - 1)}\,x^{-(\gamma - 1)} - \frac{2k}{\sigma^2(2 - \gamma)}\,x^{2 - \gamma} \Big\}, \quad \text{for } \gamma > 1,\ \gamma \ne 2.$$
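As a quick numerical sanity check on the CIR stationary law, one can run a long Euler discretization of (3.33) and compare the time-average mean and variance with the Gamma$(2kb/\sigma^2, \sigma^2/2k)$ moments. The parameter values and the reflection guard at zero below are illustrative choices, not from the text.

```python
import numpy as np

# Long Euler run of the CIR process dX = k(b - X)dt + sigma*sqrt(X) dW,
# with 2kb = 4 > sigma^2 = 0.25, so a stationary Gamma law exists.
rng = np.random.default_rng(0)
k, b, sigma = 2.0, 1.0, 0.5
dt, n = 0.01, 400_000
x = b
samples = np.empty(n)
for i in range(n):
    x += k*(b - x)*dt + sigma*np.sqrt(max(x, 0.0)*dt)*rng.standard_normal()
    x = abs(x)                 # crude guard against tiny negative excursions
    samples[i] = x

shape, scale = 2*k*b/sigma**2, sigma**2/(2*k)      # Gamma(16, 0.0625) here
tail = samples[n//2:]                               # discard burn-in
print(tail.mean(), shape*scale)                     # both near b = 1
print(tail.var(), shape*scale**2)                   # both near 0.0625
```

The Gamma mean is $(2kb/\sigma^2)(\sigma^2/2k) = b$, as it should be for a process mean-reverting to $b$.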

Simulating Stochastic Partial Differential Equations.

Consider a derivative product whose underlying asset has price $X_t$, which follows
some model, and suppose the derivative pays an amount $V_0(X_T)$ on the maturity
date $T$. If the value of the derivative depends only on the current time
$t$ and the current value $S$ of the asset, then its current value is the discounted
expected future payoff,

$$V(S, t) = E\Big[ V_0(X_T) \exp\Big\{ -\int_t^T r(X_v, v)\,dv \Big\} \,\Big|\, X_t = S \Big] \qquad (3.35)$$

where $r(X_t, t)$ is the spot interest rate at time $t$. In most cases this ex-
pectation is impossible to evaluate analytically, and so we need to resort to
numerical methods. If the spot interest rate is a function of both arguments
$(X_v, v)$, and not just a function of time, then this integral is over the whole
joint distribution of the process $\{X_v,\ 0 < v < T\}$, and simple one-dimensional
methods of numerical integration do not suffice. In such cases we will usu-
ally resort to a Monte Carlo method. The simplest version requires simulating
a number of sample paths for the process $X_v$ starting at $X_t = S$, evaluating
$V_0(X_T)\exp\{-\int_t^T r(X_v, v)\,dv\}$, and averaging the results over all simulations. We
begin by discussing the simulation of the process $X_v$ required for integrals
such as this.
Many of the stochastic models in finance reduce to simple diffusion equations
(which may have more than one factor or dimension). Most of the models
in finance are Markovian in the sense that at any point $t$ in time, the future
evolution of the process depends only on the current state $X_t$ and not on the
past behaviour of the process $X_s$, $s < t$. Consequently we restrict attention to a "Markov
diffusion model" of the form

$$dX_t = a(X_t, t)\,dt + \sigma(X_t, t)\,dW_t \qquad (3.36)$$

with some initial value $X_0$ for $X_t$ at $t = 0$. Here $W_t$ is a driving standard Brown-
ian motion process. Solving deterministic differential equations can sometimes
provide a solution to a specific problem, such as finding the arbitrage-free price of
a derivative. In general, for more complex features of the derivative, such as the
distribution of the return (important for considerations such as the Value at Risk),
we need to obtain a solution $\{X_t,\ 0 < t < T\}$ to an equation of the above form,
which is a stochastic process. Typically this can only be done by simulation.
One of the simplest methods of simulating such a process is motivated by
a crude interpretation of the above equation in terms of discrete time steps: a
small increment $X_{t+h} - X_t$ in the process is approximately normally
distributed with mean $a(X_t, t)h$ and variance $\sigma^2(X_t, t)h$. We
generate these increments sequentially, beginning with an assumed value for
$X_0$, and then add them to obtain an approximation to the value of the process
at the discrete times $t = 0, h, 2h, 3h, \ldots$. Between these discrete points, we can
interpolate the values linearly. Approximating the process by assuming that
the conditional distribution of $X_{t+h} - X_t$ is $N(a(X_t, t)h,\ \sigma^2(X_t, t)h)$ is called
Euler's method, by analogy with a simple method of the same name for solving or-
dinary differential equations. Given simulations of the process satisfying (3.36)
together with some initial conditions, we might average the returns on a given
derivative over many such simulations (provided the process is expressed with
respect to the risk-neutral distribution) to arrive at an arbitrage-free value for
the derivative.
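The Euler scheme just described can be sketched as follows; it is applied here to geometric Brownian motion $dX = rX\,dt + vX\,dW$ so that the Monte Carlo mean can be checked against the known value $E[X_T] = X_0 e^{rT}$. The drift, volatility, and step size are illustrative, not taken from the text.

```python
import numpy as np

def euler_paths(x0, a, sigma, T, h, npaths, rng):
    """Simulate npaths terminal values X_T using normal increments
    N(a(X,t)h, sigma^2(X,t)h), exactly as in Euler's method."""
    n = int(round(T / h))
    X = np.full(npaths, float(x0))
    t = 0.0
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(h), npaths)
        X += a(X, t)*h + sigma(X, t)*dW
        t += h
    return X

rng = np.random.default_rng(2)
r, v, x0, T = 0.05, 0.2, 100.0, 1.0
XT = euler_paths(x0, lambda x, t: r*x, lambda x, t: v*x, T, 1/252, 100_000, rng)
print(XT.mean(), x0*np.exp(r*T))     # both near 105.13
```

Averaging a discounted payoff of these terminal values, under the risk-neutral drift $r$, is the simplest Monte Carlo estimate of (3.35).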

In this section we discuss the numerical solution, or simulation, of the
solution to stochastic differential equations.
Letting $t_i = i\,\Delta t$, equation (3.36) in integral form implies

$$X_{t_{i+1}} = X_{t_i} + \int_{t_i}^{t_{i+1}} a(X_s, s)\,ds + \int_{t_i}^{t_{i+1}} \sigma(X_s, s)\,dW_s. \qquad (3.37)$$

For the following lemma we need to introduce the $O_p$, or "order in probability",
notation common in mathematics and probability. For a sequence indexed by $\Delta t$,
we write $Y_{\Delta t} = O_p((\Delta t)^k)$ to mean that when we divide this term by $(\Delta t)^k$ and then
let $\Delta t \to 0$, the resulting sequence is bounded in probability; that is, for each $\varepsilon > 0$
there exists $K < \infty$ such that

$$P\Big[\Big|\frac{Y_{\Delta t}}{(\Delta t)^k}\Big| > K\Big] < \varepsilon$$

for all sufficiently small $|\Delta t|$. As an example, if $W$ is a Brownian motion, then $\Delta W_t =
W(t + \Delta t) - W(t)$ has a normal distribution with mean 0 and standard deviation
$\sqrt{\Delta t}$, and is therefore $O_p((\Delta t)^{1/2})$. We then have two very common
and useful approximations to a diffusion, given by the following lemma.

Lemma 32 If $X_t$ satisfies a diffusion equation of the form (3.36), then

$$X_{t_{i+1}} = X_{t_i} + a(X_{t_i}, t_i)\Delta t + \sigma(X_{t_i}, t_i)\Delta W_t + O_p(\Delta t) \quad \text{(Euler approximation)}$$

$$X_{t_{i+1}} = X_{t_i} + a(X_{t_i}, t_i)\Delta t + \sigma(X_{t_i}, t_i)\Delta W_t
+ \frac{\sigma(X_{t_i}, t_i)\,\frac{\partial}{\partial x}\sigma(X_{t_i}, t_i)}{2}\big[(\Delta W_t)^2 - \Delta t\big] + O_p((\Delta t)^{3/2}) \quad \text{(Milstein approximation)}$$

Proof. Ito's lemma can be written in terms of two operators on functions
$f$ for which the derivatives below exist:

$$df(X_t, t) = L^0 f\,dt + L^1 f\,dW_t, \quad \text{where}$$

$$L^0 = a\frac{\partial}{\partial x} + \frac{1}{2}\sigma^2\frac{\partial^2}{\partial x^2} + \frac{\partial}{\partial t}, \quad \text{and} \quad L^1 = \sigma\frac{\partial}{\partial x}.$$

Integrating this and applying it to the twice-differentiable functions $a$ and $\sigma$, for
$s > t_i$,

$$a(X_s, s) = a(X_{t_i}, t_i) + \int_{t_i}^s L^0 a(X_u, u)\,du + \int_{t_i}^s L^1 a(X_u, u)\,dW_u,$$
$$\sigma(X_s, s) = \sigma(X_{t_i}, t_i) + \int_{t_i}^s L^0 \sigma(X_u, u)\,du + \int_{t_i}^s L^1 \sigma(X_u, u)\,dW_u.$$

By substituting for each of the integrands in (3.37) using the above identities, and
iterating this process, we arrive at the Ito-Taylor expansions (e.g. Kloeden and
Platen, 1992). For example,

$$\int_{t_i}^{t_{i+1}} a(X_s, s)\,ds = \int_{t_i}^{t_{i+1}} \Big\{ a(X_{t_i}, t_i) + \int_{t_i}^s L^0 a(X_u, u)\,du + \int_{t_i}^s L^1 a(X_u, u)\,dW_u \Big\}\,ds$$
$$\approx a(X_{t_i}, t_i)\Delta t + L^0 a(X_{t_i}, t_i)\int_{t_i}^{t_{i+1}}\!\!\int_{t_i}^s du\,ds + L^1 a(X_{t_i}, t_i)\int_{t_i}^{t_{i+1}}\!\!\int_{t_i}^s dW_u\,ds.$$

The first term, $a(X_{t_i}, t_i)\Delta t$, is an initial approximation to the desired inte-
gral, and the rest is a lower-order correction that we may regard as an er-
ror term for the moment. For example, it is easy to see that the second term
$L^0 a(X_{t_i}, t_i)\int_{t_i}^{t_{i+1}}\int_{t_i}^s du\,ds$ is $O_p((\Delta t)^2)$, because the integral $\int_{t_i}^{t_{i+1}}\int_{t_i}^s du\,ds = (\Delta t)^2/2$
and $L^0 a(X_{t_i}, t_i)$ is bounded in probability. The third term $L^1 a(X_{t_i}, t_i)\int_{t_i}^{t_{i+1}}\int_{t_i}^s dW_u\,ds$
is $O_p((\Delta t)^{3/2})$, since $\int_{t_i}^{t_{i+1}}\int_{t_i}^s dW_u\,ds = \int_{t_i}^{t_{i+1}} (t_{i+1} - u)\,dW_u$, and this is a normal
random variable with mean 0 and variance $\int_{t_i}^{t_{i+1}} (t_{i+1} - u)^2\,du = (\Delta t)^3/3$. We
can write such a normal random variable as $3^{-1/2}(\Delta t)^{3/2} Z$ for $Z$ a standard
normal random variable, and so it is obviously $O_p((\Delta t)^{3/2})$. Thus the simplest
Euler approximation to the distribution of the increment assumes that $\Delta X$ has
conditional mean $a(X_{t_i}, t_i)\Delta t$. Similarly,
$$\int_{t_i}^{t_{i+1}} \sigma(X_s, s)\,dW_s = \int_{t_i}^{t_{i+1}} \Big\{ \sigma(X_{t_i}, t_i) + \int_{t_i}^s L^0 \sigma(X_u, u)\,du + \int_{t_i}^s L^1 \sigma(X_u, u)\,dW_u \Big\}\,dW_s$$
$$\approx \sigma(X_{t_i}, t_i)\Delta W_t + L^0 \sigma(X_{t_i}, t_i)\int_{t_i}^{t_{i+1}}\!\!\int_{t_i}^s du\,dW_s + L^1 \sigma(X_{t_i}, t_i)\int_{t_i}^{t_{i+1}}\!\!\int_{t_i}^s dW_u\,dW_s$$
$$= \sigma(X_{t_i}, t_i)\Delta W_t + \frac{\sigma(X_{t_i}, t_i)\,\frac{\partial}{\partial x}\sigma(X_{t_i}, t_i)}{2}\big[(\Delta W_t)^2 - \Delta t\big] + O_p((\Delta t)^{3/2}),$$

since $\int_{t_i}^{t_{i+1}}\int_{t_i}^s dW_u\,dW_s = \frac{1}{2}[(\Delta W_t)^2 - \Delta t]$, $\sigma(X_u, u) = \sigma(X_{t_i}, t_i) + O_p((\Delta t)^{1/2})$,
$L^1\sigma(X_u, u) = \sigma(X_u, u)\frac{\partial}{\partial x}\sigma(X_u, u)$, and $\int_{t_i}^{t_{i+1}}\int_{t_i}^s du\,dW_s = O_p((\Delta t)^{3/2})$. Putting

these terms together, we arrive at an approximation to the increment of the
form

$$\Delta X_t = a(X_{t_i}, t_i)\Delta t + \sigma(X_{t_i}, t_i)\Delta W_t + \frac{\sigma(X_{t_i}, t_i)\,\frac{\partial}{\partial x}\sigma(X_{t_i}, t_i)}{2}\big[(\Delta W_t)^2 - \Delta t\big] + O_p((\Delta t)^{3/2}) \qquad (3.38)$$

which allows an explicit representation of the increment of the process $X$ in terms
of the increment of a Brownian motion process, $\Delta W_t \sim N(0, \Delta t)$.

The approximation (3.38) is called the Milstein approximation, a refinement
of the first, the Euler approximation; it is the second Ito-Taylor approximation
to a diffusion process. The increments of the process are now quadratic
functions of a normal random variable and so are no longer normal. The error
approaches 0 at the rate $O_p((\Delta t)^{3/2})$ in probability only. This does not mean
that the trajectory is approximated to this order, but that the difference between
the Milstein approximation to a diffusion and the diffusion itself, divided by
$(\Delta t)^{3/2}$, remains bounded in probability as we let $\Delta t \to 0$. Higher-order Ito-Taylor
approximations are also possible, although they grow excessively complicated
very quickly. See the book by Kloeden and Platen (1992) for details.
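To see Lemma 32 in action, the sketch below (with illustrative parameters) applies both schemes to geometric Brownian motion, where the exact solution $X_T = X_0\exp\{(r - v^2/2)T + vW_T\}$ is available for comparison; for GBM the Milstein correction coefficient is $\sigma\,\partial\sigma/\partial x/2 = v^2 x/2$.

```python
import numpy as np

r, v, x0, T = 0.05, 0.4, 1.0, 1.0

def strong_errors(h, npaths, rng):
    """Mean absolute error at time T of Euler and Milstein steps for GBM,
    driven by a common Brownian path and compared with the exact solution."""
    n = int(round(T / h))
    xe = np.full(npaths, x0)            # Euler
    xm = np.full(npaths, x0)            # Milstein
    W = np.zeros(npaths)
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(h), npaths)
        xe = xe + r*xe*h + v*xe*dW
        xm = xm + r*xm*h + v*xm*dW + 0.5*v*v*xm*(dW**2 - h)
        W += dW
    exact = x0*np.exp((r - 0.5*v*v)*T + v*W)
    return np.abs(xe - exact).mean(), np.abs(xm - exact).mean()

rng = np.random.default_rng(3)
results = {h: strong_errors(h, 20_000, rng) for h in (1/50, 1/200)}
for h, (ee, em) in results.items():
    print(f"h={h:.4f}  Euler {ee:.5f}  Milstein {em:.5f}")
```

Shrinking the step by a factor of four roughly halves the Euler error and quarters the Milstein error (strong orders 1/2 and 1), even though, as the text notes below, both pathwise errors are already small for coefficients this smooth.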

There remains the question of how much difference it makes which of these
approximations we employ for a particular diffusion. Certainly there is no dif-
ference at all between the two approximations in the case that the diffusion
coefficient $\sigma(X_t, t)$ does not depend at all on $X_t$. In general, the difference is
hard to assess, but in particular cases we can at least compare the performance of
the two methods, and the approximations turn out to be very close in most simple
cases. For example, consider the stock price path in Figure 3.27. The dashed
line corresponds to the Milstein approximation, whereas the continuous
line corresponds to the Euler approximation. In this case the Milstein approximation appears
to be a little better, but if I run a number of simulations and compare the sums
of squared errors (i.e. squared differences between the approximate value
of $X_t$ and the true value of $X_t$), I find that the improvement is only about

Figure 3.27: Comparison of Milstein and Euler approximation to stock with $\Delta t = 1/12$ year.

two percent of the difference. The same is true even if I change the value of
$\Delta t$ from 1/12 (i.e. one month) to 1/52 (i.e. one week). Unlike the behaviour
of higher-order approximations to deterministic functions, there appears to be
little advantage in using a higher-order approximation, at least in the case of
diffusions with smooth drift and diffusion coefficients.

We can compare using the Milstein approximation on the original process with
using Euler's approximation on a transformation of the process, in the case that
the diffusion term depends only on the state of the process (not on time). In other
words, suppose we have an Ito process of the form

$$dX_t = a(X_t, t)\,dt + \sigma(X_t)\,dW_t \qquad (3.39)$$

where $W_t$ is an ordinary Wiener process. A simple transformation reduces this
to a problem with constant diffusion term. Suppose $\sigma(x) > 0$ for all $x$ and let

$$s(x) = \int_0^x \frac{1}{\sigma(z)}\,dz, \quad \text{for } x \ge 0,$$
$$s(x) = -\int_x^0 \frac{1}{\sigma(z)}\,dz, \quad \text{for } x < 0,$$

where we assume these integrals are well defined. Let $g$ be the inverse function of
$s$. This inverse exists since the function $s$ is continuous and monotonically increasing.
Suppose we apply Ito's lemma to the transformed process $Y_t = s(X_t)$. We obtain

$$dY_t = \Big\{ a(X_t, t)s'(X_t) + \frac{1}{2}\sigma^2(X_t)s''(X_t) \Big\}dt + \sigma(X_t)s'(X_t)\,dW_t$$
$$= \Big\{ \frac{a(X_t, t)}{\sigma(X_t)} - \frac{1}{2}\sigma^2(X_t)\frac{\sigma'(X_t)}{\sigma^2(X_t)} \Big\}dt + dW_t$$
$$= \mu(Y_t, t)\,dt + dW_t,$$

where

$$\mu(Y_t, t) = \frac{a(g(Y_t), t)}{\sigma(g(Y_t))} - \frac{1}{2}\sigma'(g(Y_t)).$$

In other words, $Y_t$ satisfies an Ito equation with constant diffusion term.
Suppose we generate an increment in $Y_t$ using Euler's method and then solve
for the corresponding increment in $X_t$. Then, using the first two terms in the
Taylor series expansion of $g$,

$$\Delta X_t = g'(Y_t)\Delta Y_t + \frac{1}{2}g''(Y_t)(\Delta Y_t)^2$$
$$= g'(Y_t)\big(\mu(Y_{t_i}, t_i)\Delta t + \Delta W_t\big) + \frac{1}{2}\sigma'(g(Y_t))\sigma(g(Y_t))(\Delta Y_t)^2$$
$$= \Big\{ a(g(Y_t), t) - \frac{1}{2}\sigma(g(Y_t))\sigma'(g(Y_t)) \Big\}\Delta t + \sigma(g(Y_t))\Delta W_t + \frac{1}{2}\sigma'(g(Y_t))\sigma(g(Y_t))(\Delta Y_t)^2$$
2 2

since

$$g'(Y_t) = \frac{1}{s'(g(Y_t))} = \sigma(g(Y_t)) \quad \text{and} \quad g''(Y_t) = \sigma'(g(Y_t))\,\sigma(g(Y_t)).$$

But since $(\Delta Y_t)^2 = (\Delta W_t)^2 + O_p((\Delta t)^{3/2})$, it follows that

$$\Delta X_t = \Big\{ a(X_t, t) - \frac{1}{2}\sigma(X_t)\sigma'(X_t) \Big\}\Delta t + \sigma(X_t)\Delta W_t + \frac{1}{2}\sigma'(X_t)\sigma(X_t)(\Delta W_t)^2 + O_p((\Delta t)^{3/2})$$

and so the approximation to this increment is identical, up to the order con-
sidered, to the Milstein approximation. For most processes, it is preferable
to apply a diffusion-stabilizing transformation such as this prior to dis-
cretizing the process. For the geometric Brownian motion process, for example,
the diffusion-stabilizing transformation is a multiple of the logarithm, and this
transforms the process to a Brownian motion, for which the Euler approximation gives the
exact distribution.
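For geometric Brownian motion this is easy to verify directly. With $\sigma(x) = vx$, the transformation is $s(x) = \ln(x)/v$, the transformed drift from the formula above is $\mu = r/v - v/2$, and Euler steps on $Y = s(X)$ reproduce the exact lognormal law of $X$. The parameter values below are illustrative.

```python
import numpy as np

# Euler on the transformed process Y = ln(X)/v, which satisfies
# dY = (r/v - v/2) dt + dW (unit diffusion); back-transform X = g(Y) = exp(vY).
rng = np.random.default_rng(4)
r, v, x0, T, n = 0.05, 0.3, 1.0, 2.0, 24
h = T / n
mu = r/v - v/2                        # mu = a/sigma - sigma'/2 for GBM

y = np.full(100_000, np.log(x0)/v)
for _ in range(n):
    y += mu*h + rng.normal(0.0, np.sqrt(h), y.size)   # Euler step, here exact
x = np.exp(v*y)

logx = np.log(x)
print(logx.mean(), (r - v*v/2)*T)     # exact mean of ln X_T: 0.01
print(logx.var(), v*v*T)              # exact variance of ln X_T: 0.18
```

Because $Y$ is literally a Brownian motion with drift, the Euler increments coincide with the true increments, so the simulated $\ln X_T$ matches its exact $N((r - v^2/2)T,\ v^2T)$ distribution up to Monte Carlo error only.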

Example: Down-and-Out Call.

Consider an asset whose price under the risk-neutral measure $Q$ follows a con-
stant elasticity of variance (CEV) process

$$dS_t = rS_t\,dt + \sigma S_t^{\gamma}\,dW_t \qquad (3.40)$$

for a standard Brownian motion process $W_t$. A down-and-out call option with
exercise price $K$ provides the usual payoff $(S_T - K)^+$ of a European call
option at maturity $T$, provided the asset price never falls below a given "out" barrier $b$. The
parameter $\gamma > 0$ governs the change in the diffusion term as the asset price
changes. We wish to use simulation to price such an option with current asset
price $S_0$, time to maturity $T$, out barrier $b < S_0$, and constant interest rate $r$,
and to compare with the Black-Scholes formula as $b \to 0$.
A geometric Brownian motion is most easily simulated by taking logarithms.

For example, if $S_t$ satisfies the risk-neutral specification

$$dS_t = rS_t\,dt + \sigma S_t\,dW_t \qquad (3.41)$$

then $Y_t = \log(S_t)$ satisfies

$$dY_t = (r - \sigma^2/2)\,dt + \sigma\,dW_t. \qquad (3.42)$$

This is a Brownian motion and is simulated with a normal random walk: in-
dependent normal increments $\Delta Y_t \sim N((r - \sigma^2/2)\Delta t,\ \sigma^2\Delta t)$ are generated and
their partial sums used to simulate the process $Y_t$. The return for those options
that are in the money is the average of the values of $(e^{Y_T} - K)^+$ over those
paths for which $\min\{Y_s;\ t < s < T\} \ge \ln(b)$. Similarly, the transformation of
the CEV process which provides a constant diffusion term is determined by

$$s(x) = \int_0^x \frac{1}{\sigma(z)}\,dz = \frac{1}{\sigma}\int_0^x z^{-\gamma}\,dz =
\begin{cases} \dfrac{x^{1-\gamma}}{\sigma(1-\gamma)} & \text{if } \gamma \ne 1 \\[1ex] \dfrac{\ln(x)}{\sigma} & \text{if } \gamma = 1. \end{cases}$$
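For the case $\gamma = 1$, the pricing recipe above can be sketched directly: simulate $Y = \log S$ as a normal random walk, knock out any path whose minimum falls below $\ln(b)$, discount the average payoff, and compare with the Black-Scholes price when the barrier is pushed toward zero. All parameter values below are illustrative.

```python
import numpy as np
from math import log, sqrt, exp
from statistics import NormalDist

rng = np.random.default_rng(5)
S0, K, r, v, T, b = 100.0, 100.0, 0.05, 0.2, 1.0, 1e-8
n, npaths = 252, 100_000
h = T / n

# Normal random walk for Y = log S, tracking the running minimum for the barrier.
Y = np.full(npaths, log(S0))
Ymin = Y.copy()
for _ in range(n):
    Y += rng.normal((r - v*v/2)*h, v*sqrt(h), npaths)
    Ymin = np.minimum(Ymin, Y)
alive = Ymin >= log(b)                              # barrier never breached
payoff = np.where(alive, np.maximum(np.exp(Y) - K, 0.0), 0.0)
mc_price = exp(-r*T) * payoff.mean()

# Plain Black-Scholes call value for comparison (the b -> 0 limit).
d1 = (log(S0/K) + (r + v*v/2)*T) / (v*sqrt(T))
d2 = d1 - v*sqrt(T)
N = NormalDist().cdf
bs_price = S0*N(d1) - K*exp(-r*T)*N(d2)
print(mc_price, bs_price)                           # both near 10.45
```

With $b$ this small no path is knocked out, so the Monte Carlo estimate should agree with the Black-Scholes value up to sampling error; raising $b$ toward $S_0$ shows the discount the barrier imposes.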
