. 3
( 11)



where both processes ·t and gt are “predictable” which loosely means that
they are determined in advance of observing the increment St , St+∆t . Then the
process Zs is the analogue of the Radon-Nikodym derivative of the processes

restricted to the time interval $0 \le t \le s$. For a more formal definition, as well as
an explanation of how we should interpret the integral, see the appendix. This
process $Z_s$ is, both in discrete and continuous time, a martingale.

Figure 2.6: A sample path of the Wiener process

Models in Continuous Time

We begin with some oversimplified rules of stochastic calculus which can be
omitted by those with a background in Brownian motion and diffusion. First,
we define a stochastic process $W_t$ called the standard Brownian motion or
Wiener process having the following properties:

1. For each $h > 0$, the increment $W(t+h) - W(t)$ has a $N(0, h)$ distribution
and is independent of all preceding increments $W(u) - W(v)$, $t > u > v > 0$.

2. $W(0) = 0$.


The fact that such a process exists is by no means easy to see. It has been an
important part of the literature in Physics, Probability and Finance at least since
the papers of Bachelier and Einstein, about 100 years ago. A Brownian motion
process also has some interesting and remarkable theoretical properties; it is
continuous with probability one but the probability that the process has finite
variation in any interval is 0. With probability one it is nowhere differentiable.
Of course one might ask how a process with such apparently bizarre properties
can be used to approximate real-world phenomena, where we expect functions
to be built either from continuous and differentiable segments or jumps in the
process. The answer is that a very wide class of processes that are constructed from
quite well-behaved functions (e.g. step functions) and that have independent
increments converges, as the scale on which they move is refined, either to a
Brownian motion process or to a process defined as an integral with respect to a
Brownian motion process. Brownian motion is therefore a useful approximation to a
broad range of continuous time processes. For example, consider a random walk process
$S_n = \sum_{i=1}^{n} X_i$ where the random variables $X_i$ are independent identically
distributed with expected value $E(X_i) = 0$ and $\mathrm{var}(X_i) = 1$. Suppose we plot
the graph of this random walk $(n, S_n)$ as in Figure 2.7. Notice that we have linearly
interpolated the graph so that the function is defined for all $n$, whether integer
or not.

Figure 2.7: A sample path of a Random Walk


Now if we increase the sample size and decrease the scale appropriately on
both axes, the result is, in the limit, a Brownian motion process. The vertical
scale is to be decreased by a factor $1/\sqrt{n}$ and the horizontal scale by a factor
$n^{-1}$. The theorem concludes that the sequence of processes

$$Y_n(t) = \frac{1}{\sqrt{n}} S_{nt}$$

converges weakly to a standard Brownian motion process as $n \to \infty$. In practice
this means that a process with independent stationary increments tends to look
like a Brownian motion process. As we shall see, there is also a wide variety
of non-stationary processes that can be constructed from the Brownian motion
process by integration. Let us use the above limiting result to render some
of the properties of the Brownian motion more plausible, since a serious proof
is beyond our scope. Consider the question of continuity, for example. Since

$$|Y_n(t+h) - Y_n(t)| \approx \left| \frac{1}{\sqrt{n}} \sum_{i=nt}^{n(t+h)} X_i \right|$$

and this is the absolute value of an asymptotically normal$(0, h)$ random variable
by the central limit theorem, it is plausible that the limit as $h \to 0$ is zero
so the function is continuous at $t$. On the other hand note that

$$\frac{Y_n(t+h) - Y_n(t)}{h} \approx \frac{1}{h\sqrt{n}} \sum_{i=nt}^{n(t+h)} X_i$$

should by analogy behave like $h^{-1}$ times a $N(0, h)$ random variable which blows
up as $h \to 0$ so it would appear that the derivative at $t$ does not exist. To
obtain the total variation of the process in the interval $[t, t+h]$, consider the
lengths of the segments in this interval, i.e.

$$\frac{1}{\sqrt{n}} \sum_{i=nt}^{n(t+h)} |X_i|,$$

and notice that, since the law of large numbers implies that
$\frac{1}{n} \sum_{i=nt}^{n(t+h)} |X_i|$ converges to a positive constant, namely $hE|X_i|$,
if we multiply by $\sqrt{n}$ the limit must be infinite, so the total variation of the
Brownian motion process is infinite.
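The scaling argument above can be checked numerically. The following is a minimal sketch, assuming steps $X_i = \pm 1$ with equal probability (one convenient choice with mean 0 and variance 1); the function names, seed and sample sizes are illustrative and not from the text. It estimates the variance of $Y_n(1)$, which should be near 1, and computes the total variation of $Y_n$ on $[0, 1]$, which for these steps equals $\sqrt{n}$ and so grows without bound.

```python
import math
import random


def rescaled_walk_endpoint(n, rng):
    """Return Y_n(1) = S_n / sqrt(n) for a walk with +/-1 steps."""
    s = sum(rng.choice((-1, 1)) for _ in range(n))
    return s / math.sqrt(n)


def total_variation(n, rng):
    """Total variation of Y_n on [0, 1]: (1/sqrt(n)) * sum of |X_i|."""
    steps = [rng.choice((-1, 1)) for _ in range(n)]
    return sum(abs(x) for x in steps) / math.sqrt(n)


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
endpoints = [rescaled_walk_endpoint(400, rng) for _ in range(2000)]
var_estimate = sample_variance(endpoints)  # close to 1, the variance of N(0, 1)
tv_small = total_variation(100, rng)       # sqrt(100) for +/-1 steps
tv_large = total_variation(10000, rng)     # sqrt(10000): increases with n
```

The variance stays stable as $n$ grows while the total variation does not, which is the behaviour the heuristic argument predicts.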

Continuous time processes are usually built one small increment at a time
and defined to be the limit as the size of the time increment is reduced to zero.
Let us consider for example how we might define a stochastic (Ito) integral of
the form $\int_0^T h(t)\,dW_t$. An approximating sum takes the form

$$\int_0^T h(t)\,dW_t \approx \sum_{i=0}^{n-1} h(t_i)(W(t_{i+1}) - W(t_i)), \quad 0 = t_0 < t_1 < \dots < t_n = T.$$

Note that the function $h(t)$ is evaluated at the left hand end-point of the
intervals $[t_i, t_{i+1}]$, and this is characteristic of the Ito calculus, and an important
feature distinguishing it from the usual Riemann calculus studied in undergraduate
mathematics courses. There are some simple reasons why evaluating the
function at the left hand end-point is necessary for stochastic models in finance.
First, suppose that the function $h(t)$ measures how many shares
of a stock we possess and $W(t)$ is the price of one share of stock at time $t$.
It is clear that we cannot predict precisely future stock prices and our decision
about investment over a possibly short time interval $[t_i, t_{i+1}]$ must be made
at the beginning of this interval, not at the end or in the middle. Second, in
the case of a Brownian motion process $W(t)$, it makes a difference where in
the interval $[t_i, t_{i+1}]$ we evaluate the function $h$ to approximate the integral,
whereas it makes no difference for Riemann integrals. As we refine the partition
of the interval, the approximating sums $\sum_{i=0}^{n-1} h(t_{i+1})(W(t_{i+1}) - W(t_i))$,
for example, approach a completely different limit. This difference is essentially
due to the fact that $W(t)$, unlike those functions studied before in calculus, is
of infinite variation. As a consequence, there are other important differences in
the Ito calculus. Let us suppose that the increment $dW$ is used to denote
small increments $W(t_{i+1}) - W(t_i)$ involved in the construction of the integral.
If we denote the interval of time $t_{i+1} - t_i$ by $dt$, we can loosely assert that $dW$
has the normal distribution with mean 0 and variance $dt$. If we add up a large
number of independent such increments, since the variances add, the sum has
variance equal to the sum of the values $dt$ and standard deviation equal to its
square root. Very roughly, we can assess the size of $dW$ since its standard
deviation is $(dt)^{1/2}$.
Now consider defining a process as a function both of the Brownian motion and
of time, say $V_t = g(W_t, t)$. If $W_t$ represented the price of a stock or a bond,
$V_t$ might be the price of a derivative on this stock or bond. Expanding the
increment $dV$ using a Taylor series expansion gives

$$dV_t = \frac{\partial}{\partial W} g(W_t, t)\,dW + \frac{\partial^2}{\partial W^2} g(W_t, t)\frac{(dW)^2}{2} + \frac{\partial}{\partial t} g(W_t, t)\,dt + (\text{stuff}) \times (dW)^3 + (\text{more stuff}) \times (dt)(dW)^2 + \dots \tag{2.24}$$

Loosely, $dW$ is normal with mean 0 and standard deviation $(dt)^{1/2}$ and
so $dW$ is non-negligible compared with $dt$ as $dt \to 0$. We can define each of the
differentials $dW$ and $dt$ essentially by reference to the result when we integrate
both sides of the equation. If I were to write an equation in differential form

$$dX_t = h(t)\,dW_t$$

then this only has real meaning through its integrated version

$$X_t = X_0 + \int_0^t h(s)\,dW_s.$$

What about the terms involving $(dW)^2$? What meaning should we assign to a
term like $\int h(t)(dW)^2$? Consider the approximating function
$\sum h(t_i)(W(t_{i+1}) - W(t_i))^2$. Notice that, at least in the case that the function
$h$ is non-random, we are adding up independent random variables
$h(t_i)(W(t_{i+1}) - W(t_i))^2$ each with expected value $h(t_i)(t_{i+1} - t_i)$, and when
we add up these quantities the limit is $\int h(t)\,dt$ by the law of large numbers.
Roughly speaking, as differentials, we should interpret $(dW)^2$ as $dt$ because
that is the way it acts in an integral.
Subsequent terms such as $(dW)^3$ or $(dt)(dW)^2$ are all $o(dt)$, i.e. they all
approach 0 faster than does $dt$ as $dt \to 0$. So finally substituting for $(dW)^2$ in
2.24 and ignoring all terms that are $o(dt)$, we obtain a simple version of Ito's lemma:

$$dg(W_t, t) = \frac{\partial}{\partial W} g(W_t, t)\,dW + \left\{ \frac{1}{2} \frac{\partial^2}{\partial W^2} g(W_t, t) + \frac{\partial}{\partial t} g(W_t, t) \right\} dt.$$

This rule results, for example, when we put $g(W_t, t) = W_t^2$, in

$$d(W_t^2) = 2W_t\,dW_t + dt$$

or, on integrating both sides and rearranging,

$$\int_a^b W_t\,dW_t = \frac{1}{2}(W_b^2 - W_a^2) - \frac{1}{2} \int_a^b dt.$$

The term $\int_a^b dt$ above is what distinguishes the Ito calculus from the Riemann
calculus, and is a consequence of the nature of the Brownian motion process, a
continuous function of infinite variation.
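The role of the left hand end-point, and of the extra term $\int_a^b dt$, can be seen in a small simulation. This is a sketch under stated assumptions: an equally spaced grid on $[0, T]$ with $a = 0$, $b = T = 1$, simulated normal increments, and helper names chosen only for illustration. The left-endpoint sum should be close to $\frac{1}{2}(W_T^2 - T)$, while the right-endpoint sum differs from it by $\sum (dW)^2$, which is close to $T$ rather than 0.

```python
import math
import random


def brownian_increments(n, T, rng):
    """Independent N(0, T/n) increments of W on an equally spaced grid of [0, T]."""
    sd = math.sqrt(T / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def endpoint_sums(increments):
    """Left- and right-endpoint sums approximating the integral of W dW."""
    w = 0.0
    left = right = 0.0
    for dw in increments:
        left += w * dw           # integrand evaluated at t_i (the Ito choice)
        right += (w + dw) * dw   # integrand evaluated at t_{i+1}
        w += dw
    return left, right, w


rng = random.Random(1)  # fixed seed so the sketch is reproducible
T, n = 1.0, 100_000
dW = brownian_increments(n, T, rng)
left, right, W_T = endpoint_sums(dW)
quad_var = sum(d * d for d in dW)   # sum of (dW)^2, close to T
ito_value = 0.5 * (W_T ** 2 - T)    # right side of the identity in the text
```

For a smooth integrator the two sums would agree in the limit; here the gap `right - left` equals `quad_var` on every grid, which is the numerical face of interpreting $(dW)^2$ as $dt$.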
There is one more property of the stochastic integral that makes it a valuable
tool in the construction of models in finance, and that is that a stochastic integral
with respect to a Brownian motion process is always a martingale. To see this,
note that in an approximating sum

$$\int_0^T h(t)\,dW_t \approx \sum_{i=0}^{n-1} h(t_i)(W(t_{i+1}) - W(t_i))$$

each of the summands has conditional expectation 0 given the past, i.e.

$$E[h(t_i)(W(t_{i+1}) - W(t_i)) \mid H_{t_i}] = h(t_i)E[(W(t_{i+1}) - W(t_i)) \mid H_{t_i}] = 0$$

since the Brownian increments have mean 0 given the past and since $h(t)$ is
measurable with respect to $H_t$.
We begin with an attempt to construct the model for an Ito process or diffusion
process in continuous time. We construct the price process one increment
at a time and it seems reasonable to expect that both the mean and the variance
of the increment in price may depend on the current price but do not
depend on the process before it arrived at that price. This is a loose description
of a Markov property. The conditional distribution of the future of the process
depends only on the current time $t$ and the current price of the process. Let us
suppose in addition that the increments in the process are, conditional on the
past, normally distributed. Thus we assume that for small values of $h$, conditional
on the current time $t$ and the current value of the process $X_t$, the
increment $X_{t+h} - X_t$ can be generated from a normal distribution with mean
$a(X_t, t)h$ and with variance $\sigma^2(X_t, t)h$ for some functions $a$ and $\sigma^2$ called the
drift and diffusion coefficients respectively. Such a normal random variable can
be formally written as $a(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$. We can then express $X_T$ as
an initial price $X_0$ plus the sum of such increments, $X_T = X_0 + \sum_i (X_{t_{i+1}} - X_{t_i})$.
The single most important model of this type is called the Geometric Brownian
motion or Black-Scholes model. Since the actual value of stock, like the
value of a currency or virtually any other asset, is largely artificial, depending on
such things as the number of shares issued, it is reasonable to suppose that the
changes in a stock price should be modeled relative to the current price. For
example rather than model the increments, it is perhaps more reasonable to
model the relative change in the process. The simplest such model of this type
is one in which both the mean and the standard deviation of the increment in
the price are linear multiples of price itself; viz. $dX_t$ is approximately normally
distributed with mean $aX_t\,dt$ and variance $\sigma^2 X_t^2\,dt$. In terms of stochastic
differentials, we assume that

$$dX_t = aX_t\,dt + \sigma X_t\,dW_t.$$

Now consider the relative return from such a process over the increment $dY_t =
dX_t / X_t$. Putting $Y_t = g(X_t) = \ln(X_t)$, note that, analogous to our derivation of
Ito's lemma,

$$\begin{aligned}
dg(X_t) &= g'(X_t)\,dX_t + \frac{1}{2} g''(X_t)(dX)^2 + \dots \\
&= \frac{1}{X_t}\{aX_t\,dt + \sigma X_t\,dW_t\} - \frac{1}{2X_t^2} \sigma^2 X_t^2\,dt \\
&= \left(a - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t,
\end{aligned}$$

which is a description of a general Brownian motion process, a process with
increments $dY_t$ that are normally distributed with mean $(a - \frac{\sigma^2}{2})\,dt$ and with
variance $\sigma^2\,dt$. This process satisfying $dX_t = aX_t\,dt + \sigma X_t\,dW_t$ is called the
Geometric Brownian motion process (because it can be written in the form
$X_t = e^{Y_t}$ for a Brownian motion process $Y_t$) or a Black-Scholes model.
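The drift correction $a - \sigma^2/2$ for the logarithm can be checked by simulation. The sketch below assumes the representation $X_T = X_0 \exp\{(a - \sigma^2/2)T + \sigma W_T\}$ implied by the derivation above; the parameter values and names are illustrative only. The average log return should be near $(a - \sigma^2/2)T$, while the average price should be near $X_0 e^{aT}$.

```python
import math
import random


def gbm_terminal(x0, a, sigma, T, rng):
    """Draw X_T = x0 * exp((a - sigma^2/2) T + sigma W_T) with W_T ~ N(0, T)."""
    w_T = rng.gauss(0.0, math.sqrt(T))
    return x0 * math.exp((a - 0.5 * sigma ** 2) * T + sigma * w_T)


rng = random.Random(2)  # fixed seed so the sketch is reproducible
x0, a, sigma, T = 100.0, 0.08, 0.2, 1.0  # illustrative values, not from the text
draws = [gbm_terminal(x0, a, sigma, T, rng) for _ in range(50_000)]

log_returns = [math.log(x / x0) for x in draws]
mean_log = sum(log_returns) / len(log_returns)  # near (a - sigma^2/2) T
mean_price = sum(draws) / len(draws)            # near x0 * exp(a T)
```

The two averages grow at different rates, $a - \sigma^2/2$ for the logarithm and $a$ for the price itself, which is exactly the effect of the second-order term in Ito's lemma.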
Many of the continuous time models used in finance are described as Markov
diffusions or Ito processes, which permit the mean and the variance of the
increments to depend more generally on the present value of the process and
the time. The integral version of this relation is of the form

$$X_T = X_0 + \int_0^T a(X_t, t)\,dt + \int_0^T \sigma(X_t, t)\,dW_t.$$

We often write such an equation with differential notation,

$$dX_t = a(X_t, t)\,dt + \sigma(X_t, t)\,dW_t, \tag{2.26}$$

but its meaning should always be sought in the above integral form. The
coefficients $a(X_t, t)$ and $\sigma(X_t, t)$ vary with the choice of model. As usual, we
interpret 2.26 as meaning that a small increment in the process, say $dX_t =
X_{t+h} - X_t$ ($h$ very small), is approximately distributed according to a normal
distribution with conditional mean $a(X_t, t)\,dt$ and conditional variance given by
$\sigma^2(X_t, t)\,\mathrm{var}(dW_t) = \sigma^2(X_t, t)\,dt$. Here the mean and variance are conditional
on $H_t$, the history of the process $X_t$ up to time $t$.
Various choices for the functions $a(X_t, t)$, $\sigma(X_t, t)$ are possible. For the
Black-Scholes model or geometric Brownian motion, $a(X_t, t) = aX_t$ and $\sigma(X_t, t) =
\sigma X_t$ for constant drift and volatility parameters $a$, $\sigma$. The Cox-Ingersoll-Ross
model, used to model spot interest rates, corresponds to $a(X_t, t) = A(b - X_t)$
and $\sigma(X_t, t) = c\sqrt{X_t}$ for constants $A$, $b$, $c$. The Vasicek model, also a model for
interest rates, has $a(X_t, t) = A(b - X_t)$ and $\sigma(X_t, t) = c$. A large number
of models for continuous time processes observed in finance can
be written in the form 2.26. So-called multi-factor models are of similar form,
where $X_t$ is a vector of financial time series, the coefficient function $a(X_t, t)$
is vector valued, $\sigma(X_t, t)$ is replaced by a matrix-valued function and $dW_t$ is
interpreted as a vector of independent Brownian motion processes. For technical
conditions on the coefficients under which a solution to 2.26 is guaranteed
to exist and be unique, see Karatzas and Shreve, sections 5.2, 5.3.
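The description of a small increment as normal with mean $a(X_t, t)h$ and variance $\sigma^2(X_t, t)h$ suggests a direct way to simulate such models one step at a time. The sketch below uses this discretization (commonly called the Euler scheme, a name not used in the text) with the Vasicek and Cox-Ingersoll-Ross coefficients; the constants, step count and function names are assumptions chosen for illustration, and the truncation of the square root at 0 is a numerical safeguard rather than part of the model.

```python
import math
import random


def euler_path(x0, drift, diffusion, T, n, rng):
    """Step X_{t+h} = X_t + a(X_t, t) h + sigma(X_t, t) sqrt(h) Z, with Z ~ N(0, 1)."""
    h = T / n
    x, t = x0, 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x += drift(x, t) * h + diffusion(x, t) * math.sqrt(h) * z
        t += h
    return x


A, b, c = 2.0, 0.05, 0.02  # illustrative constants only


def mean_reverting_drift(x, t):
    return A * (b - x)  # drift shared by the Vasicek and CIR models


def vasicek_diffusion(x, t):
    return c


def cir_diffusion(x, t):
    return c * math.sqrt(max(x, 0.0))  # guard against tiny negative values


rng = random.Random(3)  # fixed seed so the sketch is reproducible
x0, T, steps, paths = 0.10, 1.0, 100, 2000
vasicek = [euler_path(x0, mean_reverting_drift, vasicek_diffusion, T, steps, rng)
           for _ in range(paths)]
cir = [euler_path(x0, mean_reverting_drift, cir_diffusion, T, steps, rng)
       for _ in range(paths)]
vasicek_mean = sum(vasicek) / paths
cir_mean = sum(cir) / paths
```

Both sample means should lie near $b + (X_0 - b)e^{-AT}$, since the two models share the same linear drift and differ only in the diffusion coefficient.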
As with any differential equation there may be initial or boundary conditions
applied to 2.26 that restrict the choice of possible solutions. Solutions
to the above equation are difficult to arrive at, and it is often even more difficult
to obtain distributional properties of them. Among the key tools are the
Kolmogorov differential equations (see Cox and Miller, p. 215). Consider the
transition probability kernel

$$p(s, z, t, x) = P[X_t = x \mid X_s = z]$$

in the case of a discrete Markov chain. If the Markov chain is continuous (as it
is in the case of diffusions), that is if the conditional distribution of $X_t$ given $X_s$
is absolutely continuous with respect to Lebesgue measure, then we can define
$p(s, z, t, x)$ to be the conditional probability density function of $X_t$ given $X_s = z$.
The two equations, for a diffusion of the above form, are
Kolmogorov's backward equation

$$\frac{\partial}{\partial s} p = -a(z, s) \frac{\partial}{\partial z} p - \frac{1}{2} \sigma^2(z, s) \frac{\partial^2}{\partial z^2} p \tag{2.27}$$

and the forward equation

$$\frac{\partial}{\partial t} p = -\frac{\partial}{\partial x}(a(x, t)p) + \frac{1}{2} \frac{\partial^2}{\partial x^2}(\sigma^2(x, t)p). \tag{2.28}$$

Note that if we were able to solve these equations, this would provide the
transition density function $p$, giving the conditional distribution of the process.
It does not immediately provide other characteristics of the diffusion, such as
the distribution of the maximum or the minimum, important for valuing various
exotic options such as look-back and barrier options. However for a European
option defined on this process, knowledge of the transition density would suffice
at least theoretically for valuing the option. Unfortunately these equations are
often very difficult to solve explicitly.
Besides the Kolmogorov equations, we can use simple ordinary differential
equations to arrive at some of the basic properties of a diffusion. To illustrate,
consider one of the simplest possible forms of a diffusion, where $a(X_t, t) =
\alpha(t) + \beta(t)X_t$ and the coefficients $\alpha(t)$, $\beta(t)$ are deterministic (i.e. non-random)
functions of time. Note that the integral analogue of 2.26 is

$$X_t = X_0 + \int_0^t a(X_s, s)\,ds + \int_0^t \sigma(X_s, s)\,dW_s$$

and by construction that last term $\int_0^t \sigma(X_s, s)\,dW_s$ is a zero-mean martingale.
For example its small increments $\sigma(X_t, t)\,dW_t$ are approximately $N(0, \sigma^2(X_t, t)\,dt)$.
Therefore, taking expectations on both sides conditional on the value of $X_0$, and
letting $m(t) = E(X_t)$, we obtain

$$m(t) = X_0 + \int_0^t [\alpha(s) + \beta(s)m(s)]\,ds$$

and therefore $m(t)$ solves the ordinary differential equation

$$m'(t) = \alpha(t) + \beta(t)m(t), \qquad m(0) = X_0. \tag{2.31}$$

Thus, in the case that the drift term $a$ is a linear function of $X_t$, the mean or
expected value of a diffusion process can be found by solving a similar ordinary
differential equation, similar except that the diffusion term has been dropped.
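As a small worked instance of (2.31), the sketch below integrates the mean equation numerically with a classical fourth-order Runge-Kutta step (a solver choice made here for illustration, not one prescribed by the text). The coefficients $\alpha(t) = Ab$ and $\beta(t) = -A$ are an assumed example, the mean-reverting drift $A(b - X_t)$, for which the solution $m(t) = b + (X_0 - b)e^{-At}$ is known and can be used as a check.

```python
import math


def solve_mean(alpha, beta, x0, T, n):
    """Integrate m'(t) = alpha(t) + beta(t) m(t), m(0) = x0, with n RK4 steps."""
    h = T / n
    m, t = x0, 0.0

    def f(time, value):
        return alpha(time) + beta(time) * value

    for _ in range(n):
        k1 = f(t, m)
        k2 = f(t + h / 2, m + h * k1 / 2)
        k3 = f(t + h / 2, m + h * k2 / 2)
        k4 = f(t + h, m + h * k3)
        m += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return m


A, b, x0, T = 2.0, 0.05, 0.10, 1.0  # illustrative constants only
m_T = solve_mean(lambda t: A * b, lambda t: -A, x0, T, 200)
closed_form = b + (x0 - b) * math.exp(-A * T)  # exact mean for constant coefficients
```

Because `alpha` and `beta` are passed as functions of time, the same routine applies when the coefficients are genuinely time dependent and no closed form is available.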
These are only two of many reasons to wish to solve both ordinary and
partial differential equations in finance. The solution to the Kolmogorov partial
differential equations provides the conditional distribution of the increments of
a process. And when the drift term $a(X_t, t)$ is linear in $X_t$, the solution of an
ordinary differential equation will allow the calculation of the expected value of
the process and this is the first and most basic description of its behaviour. The
appendix provides an elementary review of techniques for solving partial and
ordinary differential equations.
However, the information about a stochastic process obtained from a
deterministic object such as an ordinary or partial differential equation is
necessarily limited. For example, while we can sometimes obtain the marginal
distribution of the process at time $t$, it is more difficult to obtain quantities
such as the joint distribution of variables which depend on the path of the
process, and these are important in valuing certain types of exotic options such
as lookback and barrier options. For such problems, we often use Monte Carlo methods.

The Black-Scholes Formula

Before discussing methods of solution in general, we develop the Black-Scholes
equation in a general context. Suppose that a security price is an Ito process
satisfying the equation

$$dS_t = a(S_t, t)\,dt + \sigma(S_t, t)\,dW_t.$$

Assume the market allows investment in the stock as well as a risk-free bond
whose price at time $t$ is $B_t$. It is necessary to make various other assumptions
as well and strictly speaking all fail in the real world, but they are a reasonable
approximation to a real, highly liquid and nearly frictionless market:

1. Partial shares may be purchased.

2. There are no dividends paid on the stock.

3. There are no commissions paid on purchase or sale of the stock or bond.

4. There is no possibility of default for the bond.

5. Investors can borrow at the risk free rate governing the bond.

6. All investments are liquid: they can be bought or sold instantaneously.

Since bonds are assumed risk-free, they satisfy an equation

$$dB_t = r_t B_t\,dt$$

where $r_t$ is the risk-free (spot) interest rate at time $t$.
We wish to determine $V(S_t, t)$, the value of an option on this security when
the security price is $S_t$, at time $t$. Suppose the option has expiry date $T$ and
a general payoff function which depends only on $S_T$, the process at time $T$.
Ito's lemma provides the ability to translate a relation governing the
differential $dS_t$ into a relation governing the differential of the process $dV(S_t, t)$.
In this sense it is the stochastic calculus analogue of the chain rule in ordinary
calculus. It is one of the most important single results of the twentieth century
in finance and in science. The stochastic calculus and this mathematical result
concerning it underlie the research leading to the 1997 Nobel Prize to Merton and
Scholes for their work on hedging in financial models. We saw one version of it
at the beginning of this section and here we provide a more general version.

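Ito's lemma.

Suppose $S_t$ is a diffusion process satisfying

$$dS_t = a(S_t, t)\,dt + \sigma(S_t, t)\,dW_t \tag{2.33}$$

and suppose $V(S_t, t)$ is a smooth function of both arguments. Then $V(S_t, t)$
also satisfies a diffusion equation of the form

$$dV = \left[ a(S_t, t) \frac{\partial V}{\partial S} + \frac{\sigma^2(S_t, t)}{2} \frac{\partial^2 V}{\partial S^2} + \frac{\partial V}{\partial t} \right] dt + \sigma(S_t, t) \frac{\partial V}{\partial S}\,dW_t. \tag{2.34}$$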

Proof. The proof of this result is technical but the ideas behind it are
simple. Suppose we expand an increment of the process $V(S_t, t)$ (we write $V$
in place of $V(S_t, t)$, omitting the arguments of the function and its derivatives.
We will sometimes do the same with the coefficients $a$ and $\sigma$.)

$$V(S_{t+h}, t+h) \approx V + \frac{\partial V}{\partial S}(S_{t+h} - S_t) + \frac{1}{2} \frac{\partial^2 V}{\partial S^2}(S_{t+h} - S_t)^2 + \frac{\partial V}{\partial t} h \tag{2.35}$$

where we have ignored remainder terms that are $o(h)$. Note that substituting
from 2.33 into 2.35, the increment $(S_{t+h} - S_t)$ is approximately normal with
mean $a(S_t, t)h$ and variance $\sigma^2(S_t, t)h$. Consider the term $(S_{t+h} - S_t)^2$.
Note that it is the square of the above normal random variable and has expected
value $\sigma^2(S_t, t)h + a^2(S_t, t)h^2$. The variance of this random variable is $O(h^2)$ so
if we ignore all terms of order $o(h)$ the increment $V(S_{t+h}, t+h) - V(S_t, t)$ is
approximately normally distributed with mean

$$\left[ a(S_t, t) \frac{\partial V}{\partial S} + \frac{\sigma^2(S_t, t)}{2} \frac{\partial^2 V}{\partial S^2} + \frac{\partial V}{\partial t} \right] h$$

and standard deviation $\sigma(S_t, t) \frac{\partial V}{\partial S} \sqrt{h}$, justifying (but not proving!) the relation 2.34.


By Ito's lemma, provided $V$ is smooth, it also satisfies a diffusion equation of
the form 2.34. We should note that when $V$ represents the price of an option,
some lack of smoothness in the function $V$ is inevitable. For example for
a European call option with exercise price $K$, $V(S_T, T) = \max(S_T - K, 0)$
does not have a derivative with respect to $S_T$ at $S_T = K$, the exercise price.
Fortunately, such exceptional points can be worked around in the argument,
since the derivative does exist at values of $t < T$.
The basic question in building a replicating portfolio is: for hedging purposes,
is it possible to find a self-financing portfolio consisting only of the
security and the bond which exactly replicates the option price process $V(S_t, t)$?
The self-financing requirement is the analogue of the requirement that the net
cost of a portfolio is zero that we employed when we introduced the notion of
arbitrage. The portfolio is such that no funds need to be added to (or removed
from) the portfolio during its life, so for example any additional amounts
required to purchase equity are obtained by borrowing at the risk free rate.
Suppose the self-financing portfolio has value at time $t$ equal to $V_t = u_t S_t + w_t B_t$
where the (predictable) functions $u_t$, $w_t$ represent the number of shares of stock
and bonds respectively owned at time $t$. Since the portfolio is assumed to be
self-financing, all returns obtain from the changes in the value of the securities
and bonds held, i.e. it is assumed that $dV_t = u_t\,dS_t + w_t\,dB_t$. Substituting for
$dS_t$ and $dB_t$,

$$dV_t = u_t\,dS_t + w_t\,dB_t = [u_t a(S_t, t) + w_t r_t B_t]\,dt + u_t \sigma(S_t, t)\,dW_t. \tag{2.36}$$

If $V_t$ is to be exactly equal to the price $V(S_t, t)$ of an option, it follows on
comparing the coefficients of $dt$ and $dW_t$ in 2.34 and 2.36, that $u_t = \frac{\partial V}{\partial S}$, called
the delta corresponding to delta hedging. Consequently,

$$V_t = \frac{\partial V}{\partial S} S_t + w_t B_t$$

and solving for $w_t$ we obtain:

$$w_t = \frac{1}{B_t} \left[ V - \frac{\partial V}{\partial S} S_t \right].$$
The conclusion is that it is possible to dynamically choose a trading strategy, i.e.
the weights $w_t$, $u_t$, so that our portfolio of stocks and bonds perfectly replicates the
value of the option. If we own the option, then by shorting (selling) delta $= \frac{\partial V}{\partial S}$
units of stock, we are perfectly hedged in the sense that our portfolio replicates
a risk-free bond. Surprisingly, in this ideal world of continuous processes and
continuous-time, commission-free trading, the perfect hedge is possible.
In the real world, it is said to exist only in a Japanese garden. The equation we
obtained by equating both coefficients in 2.34 and 2.36 is

$$-r_t V + r_t S_t \frac{\partial V}{\partial S} + \frac{\partial V}{\partial t} + \frac{\sigma^2(S_t, t)}{2} \frac{\partial^2 V}{\partial S^2} = 0. \tag{2.37}$$

Rewriting this allows an interpretation in terms of our hedged portfolio. If we
own an option and are short delta units of stock our net investment at time $t$
is given by $(V - S_t \frac{\partial V}{\partial S})$ where $V = V_t = V(S_t, t)$. Our return over the next time
increment $dt$ if the portfolio were liquidated and the identical amount invested
in a risk-free bond would be $r_t (V_t - S_t \frac{\partial V}{\partial S})\,dt$. On the other hand if we keep this
hedged portfolio, the return over an increment of time $dt$ is

$$\begin{aligned}
d\left(V - S_t \frac{\partial V}{\partial S}\right) &= dV - \left(\frac{\partial V}{\partial S}\right) dS \\
&= \left( \frac{\partial V}{\partial t} + \frac{\sigma^2}{2} \frac{\partial^2 V}{\partial S^2} + a \frac{\partial V}{\partial S} \right) dt + \sigma \frac{\partial V}{\partial S}\,dW_t - \frac{\partial V}{\partial S} [a\,dt + \sigma\,dW_t] \\
&= \left( \frac{\partial V}{\partial t} + \frac{\sigma^2}{2} \frac{\partial^2 V}{\partial S^2} \right) dt.
\end{aligned}$$

Equating these two returns gives

$$r_t \left(V - S_t \frac{\partial V}{\partial S}\right) = \frac{\partial V}{\partial t} + \frac{\sigma^2(S_t, t)}{2} \frac{\partial^2 V}{\partial S^2}.$$

The left side $r_t (V - S_t \frac{\partial V}{\partial S})$ represents the amount made by the portion of our
portfolio devoted to risk-free bonds. The right hand side represents the return
on a hedged portfolio long one option and short delta stocks. Since these
investments are at least in theory identical, so is their return. This fundamental
equation is evidently satisfied by any option price process where the underlying
security satisfies a diffusion equation and the option value at expiry depends
only on the value of the security at that time. The type of option determines
the terminal conditions and usually uniquely determines the solution.
It is extraordinary that this equation in no way depends on the drift
coefficient $a(S_t, t)$. This is a remarkable feature of the arbitrage pricing theory.
Essentially, no matter what the drift term for the particular security is, in order
to avoid arbitrage, all securities and their derivatives are priced as if they had
as drift the spot interest rate. This is the effect of calculating the expected values
under the martingale measure $Q$.
This PDE governs most derivative products: European call options, puts,
futures or forwards. However, the boundary conditions and hence the solution
depend on the particular derivative. The solution to such an equation is possible
analytically in a few cases, while in many others, numerical techniques are
necessary. One special case of this equation deserves particular attention. In
the case of geometric Brownian motion, $a(S_t, t) = \mu S_t$ and $\sigma(S_t, t) = \sigma S_t$ for
constants $\mu$, $\sigma$. Assume that the spot interest rate is a constant $r$ and that a
constant rate of dividends $D_0$ is paid on the stock. In this case, the equation
specializes to

$$-rV + \frac{\partial V}{\partial t} + (r - D_0) S \frac{\partial V}{\partial S} + \frac{\sigma^2 S^2}{2} \frac{\partial^2 V}{\partial S^2} = 0. \tag{2.38}$$

Note that we have not used any of the properties of the particular derivative
product yet, nor does this differential equation involve the drift coefficient $\mu$.
The assumption that there are no transaction costs is essential to this analysis,
as we have assumed that the portfolio is continually rebalanced.
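Equation (2.38) can be checked numerically against a known solution. The sketch below assumes the standard closed-form value of a European call on a stock paying a continuous dividend yield $D_0$, which is not derived in this section and is used here only as a test function. It replaces the derivatives in (2.38) by central differences and evaluates the left side, which should be close to 0. The parameter values, step sizes and function names are illustrative.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price(S, t, K, T, r, D0, sigma):
    """Closed-form European call with dividend yield D0 (assumed, not derived here)."""
    tau = T - t
    d1 = (math.log(S / K) + (r - D0 + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-D0 * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def pde_residual(S, t, K, T, r, D0, sigma, dS=1e-2, dt=1e-4):
    """Left side of (2.38) with derivatives replaced by central differences."""
    V = call_price(S, t, K, T, r, D0, sigma)
    up = call_price(S + dS, t, K, T, r, D0, sigma)
    down = call_price(S - dS, t, K, T, r, D0, sigma)
    V_S = (up - down) / (2 * dS)
    V_SS = (up - 2 * V + down) / dS ** 2
    V_t = (call_price(S, t + dt, K, T, r, D0, sigma)
           - call_price(S, t - dt, K, T, r, D0, sigma)) / (2 * dt)
    return -r * V + V_t + (r - D0) * S * V_S + 0.5 * sigma ** 2 * S ** 2 * V_SS


# Illustrative parameters only; note that the drift mu never appears.
residual = pde_residual(S=100.0, t=0.0, K=100.0, T=1.0, r=0.05, D0=0.02, sigma=0.2)
```

The same residual function can be pointed at any candidate price function of $(S, t)$, which makes it a convenient sanity check when a numerical PDE solver is developed later.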
We have now seen two derivations of parabolic partial differential equations,
so-called because like the equation of a parabola, they are first order (derivatives)
in one variable ($t$) and second order in the other ($x$). Usually the solution of such
an equation requires reducing it to one of the most common partial differential
equations, the heat or diffusion equation, which models the diffusion of heat
along a rod. This equation takes the form

$$\frac{\partial}{\partial t} u = k \frac{\partial^2}{\partial x^2} u. \tag{2.39}$$

A solution of 2.39 with appropriate boundary conditions can sometimes be found
by the separation of variables. We will later discuss in more detail the solution
of parabolic equations, both by analytic and numerical means. First, however,
when can we hope to find a solution of 2.39 of the form $u(x, t) = g(x/\sqrt{t})$?
By differentiating and substituting above, we obtain an ordinary differential
equation of the form

$$g''(\omega) + \frac{\omega}{2k} g'(\omega) = 0, \quad \omega = x/\sqrt{t}. \tag{2.40}$$

Let us solve this using MAPLE.
> eqn := diff(g(w),w,w)+(w/(2*k))*diff(g(w),w)=0;
> dsolve(eqn,g(w));

and because the derivative of the solution is slightly easier (for a statistician)
to identify than the solution itself,
> diff(%,w);

$$\frac{\partial}{\partial \omega} g(\omega) = C_2 \exp\{-\omega^2/4k\} = C_2 \exp\{-x^2/4kt\} \tag{2.41}$$
showing that a constant plus a constant multiple of the Normal$(0, 2kt)$ cumulative
distribution function, or

$$u(x, t) = C_1 + C_2 \frac{1}{2\sqrt{\pi k t}} \int_{-\infty}^{x} \exp\{-z^2/4kt\}\,dz, \tag{2.42}$$

is a solution of this, the heat equation, for $t > 0$. The role of the two constants is
simple. Clearly if a solution to 2.39 is found, then we may add a constant and/or
multiply by a constant to obtain another solution. The constant in general is
determined by initial and boundary conditions. Similarly the integral can be
removed with a change in the initial condition, for if $u$ solves 2.39 then so does
$\frac{\partial u}{\partial x}$. For example if we wish a solution for the half line $x > 0$ with initial condition
$u(x, 0) = 0$, $u(0, t) = 1$ for all $t > 0$, we may use

$$u(x, t) = 2P(N(0, 2kt) > x) = \frac{1}{\sqrt{\pi k t}} \int_{x}^{\infty} \exp\{-z^2/4kt\}\,dz, \quad t > 0,\ x \ge 0.$$

Let us consider a basic solution to 2.39:

$$u(x, t) = \frac{1}{2\sqrt{\pi k t}} \exp\{-x^2/4kt\}. \tag{2.43}$$

This connection between the heat equation and the normal distributions is
fundamental and the wealth of solutions depending on the initial and boundary
conditions is considerable. We plot a fundamental solution of the equation as
follows with the plot in Figure 2.8:

> u := (x,t) -> (.5/sqrt(Pi*t))*exp(-x^2/(4*t));
> plot3d(u(x,t), x=-4..4, t=.02..4);  # plotting ranges are illustrative assumptions

Figure 2.8: Fundamental solution of the heat equation



As $t \to 0$, the function approaches a spike at $x = 0$, usually referred to as
the “Dirac delta function” (although it is no function at all) and symbolically
representing the derivative of the “Heaviside function”. The Heaviside function
is defined as $H(x) = 1$, $x \ge 0$ and is otherwise 0 and is the cumulative distribution
function of a point mass at 0. Suppose we are given an initial condition
of the form $u(x, 0) = u_0(x)$. To find the corresponding solution, it is helpful to
look at the solution $u(x, t)$ and the initial condition $u_0(x)$ as a distribution or
measure (in this case described by a density) over the space variable $x$. For example
the density $u(x, t)$ corresponds to a measure for fixed $t$ of the form $\nu_t(A) = \int_A u(x, t)\,dx$.
Note that the initial condition compatible with the above solution 2.43 can be
described somewhat clumsily as “$u(x, 0)$ corresponds to a measure placing all
mass at $x = x_0 = 0$”. In fact as $t \to 0$, we have in some sense the following
convergence $u(x, t) \to \delta(x) = dH(x)$, the Dirac delta function. We could just as
easily solve the heat equation with a more general initial condition of
the form $u(x, 0) = dH(x - x_0)$ for arbitrary $x_0$ and the solution takes the form

$$u(x, t) = \frac{1}{2\sqrt{\pi k t}} \exp\{-(x - x_0)^2/4kt\}. \tag{1.22}$$

Indeed sums of such solutions over different values of $x_0$, or weighted sums, or
their limits, integrals, will continue to be solutions to 2.39. In order to achieve
the initial condition $u_0(x)$ we need only pick a suitable weight function. Note that

$$u_0(x) = \int u_0(z)\,dH(z - x).$$

Note that the function

$$u(x, t) = \frac{1}{2\sqrt{\pi k t}} \int_{-\infty}^{\infty} \exp\{-(z - x)^2/4kt\}\,u_0(z)\,dz \tag{1.22}$$

solves 2.39 subject to the required boundary condition.
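The weighted-integral solution above is easy to evaluate numerically. The following sketch approximates the integral by a midpoint rule on a truncated range and, as a check, takes $u_0$ to be the Heaviside function, for which the result should be the Normal$(0, 2kt)$ cumulative distribution function, i.e. 2.42 with $C_1 = 0$ and $C_2 = 1$. The truncation range, grid size and function names are assumptions made for illustration.

```python
import math


def heat_kernel(x, t, k):
    """Fundamental solution exp(-x^2 / (4kt)) / (2 sqrt(pi k t)), the N(0, 2kt) density."""
    return math.exp(-x * x / (4.0 * k * t)) / (2.0 * math.sqrt(math.pi * k * t))


def solve_heat(u0, x, t, k, L=20.0, n=4000):
    """Midpoint-rule approximation of the integral of heat_kernel(z - x) u0(z) over [-L, L]."""
    dz = 2.0 * L / n
    total = 0.0
    for i in range(n):
        z = -L + (i + 0.5) * dz  # cell midpoints; z = 0 falls on a cell boundary
        total += heat_kernel(z - x, t, k) * u0(z)
    return total * dz


def heaviside(z):
    return 1.0 if z >= 0 else 0.0


def normal_cdf(x, variance):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * variance)))


k, t, x = 1.0, 1.0, 0.5  # illustrative values only
numeric = solve_heat(heaviside, x, t, k)
exact = normal_cdf(x, 2.0 * k * t)  # N(0, 2kt) cdf evaluated at x
```

Replacing `heaviside` by any other bounded initial condition gives the corresponding solution at $(x, t)$, provided the truncation range $L$ is wide relative to $\sqrt{2kt}$.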

Solution of the Diffusion Equation.

We now consider the general solution to the diffusion equation of the form 2.37,
rewritten as

$$\frac{\partial V}{\partial t} = r_t V - r_t S_t \frac{\partial V}{\partial S} - \frac{\sigma^2(S_t, t)}{2} \frac{\partial^2 V}{\partial S^2} \tag{2.44}$$

where $S_t$ is an asset price driven by a diffusion equation

$$dS_t = a(S_t, t)\,dt + \sigma(S_t, t)\,dW_t,$$

$V(S_t, t)$ is the price of an option on that asset at time $t$, and $r_t = r(t)$ is the
spot interest rate at time $t$. We assume that the price of the option at expiry
$T$ is a known function of the asset price

$$V(S_T, T) = V_0(S_T). \tag{2.46}$$

Somewhat strangely, the option is priced using a related but not identical process
(or, equivalently, the same process under a different measure). Recall from the
backwards Kolmogorov equation 2.27 that if a related process $X_t$ satisfies the
stochastic differential equation

$$dX_t = r(X_t, t) X_t\,dt + \sigma(X_t, t)\,dW_t \tag{2.47}$$

then its transition kernel $p(t, s, T, z) = \frac{\partial}{\partial z} P[X_T \le z \mid X_t = s]$ satisfies a partial
differential equation similar to 2.44:

$$\frac{\partial p}{\partial t} = -r(s, t) s \frac{\partial p}{\partial s} - \frac{\sigma^2(s, t)}{2} \frac{\partial^2 p}{\partial s^2}. \tag{2.48}$$

For a given process $X_t$ this determines one solution. For simplicity, consider
the case (natural in finance applications) when the spot interest rate is a function
of time, not of the asset price; $r(s, t) = r(t)$. To obtain the solution so that the
terminal condition is satisfied, consider a product

$$f(t, s, T, z) = p(t, s, T, z) q(t, T) \tag{2.49}$$

where

$$q(t, T) = \exp\left\{-\int_t^T r(v)\,dv\right\}$$

is the discount function or the price of a zero-coupon bond at time $t$ which pays
\$1 at maturity.
Let us try an application of one of the most common methods in solving
PDEs, the “lucky guess” method. Consider a linear combination of terms of
the form 2.49 with weight function $w(z)$, i.e. try a solution of the form
$$V(s,t) = \int p(t,s,T,z)\, q(t,T)\, w(z)\, dz$$

for a suitable weight function $w(z)$. In view of the definition of $p$ as a transition
probability density, this integral can be rewritten as a conditional expectation:

$$V(t,s) = E[w(X_T)\, q(t,T) \mid X_t = s],$$

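the discounted conditional expectation of the random variable $w(X_T)$ given the
current state of the process, where the process is assumed to follow (2.18). Note
that in order to satisfy the terminal condition 2.46, we choose $w(x) = V_0(x)$. Now

$$\begin{aligned}
\frac{\partial V}{\partial t} &= \frac{\partial}{\partial t} \int p(t,s,T,z)\, q(t,T)\, w(z)\, dz \\
&= \int \Big[-r(S_t,t) S_t \frac{\partial p}{\partial s} - \frac{\sigma^2(S_t,t)}{2} \frac{\partial^2 p}{\partial s^2}\Big] q(t,T)\, w(z)\, dz \\
&\qquad + r(S_t,t) \int p(t,S_t,T,z)\, q(t,T)\, w(z)\, dz \quad \text{by 2.48 and } \partial q/\partial t = r(t)\, q(t,T) \\
&= -r(S_t,t) S_t \frac{\partial V}{\partial S} - \frac{\sigma^2(S_t,t)}{2} \frac{\partial^2 V}{\partial S^2} + r(S_t,t) V(S_t,t)
\end{aligned}$$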

where we have assumed that we can pass the derivatives under the integral
sign. Thus the process

$$V(t,s) = E[V_0(X_T)\, q(t,T) \mid X_t = s]$$

satisfies both the partial differential equation 2.44 and the terminal condition
2.46 and is hence the solution. Indeed it is the unique solution satisfying certain
regularity conditions. The result asserts that the value of any European option
is simply the conditional expected value of the payoff, discounted to
the present, assuming that the distribution is that of the process 2.47. This
result is the special case, in which spot interest rates are functions of time only, of
the following more general theorem.

Theorem 13 (Feynman-Kac)

Suppose the conditions for a unique solution to (2.44, 2.46) (see for example
Duffie, appendix E) are satisfied. Then the general solution to (2.15) under the
terminal condition 2.46 is given by
$$V(S,t) = E\Big[V_0(X_T) \exp\Big\{-\int_t^T r(X_v,v)\, dv\Big\} \,\Big|\, X_t = S\Big]$$

This represents the discounted return from the option under the distribution
of the process $X_t$. The distribution induced by the process $X_t$ is referred to
as the equivalent martingale measure or risk neutral measure. Notice that when
the original process is a diffusion, the equivalent martingale measure shares the
same diffusion coefficient but has the drift replaced by $r(X_t,t)X_t$. The option
is priced as if the drift were the same as that of a risk-free bond, i.e. as if the
instantaneous rate of return from the security were identical to that of a bond. Of
course, in practice, it is not. A risk premium must be paid to the stock-holder
to compensate for the greater risk associated with the stock.
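
The representation in the theorem suggests a direct, if crude, way to approximate an option value: simulate the risk-neutral process $X$, discount each simulated payoff along its own path, and average. The following Python sketch does this with an Euler discretisation. The function name `feynman_kac_mc`, the step and path counts and the use of Euler's scheme are assumptions made for illustration. They are not the text's method, and the result carries both discretisation and sampling error.

```python
import math
import random

def feynman_kac_mc(payoff, s, t, T, r, sigma, n_paths=5000, n_steps=50, seed=0):
    """Estimate E[ V0(X_T) exp(-int_t^T r(X_v, v) dv) | X_t = s ] where
    dX = r(X, v) X dv + sigma(X, v) dW, using an Euler scheme.
    r and sigma are callables taking (x, v); all names are illustrative."""
    rng = random.Random(seed)
    dt = (T - t) / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = s
        integral_r = 0.0                      # running value of int r dv
        for i in range(n_steps):
            v = t + i * dt
            rate = r(x, v)
            integral_r += rate * dt
            x += rate * x * dt + sigma(x, v) * sqrt_dt * rng.gauss(0.0, 1.0)
        total += payoff(x) * math.exp(-integral_r)
    return total / n_paths

if __name__ == "__main__":
    # Payoff V0(x) = x: the discounted asset is a martingale under the
    # risk-neutral measure, so the estimate should be close to s = 100.
    print(feynman_kac_mc(lambda x: x, 100.0, 0.0, 1.0,
                         lambda x, v: 0.05, lambda x, v: 0.2 * x))
```

Taking the payoff $V_0(x) = x$ gives a simple check, since the discounted asset price is itself a martingale under the risk-neutral measure and the estimate should then be close to the starting value $s$.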

There are some cases in which the conditional expectation 2.53 can be determined
explicitly. In general, these require that the process or a simple function
of the process is Gaussian.

For example, suppose that both $r(t)$ and $\sigma(t)$ are deterministic functions
of time only. Then we can solve the stochastic differential equation (2.22) to obtain

$$X_T = \frac{X_t}{q(t,T)} + \int_t^T \frac{\sigma(u)}{q(u,T)}\, dW_u$$

The first term above is the conditional expected value of $X_T$ given $X_t$. The
second is the random component, and since it is a weighted sum of the normally
distributed increments of a Brownian motion with weights that are non-random,
it is also a normal random variable. Its mean is 0 and its (conditional) variance
is $\int_t^T \frac{\sigma^2(u)}{q^2(u,T)}\, du$. Thus the conditional distribution of $X_T$ given $X_t$ is normal
with conditional expectation $X_t/q(t,T)$ and conditional variance $\int_t^T \frac{\sigma^2(u)}{q^2(u,T)}\, du$.
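
The two conditional moments are easy to evaluate numerically for given deterministic functions $r(t)$ and $\sigma(t)$. The sketch below assumes the additive-noise equation $dX_t = r(t)X_t\,dt + \sigma(t)\,dW_t$ and uses a simple midpoint rule. The helper names (`integrate`, `discount`, `gaussian_moments`) are hypothetical. With constant $r$ and $\sigma$ the variance integral has the closed form $\sigma^2(e^{2r(T-t)} - 1)/(2r)$, which gives a convenient check.

```python
import math

def integrate(f, a, b, n=400):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def discount(r, t, T):
    """q(t, T) = exp(-int_t^T r(v) dv) for a deterministic rate function r."""
    return math.exp(-integrate(r, t, T))

def gaussian_moments(x_t, t, T, r, sigma):
    """Conditional mean X_t / q(t,T) and variance int_t^T sigma(u)^2 / q(u,T)^2 du
    of X_T given X_t, under the assumed SDE dX = r(t) X dt + sigma(t) dW."""
    mean = x_t / discount(r, t, T)
    var = integrate(lambda u: (sigma(u) / discount(r, u, T)) ** 2, t, T)
    return mean, var

if __name__ == "__main__":
    # Constant coefficients, compared with the closed-form variance.
    mean, var = gaussian_moments(100.0, 0.0, 1.0, lambda u: 0.05, lambda u: 2.0)
    print(mean, var, 4.0 * (math.exp(0.1) - 1.0) / 0.1)
```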

The special case of 2.53 of most common usage is the Black-Scholes model:
suppose that σ(S, t) = Sσ(t) for σ(t) some deterministic function of t. Then
the distribution of Xt is not Gaussian, but fortunately, its logarithm is. In this
case we say that the distribution of Xt is lognormal.

Lognormal Distribution

Suppose $Z$ is a normal random variable with mean $\mu$ and variance $\sigma^2$. Then we
say that the distribution of $X = e^Z$ is lognormal with mean $\eta = \exp\{\mu + \sigma^2/2\}$
and volatility parameter $\sigma$. The lognormal distribution with
mean $\eta > 0$ and volatility parameter $\sigma > 0$ has probability density function

$$g(x \mid \eta, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\{-(\log x - \log\eta + \sigma^2/2)^2/(2\sigma^2)\}. \tag{2.55}$$
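
The parametrisation by the mean $\eta$ rather than by $\mu$ is easy to get wrong, so a numerical check may be useful. The sketch below writes the density in terms of $\mu = \log\eta - \sigma^2/2$ (the mean of $\log X$) and confirms by midpoint integration on the log scale that it integrates to one and has mean $\eta$. The function names and the integration settings are illustrative assumptions.

```python
import math

def lognormal_pdf(x, eta, sigma):
    """Lognormal density with mean eta and volatility parameter sigma."""
    mu = math.log(eta) - 0.5 * sigma ** 2          # mean of log X
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def moment(power, eta, sigma, n=4000, width=10.0):
    """Integral of x**power * g(x | eta, sigma) over x > 0, computed with the
    substitution y = log x (so dx = x dy) and the midpoint rule."""
    mu = math.log(eta) - 0.5 * sigma ** 2
    a = mu - width * sigma
    h = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n):
        x = math.exp(a + (i + 0.5) * h)
        total += x ** power * lognormal_pdf(x, eta, sigma) * x * h
    return total

if __name__ == "__main__":
    print(moment(0, 50.0, 0.3))   # total probability
    print(moment(1, 50.0, 0.3))   # mean
```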
The solution to (2.18) with non-random functions $\sigma(t)$, $r(t)$ is now
$$X_T = X_t \exp\Big\{\int_t^T (r(u) - \sigma^2(u)/2)\, du + \int_t^T \sigma(u)\, dW_u\Big\}. \tag{2.56}$$

Since the exponent is normal, $\log X_T$ is normally distributed with mean
$\log(X_t) + \int_t^T (r(u) - \sigma^2(u)/2)\, du$ and variance $\int_t^T \sigma^2(u)\, du$. It follows that the
conditional distribution of $X_T$ is lognormal with mean $\eta = X_t/q(t,T)$ and volatility
parameter $\sqrt{\int_t^T \sigma^2(u)\, du}$.

We now derive the well-known Black-Scholes formula as a special case of
2.53. For a call option with exercise price $E$, the payoff function is $V_0(S_T) =
\max(S_T - E, 0)$. Now it is helpful to use the fact that for a standard normal
random variable $Z$ and arbitrary $\sigma > 0$, $-\infty < \mu < \infty$, the expected
value of $\max(e^{\sigma Z + \mu} - 1, 0)$ is

$$e^{\mu + \sigma^2/2}\, \Phi\Big(\frac{\mu}{\sigma} + \sigma\Big) - \Phi\Big(\frac{\mu}{\sigma}\Big) \tag{2.57}$$

where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function. As
a result, in the special case that $r$ and $\sigma$ are constants, (2.53) results in the
famous Black-Scholes formula which can be written in the form

$$V(S,t) = S\,\Phi(d_1) - E e^{-r(T-t)}\, \Phi(d_2) \tag{2.58}$$

where

$$d_1 = \frac{\log(S/E) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t}$$

are the values $\pm\sigma^2(T-t)/2$ standardized by adding $\log(S/E) + r(T-t)$ and
dividing by $\sigma\sqrt{T-t}$. This may be derived by the following device: assume (i.e.
pretend) that, given current information, the distribution of $S(T)$ at expiry is
lognormal with mean $\eta = S(t)e^{r(T-t)}$.
The mean of the lognormal in the risk neutral world, $S(t)e^{r(T-t)}$, is exactly
the future value of our current stock $S(t)$ if we were to sell the stock and invest
the cash in a bank deposit. Then the value of an option with payoff
function $V_0(S_T)$ is the expected value of this function against this
lognormal probability density function, discounted to present value:

$$e^{-r(T-t)} \int_0^\infty V_0(x)\, g(x \mid S(t)e^{r(T-t)}, \sigma\sqrt{T-t})\, dx. \tag{2.59}$$
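
The claim that the integral (2.59) with call payoff reproduces the closed form (2.58) can be checked numerically. The sketch below evaluates both for constant $r$ and $\sigma$, writing `tau` for $T - t$. The integral is computed by the midpoint rule in the variable $y = \log x$, starting at $\log E$ because the call payoff vanishes below the strike. Function names and integration settings are assumptions for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_closed_form(S, E, r, sigma, tau):
    """Black-Scholes call value, formula (2.58), with tau = T - t."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(S / E) + (r + 0.5 * sigma ** 2) * tau) / vol
    d2 = d1 - vol
    return S * norm_cdf(d1) - E * math.exp(-r * tau) * norm_cdf(d2)

def call_by_integration(S, E, r, sigma, tau, n=20000, width=10.0):
    """Discounted lognormal expectation (2.59) of max(x - E, 0)."""
    vol = sigma * math.sqrt(tau)
    mu = math.log(S) + (r - 0.5 * sigma ** 2) * tau   # mean of log S(T)
    lo = max(math.log(E), mu - width * vol)           # payoff is zero below E
    hi = mu + width * vol
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        density = math.exp(-(y - mu) ** 2 / (2.0 * vol ** 2)) / (vol * math.sqrt(2.0 * math.pi))
        total += (math.exp(y) - E) * density * h
    return math.exp(-r * tau) * total

if __name__ == "__main__":
    args = (100.0, 95.0, 0.05, 0.2, 0.5)
    print(call_closed_form(*args), call_by_integration(*args))
```

The same integration routine works for other payoff functions $V_0$ by replacing the term `math.exp(y) - E`, although the lower limit must then be adjusted.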

Notice that the Black-Scholes derivation covers any diffusion process governing
the underlying asset which is driven by a stochastic differential equation of
the form
dS = a(S)dt + σSdWt

regardless of the nature of the drift term a(S). For example a non-linear function
a(S) can lead to distributions that are not lognormal and yet the option price
is determined as if it were.

Example: Pricing Call and Put options.

Consider pricing an index option on the S&P 500 index on January 11, 2000 (the
index SPX closed at 1432.25 on this day). The option SXZ AE-A is a January
call option with strike price 1425. The option matures (as do equity options in
general) on the third Friday of the month, or January 21, a total of 7 trading
days later. Suppose we wish to price such an option using the Black-Scholes
model. In this case, T - t measured in years is 7/252 = 0.027778. The annual
volatility of the Standard and Poor 500 index is around 19.5 percent or 0.195,
and we assume that very short term interest rates are approximately 3%. In Matlab we
can value this option using

[CALL,PUT] = BLSPRICE(1432.25,1425,0.03,7/252,0.195,0)
CALL = 23.0381
PUT = 14.6011
Arguments of the function BLSPRICE are, in order, the current equity price,
the strike price, the annual interest rate r, the time to maturity T - t in years,
the annual volatility σ, and finally the dividend yield in percent,
which we assumed to be 0. Thus the Black-Scholes price for a call option on SPX
is around 23.04. Indeed this call option did sell on Jan 11 for $23.00 and
the put option for $14 5/8. From the put-call parity relation (see for example
Wilmott, Howison, Dewynne, page 41) S + P - C = E e^{-r(T-t)}, or in this
case 1432.25 + 14.625 - 23 = 1425 e^{-r(7/252)}. We might solve this relation to
obtain the spot interest rate r (here about 0.028). In order to confirm that a different interest rate
might apply over a longer term, we consider the September call and put options
(SXZ) on the same day with exercise price 1400, which sold for $152 and $71
respectively. In this case there are 171 trading days to expiry and so we need to
solve 1432.25 + 71 - 152 = 1400 e^{-r(171/252)}, whose solution is r = 0.0522.
This is close to the six month interest rates at the time, but 3% is low for the
very short term rates. The discrepancy with the actual interest rates is one of
several modest failures of the Black-Scholes model to be discussed further later.
The low implied interest rate is influenced by the costs of handling and executing
an option, which are non-negligible fractions of the option prices, particularly
with short term options such as this one. An analogous function to the Matlab
function above which provides the Black-Scholes price in Splus or R is given
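
For readers without Matlab, the calculation in this example can also be sketched in Python. This is not the Splus/R function referred to above. The names `bls_price` and `parity_implied_rate` are hypothetical, the dividend yield is taken to be zero, and the put value is obtained from the call by put-call parity.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bls_price(S, E, r, tau, sigma):
    """Black-Scholes call and put values with no dividends.
    S: current price, E: strike, r: annual rate, tau: years to expiry,
    sigma: annual volatility. Returns (call, put)."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(S / E) + (r + 0.5 * sigma ** 2) * tau) / vol
    d2 = d1 - vol
    discounted_strike = E * math.exp(-r * tau)
    call = S * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    put = call - S + discounted_strike        # put-call parity
    return call, put

def parity_implied_rate(S, E, call, put, tau):
    """Solve S + P - C = E exp(-r tau) for the interest rate r."""
    return -math.log((S + put - call) / E) / tau

if __name__ == "__main__":
    print(bls_price(1432.25, 1425.0, 0.03, 7 / 252, 0.195))
    print(parity_implied_rate(1432.25, 1425.0, 23.0, 14.625, 7 / 252))
    print(parity_implied_rate(1432.25, 1400.0, 152.0, 71.0, 171 / 252))
```

With the inputs of the example, the first call should agree with the Matlab values quoted above to the displayed precision, and the last line should give an implied rate of about 0.0522.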



1. It is common for a stock whose price has reached a high level to split or
issue shares on a two-for-one or three-for-one basis. What is the effect of
a stock split on the price of an option?

2. If a stock issues a dividend of exactly D (known in advance) on a certain
date, provide a no-arbitrage argument for the change in price of the stock
at this date. Is there a difference between deterministic D and the case
when D is a random variable with known distribution but whose value is
declared on the dividend date?

3. Suppose $\Sigma$ is a positive definite covariance matrix and $\eta$ a column vector.
Show that the set of all possible pairs of standard deviation and mean
return $(\sqrt{w^T \Sigma w}, \eta^T w)$ for weight vectors $w$ such that $\sum_i w_i = 1$ is a
convex region with a hyperbolic boundary.

4. The current rate of interest is 5% per annum and you are offered a random
bond which pays either $210 or $0 in one year. You believe that the
probability of the bond paying $210 is one half. How much would you
pay now for such a bond? Suppose this bond is publicly traded and a
large fraction of the population is risk averse so that it is selling now for
$80. Does your price offer an arbitrage to another trader? What is the
risk-neutral measure for this bond?

5. Which would you prefer, a gift of $100 or a 50-50 chance of making $200?
A fine of $100 or a 50-50 chance of losing $200? Are your preferences
self-consistent and consistent with the principle that individuals are risk-averse?

6. Compute the stochastic differential $dX_t$ (assuming $W_t$ is a Wiener process) for

(a) $X_t = \exp(rt)$
(b) $X_t = \int_0^t h(t)\, dW_t$

(c) $X_t = X_0 \exp\{at + bW_t\}$

(d) $X_t = \exp(Y_t)$ where $dY_t = \mu\, dt + \sigma\, dW_t$.

7. Show that if $X_t$ is a geometric Brownian motion, so is $X_t^\beta$ for any real
number $\beta$.

8. Suppose a stock price follows a geometric Brownian motion process

$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$$

Find the diffusion equation satisfied by the processes (a) $f(S_t) = S_t$, (b)
$\log(S_t)$, (c) $1/S_t$. Find a combination of the processes $S_t$ and $1/S_t$ that
does not depend on the drift parameter $\mu$. How does this allow constructing
estimators of $\sigma$ that do not require knowledge of the value of $\mu$?

9. Consider an Ito process of the form

$$dS_t = a(S_t)\, dt + \sigma(S_t)\, dW_t$$

Is it possible to find a function $f(S_t)$ which is also an Ito process but with
zero drift?

10. Consider an Ito process of the form

$$dS_t = a(S_t)\, dt + \sigma(S_t)\, dW_t$$

Is it possible to find a function $f(S_t)$ which has constant diffusion term?

11. Consider approximating an integral of the form
$$\int_0^T g(t)\, dW_t \approx \sum g(t)\{W(t+h) - W(t)\}$$
where $g(t)$ is a non-random function and the sum is over values
of $t = nh$, $n = 0, 1, 2, \ldots, T/h - 1$. Show by considering the distribution
of the sum and taking limits that the random variable $\int_0^T g(t)\, dW_t$ has a
normal distribution and find its mean and variance.

12. Consider two geometric Brownian motion processes X_t and Y_t both driven
by the same Wiener process

$$dX_t = aX_t\, dt + bX_t\, dW_t$$

$$dY_t = \mu Y_t\, dt + \sigma Y_t\, dW_t.$$

Derive a stochastic differential equation for the ratio Z_t = X_t / Y_t. Suppose
for example that X_t models the price of a commodity in $C and Y_t is the
exchange rate ($C/$US) at time t. Then what is the process Z_t? Repeat
in the more realistic situation in which

$$dX_t = aX_t\, dt + bX_t\, dW_t^{(1)}$$
$$dY_t = \mu Y_t\, dt + \sigma Y_t\, dW_t^{(2)}$$

and $W_t^{(1)}$, $W_t^{(2)}$ are correlated Brownian motion processes with correlation
13. Prove the Shannon inequality that
$$H(Q,P) = \sum_i q_i \log\Big(\frac{q_i}{p_i}\Big) \ge 0$$
for any probability distributions $P$ and $Q$, with equality if and only if
all $p_i = q_i$.

14. Consider solving the problem
$$\min H(Q,P) = \sum_i q_i \log\Big(\frac{q_i}{p_i}\Big)$$
subject to the constraints $\sum_i q_i = 1$ and $E_Q f(X) = \sum_i q_i f(i) = \mu$. Show
that the solution, if it exists, is given by
$$q_i = p_i\, \frac{\exp(\eta f(i))}{m(\eta)}$$
where $m(\eta) = \sum_i p_i \exp(\eta f(i))$ and $\eta$ is chosen so that $m'(\eta)/m(\eta) = \mu$. (This
shows that the closest distribution to $P$ which satisfies the constraint is
obtained by a simple “exponential tilt” or Esscher transform so that $\frac{dQ}{dP}(x)$
is proportional to $\exp(\eta f(x))$ for a suitable parameter $\eta$.)

15. Let $Q^*$ minimize $H(Q,P)$ subject to a constraint

$$E_Q g(X) = c.$$

Let $Q$ be some other probability distribution satisfying the same constraint.
Then prove that

$$H(Q,P) = H(Q,Q^*) + H(Q^*,P).$$

16. Let $I_1, I_2, \ldots$ be a set of constraints of the form

$$E_Q g_i(X) = c_i$$

and suppose we define $P_n^*$ as the solution of

$$\max H(P)$$

subject to the constraints $I_1 \cap I_2 \cap \ldots \cap I_n$. Then prove that

$$H(P_n^*, P_1^*) = H(P_n^*, P_{n-1}^*) + H(P_{n-1}^*, P_{n-2}^*) + \ldots + H(P_2^*, P_1^*).$$

17. Consider a defaultable bond which pays a fraction of its face value, Fp,
on maturity in the event of default. Suppose the risk free interest rate
continuously compounded is r so that B_s = exp(sr). Suppose also that a
constant coupon $d is paid at the end of every period s = t + 1, ..., T - 1.
Then show that the value of this bond at time t is

$$P_t = d\, \frac{\exp\{-(r+k)\} - \exp\{-(r+k)(T-t)\}}{1 - \exp\{-(r+k)\}} + pF \exp\{-r(T-t)\} + (1-p)F \exp\{-(r+k)(T-t)\}$$

18. (a) Show that entropy is always positive and if $Y = g(X)$ is a function
of $X$ then $Y$ has smaller entropy than $X$, i.e. $H(p_Y) \le H(p_X)$.

(b) Show that if $X$ has any discrete distribution over $n$ values, then its
entropy is $\le \log(n)$.
Chapter 3

Basic Monte Carlo Methods

Consider as an example the following very simple problem. We wish to price
a European call option with exercise price $22 and payoff function V(S_T) =
(S_T - 22)^+. Assume for the present that the interest rate is 0% and S_T can take
only the following five values with corresponding risk neutral (Q) probabilities

s             20     21     22     23     24
Q[S_T = s]    1/16   4/16   6/16   4/16   1/16

In this case, since the distribution is very simple, we can price the call option
explicitly:

$$E_Q V(S_T) = E_Q (S_T - 22)^+ = (23 - 22)\frac{4}{16} + (24 - 22)\frac{1}{16} = \frac{3}{8}.$$
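
The arithmetic above can be reproduced in a few lines. The Python sketch below computes the expectation exactly with rational arithmetic and, for comparison, averages the payoff over n values of S_T drawn at random from Q. The variable and function names are illustrative and do not come from the text.

```python
from fractions import Fraction
import random

values = [20, 21, 22, 23, 24]
probs = [Fraction(k, 16) for k in (1, 4, 6, 4, 1)]   # risk-neutral Q
strike = 22

def payoff(s):
    """Call payoff (s - strike)^+."""
    return max(s - strike, 0)

# Exact expectation under Q (the interest rate is 0, so no discounting).
exact_price = sum(q * payoff(s) for s, q in zip(values, probs))

def simulated_price(n, seed=0):
    """Average payoff over n independent draws of S_T from Q."""
    rng = random.Random(seed)
    draws = rng.choices(values, weights=[float(q) for q in probs], k=n)
    return sum(payoff(s) for s in draws) / n

if __name__ == "__main__":
    print(exact_price)             # 3/8
    print(simulated_price(1000))   # a random estimate near 0.375
```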

However, the ability to value an option explicitly is a rare luxury. An alternative
would be to generate a large number (say n = 1000) of independent simulations of
the stock price ST under the measure Q and average the returns from the option.
Say the simulations yielded values for ST of 22, 20, 23, 21, 22, 23, 20, 24, .... then


