
Diffusion models are a standard way to represent random variables in continuous time.
The ideas are analogous to the handling of discrete-time stochastic processes. We start with
a simple shock series, εt in discrete time and dzt in continuous time. Then we build up more
complex models by building on this foundation.
The basic building block is a Brownian motion, which is the natural generalization of a
random walk in discrete time. For a random walk

zt − zt−1 = εt

the variance scales with time: var(zt+2 − zt) = 2 var(zt+1 − zt). Thus, define a Brownian
motion as a process zt for which

zt+∆ − zt ∼ N(0, ∆).

We have added the normal distribution to the usual definition of a random walk. As E(εt εt−1) =
0 in discrete time, increments to z for non-overlapping intervals are also independent. I use
the notation zt to denote z as a function of time, in conformity with discrete-time formulas;
many people prefer the standard representation of a function, z(t).
It's natural to want to look at very small time intervals. We use the notation dzt to rep-
resent zt+∆ − zt for arbitrarily small time intervals ∆, and we sometimes drop the subscript
when it's obvious we're talking about time t. Conversely, the level of zt is the sum of its


small differences, so we can write the stochastic integral

zt − z0 = ∫₀ᵗ dzs.

The variance of a random walk scales with time, so the standard deviation scales with the
square root of time. The standard deviation is the "typical size" of a movement in a normally
distributed random variable, so the "typical size" of zt+∆ − zt in time interval ∆ is √∆. This
fact means that (zt+∆ − zt)/∆ has typical size 1/√∆, so though the sample path of zt is
continuous, zt is not differentiable.
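To make the scaling concrete, here is a small simulation sketch. The parameter choices (number of paths, step size, and so on) are illustrative, not from the text: we build Brownian paths from N(0, ∆) increments and check that the variance of an increment over two steps is about twice the variance over one step.

```python
# Sketch: simulate Brownian motion paths and check that the variance of
# z_{t+Delta} - z_t scales with Delta (so the std scales with sqrt(Delta)).
# All parameter values here are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, dt = 20000, 200, 0.01   # horizon t = 2.0

# Build paths as cumulative sums of N(0, dt) increments.
dz = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
z = np.cumsum(dz, axis=1)

# var over a double interval should be about twice var over a single one.
var_1 = np.var(z[:, 101] - z[:, 100])
var_2 = np.var(z[:, 102] - z[:, 100])
print(var_1, var_2)   # roughly dt and 2*dt
```

The same simulation illustrates non-differentiability: dividing an increment by ∆ produces something of size 1/√∆, which blows up as the step shrinks.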
For this reason, it's important to be a little careful with notation. dz, dzt or dz(t) mean
zt+∆ − zt for arbitrarily small ∆. We are used to thinking of dz as the derivative of a
function, but since a Brownian motion is not a differentiable function of time, dz = [dz(t)/dt] dt
makes no sense.
From the definition of the Brownian motion, it's clear that

Et (dzt ) = 0.

Again, the notation is initially confusing: how can you take an expectation at t of a random
variable dated t? Keep in mind, however, that dzt = zt+∆ − zt is the forward difference. The
variance is thus the same as the second moment, so we write it as

Et(dzt²) = dt.

It turns out that not only is the variance of dzt equal to dt, but

dzt² = dt

for every sample path of zt. The sum of squared increments of z is a differentiable function of
time, though z itself is not. We can see this with the same sort of argument I used for zt itself.
If x ∼ N(0, σ²), then var(x²) = 2σ⁴. Thus,

var[(zt+∆ − zt)²] = 2∆².

The mean of (zt+∆ − zt)² is ∆, while its standard deviation is √2 ∆. Over a fixed interval,
the sum of t/∆ such squared increments has mean t and standard deviation √(2t∆). As
∆ shrinks, the ratio of standard deviation to mean shrinks to zero; i.e. the series becomes
deterministic.
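A quick numerical sketch of this fact: with illustrative step sizes and path counts of my own choosing, the sum of squared increments over [0, 1] concentrates on t = 1 as the step shrinks.

```python
# Sketch: check that the sum of (dz)^2 over [0, 1] concentrates on t = 1
# as the time step shrinks -- the "dz^2 = dt" behavior. Step sizes and
# path counts are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n_paths = 2000

def quad_variation(n_steps):
    dt = 1.0 / n_steps
    dz = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return (dz**2).sum(axis=1)        # sum of squared increments per path

qv_coarse = quad_variation(10)
qv_fine = quad_variation(10000)
print(qv_coarse.std(), qv_fine.std())  # dispersion shrinks with the step
```

Both sums have mean near 1, but the cross-path dispersion of the fine-step sum is far smaller, in line with the √(2t∆) standard deviation above.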

23.2 Diffusion models


I form more complicated time series processes by adding drift and diffusion terms,

dxt = µ(·)dt + σ(·)dzt

I introduce some common examples:

Random walk with drift: dxt = µ dt + σ dzt.
AR(1): dxt = −φ(xt − µ) dt + σ dzt.
Square root process: dxt = −φ(xt − µ) dt + σ√xt dzt.
Price process: dpt/pt = µ dt + σ dzt.

You can simulate a diffusion process by approximating it for a small time interval,

xt+∆ − xt = µ(·)∆ + σ(·)√∆ εt+∆; εt+∆ ∼ N(0, 1).

As we add up serially uncorrelated shocks εt to form discrete time ARMA models, we
build on the shocks dzt to form diffusion models. I proceed by introducing some popular
examples in turn.
Random walk with drift. In discrete time, we model a random walk with drift as

xt = µ + xt−1 + εt

The obvious continuous time analogue is

dxt = µ dt + σ dzt.

It's easy to figure out the implications of this process for discrete horizons,

xt = x0 + µt + σ(zt − z0)
xt = x0 + µt + εt; εt ∼ N(0, σ²t).

This is a random walk with drift.
AR(1). The simplest discrete time process is an AR(1),

xt = (1 − ρ)µ + ρxt−1 + εt
xt − xt−1 = −(1 − ρ)(xt−1 − µ) + εt

The continuous time analogue is

dxt = −φ(xt − µ) dt + σ dzt.


This is known as the Ornstein-Uhlenbeck process. The mean or drift is

Et(dxt) = −φ(xt − µ)dt.

This force pulls x back to its steady state value µ, but the shocks σdzt move it around.
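A simulation sketch of the mean reversion, with illustrative parameters of my own choosing. As a check, I compare the simulated mean at horizon s with the conditional-mean formula µ + e^(−φs)(x0 − µ), a standard Ornstein-Uhlenbeck result that is not derived in the text.

```python
# Sketch: Euler discretization of the Ornstein-Uhlenbeck process
# dx = -phi*(x - mu) dt + sigma dz. Compare the simulated horizon-s mean
# with the standard formula mu + exp(-phi*s)*(x0 - mu) (a known OU
# result, assumed here rather than derived in the text).
import numpy as np

rng = np.random.default_rng(3)
phi, mu, sigma = 2.0, 1.0, 0.4
x0, s, n_steps, n_paths = 3.0, 1.0, 1000, 20000
dt = s / n_steps

x = np.full(n_paths, x0)
for _ in range(n_steps):
    dz = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    x += -phi * (x - mu) * dt + sigma * dz   # drift pulls x toward mu

implied = mu + np.exp(-phi * s) * (x0 - mu)  # about 1.271
print(x.mean(), implied)
```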
Square root process. Like its discrete time counterpart, the continuous time AR(1) ranges
over the whole real line. It would be nice to have a process that is always positive, so
it could capture a price or an interest rate. An extension of the continuous time AR(1) is a
workhorse of such applications,

dxt = −φ(xt − µ) dt + σ√xt dzt.

Now volatility also varies over time,

Et(dxt²) = σ²xt dt;

as x approaches zero, the volatility declines. At x = 0, the volatility is entirely turned off, so
x drifts up toward µ. We will show more formally below that this behavior keeps x ≥ 0 always.
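A simulation sketch of this behavior, with illustrative parameter values. Clipping x at zero inside the square root is a standard fix for the discretization only (the exact process never needs it); the point is that volatility shuts off near zero and the paths stay non-negative.

```python
# Sketch: Euler simulation of the square root process
# dx = -phi*(x - mu) dt + sigma*sqrt(x) dz. The clip handles the
# discretized path only; volatility vanishes as x approaches 0.
import numpy as np

rng = np.random.default_rng(4)
phi, mu, sigma = 1.0, 0.05, 0.1
x0, t, n_steps, n_paths = 0.02, 5.0, 5000, 5000
dt = t / n_steps

x = np.full(n_paths, x0)
for _ in range(n_steps):
    dz = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    x += -phi * (x - mu) * dt + sigma * np.sqrt(np.clip(x, 0.0, None)) * dz
    x = np.clip(x, 0.0, None)   # keep the discretized path >= 0

print(x.min(), x.mean())   # min >= 0, mean pulled toward mu = 0.05
```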

This is a nice example because it is decidedly nonlinear. Its discrete time analogue

xt = (1 − ρ)µ + ρxt−1 + √xt−1 εt

is not a standard ARMA model, so standard linear time series tools would fail us. We could
not, for example, give a pretty equation for the distribution of xt+s for finite s. It turns out
that we can do this in continuous time. Thus, one advantage of continuous time formulations
is that they give rise to a toolkit of interesting nonlinear time series models for which we have
closed form solutions.
Price processes. A modification of the random walk with drift is the most common model
for prices. We want the return, or proportional increase in price, to be uncorrelated over time.
The most natural way to do this is to specify

dpt = pt µ dt + pt σ dzt

or, more simply,

dpt/pt = µ dt + σ dzt.
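A simulation sketch of this process, with illustrative parameters: since Et(dpt/pt) = µ dt, the mean price should grow at rate µ, i.e. E(pt) = p0 e^(µt).

```python
# Sketch: simulate the price process dp = p*mu dt + p*sigma dz and check
# that the simulated mean price grows at rate mu: E(p_t) = p0*exp(mu*t).
# Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, p0, t = 0.08, 0.2, 100.0, 1.0
n_steps, n_paths = 2000, 100000
dt = t / n_steps

p = np.full(n_paths, p0)
for _ in range(n_steps):
    dz = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    p *= 1.0 + mu * dt + sigma * dz   # proportional (return) shocks

print(p.mean(), p0 * np.exp(mu * t))   # both near 108.3
```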

Diffusion models more generally. A general picture should emerge. We form more com-
plex models of stochastic time series by changing the local mean and variance of the under-
lying Brownian motion,

dxt = µ(xt)dt + σ(xt)dzt.

More generally, we can allow the drift µ and diffusion σ to be functions of other variables and
of time explicitly. We often write

dxt = µ(·)dt + σ(·)dzt

to remind us of such possible dependence. There is nothing mysterious about this class of
processes; they are just like easily understandable discrete time processes,

xt+∆ − xt = µ(·)∆ + σ(·)√∆ εt+∆; εt+∆ ∼ N(0, 1).

In fact, when analytical methods fail us, we can figure out how diffusion models work by
simulating the discretized version for a fine time interval ∆.
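The discretized recursion can be sketched as a generic routine; the function and parameter names here are my own, not from the text.

```python
# Sketch: generic Euler discretization
# x_{t+D} - x_t = mu(.)*D + sigma(.)*sqrt(D)*eps, eps ~ N(0, 1),
# usable for any drift/diffusion pair mu(x), sigma(x).
import numpy as np

def simulate_diffusion(mu_fn, sigma_fn, x0, t, n_steps, n_paths, seed=0):
    """Simulate dx = mu(x) dt + sigma(x) dz by the discretized recursion."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        eps = rng.normal(0.0, 1.0, size=n_paths)   # eps ~ N(0, 1)
        x = x + mu_fn(x) * dt + sigma_fn(x) * np.sqrt(dt) * eps
    return x

# Example: a random walk with drift recovers E(x_t) = x0 + mu*t.
x_t = simulate_diffusion(lambda x: 0.5 + 0 * x, lambda x: 0.3 + 0 * x,
                         x0=0.0, t=2.0, n_steps=500, n_paths=40000)
print(x_t.mean())   # about 1.0
```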
The local mean of a diffusion model is

Et(dxt) = µ(·)dt

and the local variance is

dxt² = Et(dxt²) = σ²(·)dt.

Variance is equal to second moment because the mean scales linearly with the time interval ∆,
so the mean squared scales with ∆², while the second moment scales with ∆.
Stochastic integrals. For many purposes, simply understanding the differential represen-
tation of a process is sufficient. However, we often want to understand the random variable
xt at longer horizons. For example, we might want to know the distribution of xt+s given
information at time t.
Conceptually, what we want to do is to think of a diffusion model as a stochastic differ-
ential equation and solve it forward through time to obtain the finite-time random variable
xt+s. Putting some arguments in for µ and σ for concreteness, we can think of evaluating the
integral

xt − x0 = ∫₀ᵗ dxs = ∫₀ᵗ µ(xs, s, ...)ds + ∫₀ᵗ σ(xs, s, ...)dzs.

We have already seen how zt = z0 + ∫₀ᵗ dzs generates the random variable zt ∼ N(0, t),
so you can see how expressions like this one generate random variables xt. The objective of
solving a stochastic differential equation is thus to find the distribution of x at some future
date, or at least some characterizations of that distribution such as its conditional mean, variance,
etc. Some authors dislike the differential characterization and always write processes in terms
of stochastic integrals. I return to how one might solve an integral of this sort below.

23.3 Ito's lemma


Do second order Taylor expansions, keeping only dz, dt, and dz² = dt terms:

dy = f′(x)dx + (1/2)f″(x)dx²
dy = [f′(x)µx + (1/2)f″(x)σx²] dt + f′(x)σx dz.

You often have a diffusion representation for one variable, say

dxt = µx (·)dt + σ x (·)dzt .

Then you define a new variable in terms of the old one,

yt = f (xt ).

Naturally, you want a diffusion representation for yt. Ito's lemma tells you how to get it:

Use a second order Taylor expansion, and think of dz as √dt; thus as ∆ → 0, keep
terms in dz, dt, and dz² = dt, but let terms in dt dz, dt², and higher go to zero.

Applying these rules, start with the second order expansion

dy = [df(x)/dx] dx + (1/2)[d²f(x)/dx²] dx².
Expanding the second term,

dx² = [µx dt + σx dz]² = µx² dt² + σx² dz² + 2µx σx dt dz.

Now apply the rules dt² = 0, dz² = dt, and dt dz = 0. Thus,

dx² = σx² dt.

Substituting for dx and dx²,

dy = [df(x)/dx](µx dt + σx dz) + (1/2)[d²f(x)/dx²] σx² dt
   = {[df(x)/dx] µx + (1/2)[d²f(x)/dx²] σx²} dt + [df(x)/dx] σx dz.

Thus, Ito's lemma:

dy = {[df(x)/dx] µx(·) + (1/2)[d²f(x)/dx²] σx²(·)} dt + [df(x)/dx] σx(·) dz.


The surprise here is the second term in the drift. Intuitively, this term captures a "Jensen's
inequality" effect. If a is a mean-zero random variable and b = a² = f(a), then the mean of
b is higher than the mean of a. The greater the variance of a, and the more convex the function,
the higher the mean of b.
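A numerical sketch of the lemma and of this Jensen effect, for the simplest case f(x) = x² applied to a Brownian motion (so µx = 0, σx = 1). Ito's lemma gives dy = [0 + (1/2)(2)(1)] dt + 2x dz = dt + 2x dz, so E(yt) = t even though E(xt) = 0.

```python
# Sketch: check Ito's lemma for f(x) = x^2 on a Brownian motion.
# The lemma predicts dy = dt + 2x dz, so E(y_t) = t while E(x_t) = 0:
# the "Jensen's inequality" drift term at work.
import numpy as np

rng = np.random.default_rng(6)
t, n_steps, n_paths = 1.5, 1000, 50000
dt = t / n_steps

dz = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
x = dz.sum(axis=1)          # x_t = z_t, a Brownian motion from x_0 = 0
y = x**2                    # y_t = f(x_t)

print(x.mean(), y.mean())   # about 0 and about t = 1.5
```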

23.4 Problems

1. Find the diffusion followed by the log price,
y = ln(p).
2. Find the diffusion followed by xy.
3. Suppose y = f(x, t). Find the diffusion representation for y. (Follow the obvious
multivariate extension of Ito's lemma.)
4. Suppose y = f(x, w), with both x and w diffusions. Find the diffusion representation for y.
Denote the correlation between dzx and dzw by ρ.


