Assuming γ ≠ 1, the inverse function is g(y) = cy^(1/(1−γ)) for constant c, and the process Yt = (1−γ)^(−1) St^(1−γ) satisfies an Ito equation with constant diffusion coefficient:

dYt = {(r/σ)St^(1−γ) − (γσ/2)St^(γ−1)}dt + dWt
    = {(r(1−γ)/σ)Yt − γσ/(2(1−γ)Yt)}dt + dWt.    (3.43)

After simulating the process Yt we invert the relation to obtain St = ((1 − γ)Yt)^(1/(1−γ)). There is one fine point related to simulating the process (3.43) that we implemented in the code below. The equation (3.40) is a model for a non-negative asset price St, but when we simulate the values Yt from (3.43) there is nothing to prevent the process from going negative. Generally if γ ≥ 1/2 and if we increment time in sufficiently small steps ∆t, then it is unlikely that a negative value of Yt will obtain, but when it does, we assume absorption at 0 (analogous to default or bankruptcy). The following Matlab function was used to simulate sample paths from the CEV process over the interval [0, T].

function s=simcev(n,r,sigma,So,T,gam)
% simulates n sample paths of a CEV process on the interval [0,T], all with
% the same starting value So; assumes gam ~= 1. Paths are absorbed at 0.
Yt=ones(n,1)*(So^(1-gam))/(1-gam); y=Yt;
dt=T/1000; c1=r*(1-gam)/sigma; c2=gam*sigma/(2*(1-gam));
dw=normrnd(0,sqrt(dt),n,1000);
for i=1:1000
   v=find(Yt>0);   % selects paths not yet absorbed at 0 for update
   Yt(v)=max(0,Yt(v)+(c1.*Yt(v)-c2./Yt(v))*dt+dw(v,i)); % Euler step on live paths
   y=[y Yt];       % append current values; absorbed paths stay at 0
end
s=((1-gam)*max(y,0)).^(1/(1-gam)); % transforms back to St

For example, when r = .05, σ = .2, ∆t = .00025, T = .25, γ = 0.8 we can generate 1000 sample paths with the command

s=simcev(1000,.05,.2,10,.25,.8);

In order to estimate the price of a barrier option with a down-and-out barrier at b and exercise price K, capture the last column of s,

ST=s(:,1001);

then value a European call option based on these sample paths,

v=exp(-r*T)*max(ST-K,0);


finally setting the values equal to zero for those paths which breached the lower barrier (here b = 9) and then averaging the return from these 1000 replications:

v(min(s')<=9)=0;
mean(v);

which results in an estimated value for the call option of around $0.86. Although the standard error is still quite large (0.06), we can compare this with the Black-Scholes price with similar parameters, [CALL,PUT] = BLSPRICE(10,10,.05,.25,.2,0), which gives a call option price of $0.4615. Why such a considerable difference? Clearly the down-and-out barrier can only reduce the value of a call option. Indeed, if we remove the down-and-out feature, the European option is valued closer to $1.28, so the increase must be due to the differences between the CEV process and the geometric Brownian motion. We can confirm this by simulating the value of a barrier option in the Black-Scholes model later on.
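For readers working outside Matlab, the same path simulation and barrier estimate can be sketched in Python with NumPy. This is a rough translation of simcev, not code from the text; the steps and rng arguments and the seed are our own conveniences, and the barrier b = 9 matches the command above.

```python
import numpy as np

def simcev(n, r, sigma, S0, T, gam, steps=1000, rng=None):
    """Simulate n CEV paths on [0, T] via the transformed process Y_t,
    absorbing at 0; assumes gam != 1."""
    rng = np.random.default_rng(rng)
    dt = T / steps
    c1 = r * (1 - gam) / sigma
    c2 = gam * sigma / (2 * (1 - gam))
    Y = np.full(n, S0 ** (1 - gam) / (1 - gam))
    paths = [Y.copy()]
    dW = rng.normal(0.0, np.sqrt(dt), size=(steps, n))
    for i in range(steps):
        alive = Y > 0                      # update only paths not yet absorbed
        Y[alive] = np.maximum(0.0, Y[alive] + (c1 * Y[alive] - c2 / Y[alive]) * dt
                              + dW[i, alive])
        paths.append(Y.copy())
    y = np.array(paths)                    # shape (steps+1, n)
    return ((1 - gam) * np.maximum(y, 0)) ** (1 / (1 - gam))  # back to S_t

# down-and-out call with barrier b = 9 and strike K = 10
r, sigma, T, K, b = 0.05, 0.2, 0.25, 10.0, 9.0
s = simcev(1000, r, sigma, 10.0, T, 0.8, rng=1)
v = np.exp(-r * T) * np.maximum(s[-1] - K, 0.0)
v[s.min(axis=0) <= b] = 0.0
print(v.mean())
```

With these parameters the estimate should land near the $0.86 value quoted above, with a comparable standard error.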

Problems

1. Consider the mixed generator xn = (axn−1 + 1) mod m with m = 64. What values of a result in the maximum possible period? Can you indicate which generators appear more and less random?

2. Consider a shuffled generator described in Section 3.2 with k = 3, m1 = 7, m2 = 11. Determine the period of the shuffled random number generator and compare with the periods of the two constituent generators.

3. Consider the quadratic residue generator xn+1 = xn² mod m with m = 4783 × 4027. Write a program to generate pseudo-random numbers from this generator. Use this to determine the period of the generator starting with seed x0 = 196, and with seed x0 = 400.


4. Consider a sequence of independent U[0,1] random variables U1, ..., Un. Define indicator random variables

Si = 1 if Ui−1 < Ui and Ui > Ui+1 for i = 2, 3, ..., n − 1, otherwise Si = 0,
Ti = 1 if Ui−1 > Ui and Ui < Ui+1 for i = 2, 3, ..., n − 1, otherwise Ti = 0.

Verify the following:

(a) R = 1 + Σi (Si + Ti)

(b) E(Ti) = E(Si) = 1/3 and E(R) = (2n − 1)/3

(c) cov(Ti, Tj) = cov(Si, Sj) = −1/9 if |i − j| = 1, and it equals 0 if |i − j| > 1.

(d) cov(Si, Tj) =
    5/24 − 1/9 = 7/72   if |i − j| = 1,
    −1/9                if i = j,
    0                   if |i − j| > 1.

(e) var(R) = 2(n−2)(1/3)(2/3) + 4(n−3)(−1/9) + 4(n−3)(7/72) + 2(n−2)(−1/9) = (3n−5)/18.

(f) Confirm these formulae for the mean and variance of R in the cases n = 3, 4.
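The moments in parts (b) and (e) are easy to check numerically. The following Python sketch (our own, with an arbitrary seed and n = 10) estimates E(R) and var(R) from repeated samples and compares them with (2n − 1)/3 and (3n − 5)/18.

```python
import numpy as np

def runs_stat(u):
    # R = 1 + number of local maxima (S_i) and local minima (T_i) of the sequence
    s = (u[:-2] < u[1:-1]) & (u[1:-1] > u[2:])
    t = (u[:-2] > u[1:-1]) & (u[1:-1] < u[2:])
    return 1 + s.sum() + t.sum()

rng = np.random.default_rng(0)
n, reps = 10, 20000
R = np.array([runs_stat(rng.random(n)) for _ in range(reps)])
print(R.mean(), (2 * n - 1) / 3)   # sample mean vs E(R)
print(R.var(), (3 * n - 5) / 18)   # sample variance vs var(R)
```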

5. Generate 1000 daily "returns" Xi, i = 1, 2, ..., 1000 from each of the two distributions, the Cauchy and the logistic. Choose the parameters so that the median is zero and P[|Xi| < .06] = .95. Graph the total return over an n-day period versus n. Is there a qualitative difference in the two graphs? Repeat with a graph of the daily return averaged over days 1, 2, ..., n.

6. Consider the linear congruential generator

xn+1 = (axn + c) mod 2^8.

What is the maximal period that this generator can achieve when c = 1, and for what values of a does this seem to be achieved? Repeat when c = 0.

7. Let U be a uniform random variable on the interval [0,1]. Find a function

of U which is uniformly distributed on the interval [0,2]. Repeat for the

interval [a, b].

8. Evaluate the following integral by simulation:

∫₀² x^(3/4) (4 − x)^(1/3) dx.

9. Evaluate the following integral by simulation:

∫_{−∞}^{∞} e^(−x⁴) dx.

(Hint: Rewrite this integral in the form 2∫₀^∞ e^(−x⁴) dx and then change variables to y = x/(1 + x).)

10. Evaluate the following integral by simulation:

∫₀¹ ∫₀¹ e^((x+y)⁴) dx dy.

(Hint: Note that if U1, U2 are independent Uniform[0,1] random variables, E[g(U1, U2)] = ∫₀¹∫₀¹ g(x, y) dx dy for any function g.)

11. Find the covariance cov(e^U, e^(−U)) by simulation, where U is Uniform[0,1], and compare the simulated value to the true value. Compare the actual error with the standard error of your estimator.

12. For independent uniform random numbers U1, U2, ... define the random variable N = min{n : Σ_{i=1}^n Ui > 1}. Estimate E(N) by simulation. Repeat for larger and larger numbers of simulations. Guess on the basis of these simulations what is the value of E(N). Can you prove your hypothesis concerning the value of E(N)?
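A minimal simulation for this problem (a Python sketch of our own, with a fixed seed) is:

```python
import random

def sample_N(rng):
    # N = smallest n with U_1 + ... + U_n > 1
    total, n = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        n += 1
    return n

rng = random.Random(0)
reps = 200000
est = sum(sample_N(rng) for _ in range(reps)) / reps
print(est)
```

Increasing reps stabilizes the estimate and should suggest the limiting value.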


13. Give an algorithm for generating observations from a distribution which has cumulative distribution function F(x) = (x + x³ + x⁵)/3, 0 < x < 1. Record the time necessary to generate the sample mean of 100,000 random variables with this distribution. (Hint: Suppose we generate X1 with cumulative distribution function F1(x), X2 with cumulative distribution function F2(x), and X3 with cumulative distribution function F3(x). We then generate J = 1, 2, or 3 such that P[J = j] = pj and output the value XJ. What is the cumulative distribution function of the random variable output?)
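The hint describes the composition method: F here is an equal mixture of the cdfs x, x³ and x⁵, each of which inverts in closed form. A Python sketch of that algorithm (our own function names, arbitrary seed):

```python
import random

def sample_F(rng):
    # F(x) = (x + x^3 + x^5)/3: pick one of the component cdfs x, x^3, x^5
    # with probability 1/3 each, then apply that component's inverse transform.
    j = rng.randrange(3)                # J = 0, 1, 2 with P[J = j] = 1/3
    u = rng.random()
    return u ** (1.0 / (2 * j + 1))     # inverse of x^(2j+1) on (0,1)

rng = random.Random(1)
xs = [sample_F(rng) for _ in range(100000)]
print(sum(xs) / len(xs))                # should be near E(X) = 25/36
```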

14. Consider independent random variables Xi, i = 1, 2, 3 with cumulative distribution functions

Fi(x) =
    x³,                  i = 1
    (e^x − 1)/(e − 1),   i = 2
    xe^(x−1),            i = 3

for 0 < x < 1. Explain how to obtain random variables with cumulative distribution functions G(x) = Π_{i=1}³ Fi(x) and G(x) = 1 − Π_{i=1}³ (1 − Fi(x)). (Hint: consider the cumulative distribution functions of the minimum and maximum.)

15. Suppose we wish to generate a random variable X having cumulative distribution function F(x) using the inverse transform theorem, but the exact cumulative distribution function is not available. We do, however, have an unbiased estimator F̂(x) of F(x), so that 0 ≤ F̂(x) ≤ 1 and E F̂(x) = F(x) for all x. Show that, provided the uniform variate U is independent of F̂(·), the random variable X = F̂^(−1)(U) has cumulative distribution function F(x).


16. Develop an algorithm for generating variates from the density:

f(x) = (2/√π) e^(2a − x² − a²/x²), x > 0.

17. Develop an algorithm for generating variates from the density:

f(x) = 2/(e^(πx) + e^(−πx)), for −∞ < x < ∞.

18. Obtain generators for the following distributions:

(a) Rayleigh:

f(x) = (x/σ²) e^(−x²/2σ²), x ≥ 0    (3.44)

(b) Triangular:

f(x) = (2/a)(1 − x/a), 0 ≤ x ≤ a    (3.45)

19. Show that if (X, Y) are independent standard normal variates, then √(X² + Y²) has the distribution of the square root of a chi-squared(2) (i.e. exponential(2)) variable and arctan(Y/X) is uniform on [0, 2π].

20. Generate the pair of random variables (X, Y),

(X, Y) = R(cos Θ, sin Θ),    (3.46)

where we use a random number generator with poor lattice properties, such as the generator xn+1 = (383xn + 263) mod 10000, to generate our uniform random numbers. Use this generator together with the Box-Muller algorithm to generate 5,000 pairs of independent random normal numbers. Plot the results. Do they appear independent?
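A sketch of this experiment in Python (our own translation; the half-integer shift into (0,1) is ours, to keep the logarithm finite):

```python
import math

def lcg_uniforms(seed, count, a=383, c=263, m=10000):
    # the poor lattice generator x_{n+1} = (383 x_n + 263) mod 10000
    x, out = seed, []
    for _ in range(count):
        x = (a * x + c) % m
        out.append((x + 0.5) / m)   # shifted into (0,1) so log() below is safe
    return out

u = lcg_uniforms(seed=1, count=10000)
pairs = []
for u1, u2 in zip(u[0::2], u[1::2]):
    # Box-Muller transform: (X, Y) = R(cos(theta), sin(theta))
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    pairs.append((r * math.cos(theta), r * math.sin(theta)))
# a scatter plot of pairs should expose the lattice structure
```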

21. (Log-normal generator) Describe an algorithm for generating log-normal random variables with probability density function given by

g(x|η, σ) = (1/(xσ√(2π))) exp{−(log x − log η + σ²/2)²/(2σ²)}.    (3.47)


22. (Multivariate normal generator) Suppose we want to generate a multivariate normal random vector (X1, X2, ..., XN) having mean vector (µ1, ..., µN) and covariance matrix the N × N matrix Σ. The usual procedure involves a decomposition of Σ into factors such that A′A = Σ. For example, A could be determined from the Cholesky decomposition, in Matlab A=chol(sigma), or in R A=chol(sigma), which provides such a matrix A (also upper triangular) in the case that Σ is positive definite. Show that if Z = (Z1, ..., ZN) is a vector of independent standard normal random variables, then the vector X = (µ1, ..., µN) + ZA has the desired distribution.
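In Python, numpy.linalg.cholesky returns a lower-triangular L with LL′ = Σ, so A = L′ gives the upper-triangular factor with A′A = Σ used above; the mean and covariance below are illustrative values, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.linalg.cholesky(Sigma).T          # upper triangular, A'A = Sigma
Z = rng.standard_normal((100000, 2))     # rows of independent N(0,1) variates
X = mu + Z @ A                           # each row ~ N(mu, Sigma)
print(X.mean(axis=0))                    # close to mu
print(np.cov(X.T))                       # close to Sigma
```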

23. (Euler vs. Milstein Approximation) Use the Milstein approximation with step size .001 to simulate a geometric Brownian motion of the form

dSt = .07St dt + .2St dWt.

Compare both the Euler and the Milstein approximations using different step sizes, say ∆t = 0.01, 0.02, 0.05, 0.1, and use each approximation to price an at-the-money call option assuming S0 = 50 and expiry at T = 0.5. How do the two methods compare, both for accurately pricing the call option and for the amount of computing time required?
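For reference, one step of each scheme for this geometric Brownian motion can be sketched as follows (a Python helper of our own; the seed and step count are arbitrary). The Milstein scheme adds the correction term (σ²/2)S(∆W² − ∆t) to the Euler step.

```python
import numpy as np

def euler_milstein(S0, mu, sigma, T, steps, rng):
    # advance one GBM path with Euler and Milstein on the same Brownian increments
    dt = T / steps
    dW = rng.normal(0.0, np.sqrt(dt), steps)
    se = sm = S0
    for dw in dW:
        se = se + mu * se * dt + sigma * se * dw
        sm = (sm + mu * sm * dt + sigma * sm * dw
              + 0.5 * sigma**2 * sm * (dw * dw - dt))   # Milstein correction
    exact = S0 * np.exp((mu - sigma**2 / 2) * T + sigma * dW.sum())
    return se, sm, exact

rng = np.random.default_rng(3)
se, sm, exact = euler_milstein(50.0, 0.07, 0.2, 0.5, 500, rng)
print(se, sm, exact)
```

Comparing both approximations against the exact terminal value on the same increments shows the Milstein correction at work.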

24. Suppose interest rates follow the constant elasticity of variance process of the form

drt = k(b − rt)dt + σ|rt|^γ dWt

for parameter values γ, b, k > 0. For various values of the parameters k, γ and for b = 0.04, use both Euler and Milstein to generate paths from this process. Draw conclusions about the following:

(a) When does the marginal distribution of rt appear to approach a steady-state solution? Plot the histogram of this steady-state distribution.


(b) Are there simulations that result in a negative value of r? How do

you rectify this problem?

(c) What does the parameter σ represent? Is it the annual volatility of

the process?

25. Consider a sequence of independent random numbers X1, X2, ... with a continuous distribution, and let M be the first one that is less than its predecessor:

M = min{n : X1 ≤ X2 ≤ ... ≤ Xn−1 > Xn}.

(a) Use the identity E(M) = Σ_{n=0}^∞ P[M > n] to show E(M) = e.

(b) Use 100,000 simulation runs and part (a) to estimate e with a 95% confidence interval.

(c) How many simulations are required if you wish to estimate e to within 0.005 (using a 95% confidence interval)?

Chapter 4

Variance Reduction Techniques

Introduction

In this chapter we discuss techniques for improving on the speed and efficiency of a simulation, usually called "variance reduction techniques".

Much of the simulation literature concerns discrete event simulations (DES), simulations of systems that are assumed to change instantaneously in response to sudden or discrete events. These are the most common in operations research, and examples are simulations of processes such as networks or queues. Simulation models in which the process is characterized by a state, with changes only at discrete time points, are DES. In modeling an inventory system, for example, the arrival of a batch of raw materials can be considered as an event which precipitates a sudden change in the state of the system, followed by a demand some discrete time later when the state of the system changes again. A system driven by differential equations in continuous time is not an example of a DES, because the changes occur continuously in time. One approach to DES is future event simulation, which schedules one or more future events at a time, choosing the event in the future event set which has minimum time, updating the state of the system and the clock accordingly, and then repeating this whole procedure. A stock price which moves by discrete amounts may be considered a DES. In fact this approach is often used in valuing American options by Monte Carlo methods with binomial or trinomial trees.

Often we identify one or more performance measures by which the system is to be judged, and parameters which may be adjusted to improve the system performance. Examples are the delay for an air traffic control system, customer waiting times for a bank teller scheduling system, delays or throughput for computer networks, response times for the location of fire stations or supply depots, etc. Performance measures are again important in engineering examples or in operations research, but less common in finance. They may be used to calibrate a simulation model, however. For example, our performance measure might be the average distance between observed option prices on a given stock and prices obtained by simulation from given model parameters. In all cases, the performance measure is usually the expected value of a complicated function of many variables, often expressible only by a computer program with some simulated random variables as input. Whether these input random variables are generated by inverse transform, or acceptance-rejection, or some other method, they are ultimately a function of uniform[0,1] random variables U1, U2, .... These uniform random variables determine such quantities as the normally distributed increments of the logarithm of the stock price. In summary, the simulation is used simply to estimate a multidimensional integral of the form

E(g(U1, ..., Ud)) = ∫∫...∫ g(u1, u2, ..., ud) du1 du2 ... dud    (4.1)

over the unit cube in d dimensions, where often d is large.

As an example in finance, suppose that we wish to price a European option on a stock price under the following stochastic volatility model.


Example 33 Suppose the daily asset returns under a risk-neutral distribution are assumed to be a variance mixture of the Normal distribution, by which we mean that the variance itself is random, independent of the normal variable, and follows a distribution with moment generating function m(s). More specifically, assume under the Q measure that the stock price at time (n+1)∆t is determined from

S(n+1)∆t = Sn∆t exp{r∆t + σn+1 Zn+1} / m(1/2),

where, under the risk-neutral distribution, the positive random variables σi² are assumed to have a distribution with moment generating function m(s) = E{exp(sσi²)}, Zi is standard normal independent of σi², and both (Zi, σi²) are independent of the process up to time n∆t. We wish to determine the price of a European call option with maturity T and strike price K.

It should be noted that the rather strange choice of m(1/2) in the denominator above is such that the discounted process is a martingale, since

E[exp{σn+1 Zn+1} / m(1/2)] = E{E[exp{σn+1 Zn+1} / m(1/2) | σn+1]}
                           = E{exp{σn+1²/2} / m(1/2)}
                           = 1.
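This normalization is easy to check numerically. The following Python sketch assumes, purely for illustration, σi² ∼ Gamma(shape a, scale b), for which m(s) = (1 − bs)^(−a); the mean of exp{σZ}/m(1/2) should then be close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 0.1                          # illustrative Gamma(shape, scale) values
m_half = (1 - b * 0.5) ** (-a)           # m(1/2) for this choice of mixing law
sig2 = rng.gamma(a, b, size=2_000_000)   # the random variances sigma_i^2
Z = rng.standard_normal(2_000_000)
ratio = np.exp(np.sqrt(sig2) * Z) / m_half
print(ratio.mean())                      # should be close to 1
```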

There are many ways of simulating an option price in the above example, some much more efficient than others. We might, for example, simulate all of the 2n random variables {σi², Zi, i = 1, ..., n = T/∆t}, use these to determine the simulated value of ST, and finally record the discounted payoff from the option in this simulation, i.e. x = e^(−rT)(ST − K)+. The price of this option at time 0 is the average of many such simulations (say we do this a total of N times) discounted to present,

x̄ = the average of the x's observed over all simulations. This is


a description of a crude and inefficient method of conducting this simulation. Roughly, the time required for the simulation is proportional to 2Nn, the total number of random variables generated. This chapter discusses some of the many improvements possible in problems like this. Since each simulation requires at least d = 2n independent uniform random variables to generate the values {σi², Zi, i = 1, ..., n}, we are trying to estimate a rather complicated integral of the form (4.1) of high dimension d. In this case, however, we can immediately see some obvious improvements. Notice that we can rewrite ST in the form

ST = S0 exp{rT + σZ} / m^n(1/2)    (4.2)

where the random variable σ² = Σ_{i=1}^n σi² has moment generating function m^n(s) and Z is independent standard normal. Obviously, if we can simulate σ directly, we can avoid the computation involved in generating the individual σi². Further

savings are possible in the light of the Black-Scholes formula, which provides the price of a call option when a stock price is given by (4.2) and the volatility parameter σ is non-random. Since the expected return from the call under the risk-neutral distribution can be written, using the Black-Scholes formula,

E(e^(−rT)(ST − K)+) = E{E[e^(−rT)(ST − K)+ | σ]}
 = E{S0 Φ((log(S0/K) + (r + σ²/2)T)/(σ√T)) − Ke^(−rT) Φ((log(S0/K) + (r − σ²/2)T)/(σ√T))},

this is now a one-dimensional integral over the distribution of σ. It can be evaluated either by a one-dimensional numerical integration or by repeatedly simulating the value of σ and averaging the values of

S0 Φ((log(S0/K) + (r + σ²/2)T)/(σ√T)) − Ke^(−rT) Φ((log(S0/K) + (r − σ²/2)T)/(σ√T))

obtained from these simulations. As a special case we might take the distribution of σi² to be Gamma(α∆t, β) with moment generating function

m(s) = 1/(1 − βs)^(α∆t),


in which case the distribution of σ² is Gamma(αT, β). This is the so-called "variance-gamma" distribution, investigated extensively by ... and originally suggested as a model for stock prices by .... Alternatively, many other wider-tailed alternatives to the normal returns model can be written as a variance mixture of the normal distribution, and option prices can be simulated in this way. For example, when the variance is generated having the distribution of the reciprocal of a gamma random variable, the returns have a Student's t distribution. Similarly, the stable distributions and the Laplace distribution all have a representation as a variance mixture of the normal.
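The conditional form above suggests the following Python sketch: draw only the total variance σ² ∼ Gamma(αT, β) and average the Black-Scholes price evaluated at the equivalent annual volatility √(σ²/T). The values of α and β are illustrative (chosen so E[σ²] = (0.2)²T), not taken from the text.

```python
import numpy as np
from math import log, sqrt, exp
from statistics import NormalDist

Phi = NormalDist().cdf

def bs_call(S0, K, r, sigma, T):
    # standard Black-Scholes call price at (non-random) annual volatility sigma
    d1 = (log(S0 / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * Phi(d1) - K * exp(-r * T) * Phi(d2)

rng = np.random.default_rng(2)
S0, K, r, T = 10.0, 10.0, 0.05, 0.25
alpha, beta = 4.0, 0.01                  # illustrative: E[sigma^2] = alpha*T*beta = 0.01
v = rng.gamma(alpha * T, beta, size=100000)
price = np.mean([bs_call(S0, K, r, sqrt(max(vi, 1e-12) / T), T) for vi in v])
print(price)
```

The average will generally differ somewhat from the constant-volatility price 0.4615, reflecting the mixing over σ.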

The rest of this chapter discusses "variance reduction techniques" such as the one employed above for evaluating integrals like (4.1), beginning with the much simpler case of an integral in one dimension.

Variance reduction for one-dimensional Monte-Carlo Integration

We wish to evaluate a one-dimensional integral ∫₀¹ f(u)du, which we will denote by θ, using Monte-Carlo methods. We have seen before that whatever random variables are input to our simulation program, they are usually generated using uniform[0,1] random variables U, so without loss of generality we can assume that the integral is with respect to the uniform[0,1] probability density function, i.e. we wish to estimate

θ = E{f(U)} = ∫₀¹ f(u)du.

One simple approach, called crude Monte Carlo, is to randomly sample Ui ∼ Uniform[0,1] and then average the values of f(Ui) to obtain

θ̂CR = (1/n) Σ_{i=1}^n f(Ui).


It is easy to see that E(θ̂CR) = θ, so that this average is an unbiased estimator of the integral, and the variance of the estimator is

var(θ̂CR) = var(f(U1))/n.

Example 34 A crude simulation of a call option price under the Black-Scholes model:

For a simple example that we will use throughout, consider an integral used to price a call option. We saw in Section 3.8 that if a European option has payoff V(ST), where ST is the value of the stock at maturity T, then the option can be valued at present (t = 0) using the discounted future payoff from the option under the risk-neutral measure:

e^(−rT) E[V(ST)] = e^(−rT) E[V(S0 e^X)],

where, in the Black-Scholes model, the random variable X = ln(ST/S0) has a normal distribution with mean rT − σ²T/2 and variance σ²T. A normally distributed random variable X can be generated by inverse transform, and so we can assume that X = Φ^(−1)(U; rT − σ²T/2, σ²T) is a function of a uniform[0,1] random variable U, where Φ^(−1)(·; rT − σ²T/2, σ²T) is the inverse of the Normal(rT − σ²T/2, σ²T) cumulative distribution function. Then the value of the option can be written as an expectation over the distribution of the uniform random variable U,

E{f(U)} = ∫₀¹ f(u)du, where f(u) = e^(−rT) V(S0 exp{Φ^(−1)(u; rT − σ²T/2, σ²T)}).

This function is graphed in Figure 4.1 in the case of a simple call option with strike price K, with payoff at maturity V(ST) = (ST − K)+, the current stock price S0 = $10, the exercise price K = $10, the annual interest rate r = 5%, the maturity three months or one quarter of a year, T = 0.25, and the annual volatility σ = 0.20.


Figure 4.1: The function f (u) whose integral provides the value of a call option

A simple crude Monte Carlo estimator corresponds to evaluating this function at a large number of randomly selected values Ui ∼ U[0,1] and then averaging the results. For example, the following function in Matlab accepts a vector of inputs u = (U1, ..., Un) assumed to be Uniform[0,1] and outputs the values f(U1), ..., f(Un), which can be averaged to give θ̂CR = (1/n) Σ_{i=1}^n f(Ui).

function v=fn(u)
% value of the integrand for a call option with exercise price K, r=annual
% interest rate, sigma=annual vol, S0=current stock price, u=vector of
% uniform (0,1) inputs used to generate normal variates by inverse
% transform, T=maturity
S0=10; K=10; r=.05; sigma=.2; T=.25; % values of parameters
ST=S0*exp(norminv(u,r*T-sigma^2*T/2,sigma*sqrt(T))); % stock price at time T
v=exp(-r*T)*max(ST-K,0); % discounted-to-present payoffs from the call option

and the analogous function in R,

fn<-function(u,So,strike,r,sigma,T){
# value of the integrand for a call option with exercise price=strike,
# r=annual interest rate, sigma=annual volatility, So=current stock price,
# u=uniform (0,1) input to generate normal variates by inverse transform,
# T=time to maturity. For the Black-Scholes price, integrate over (0,1).
x<-So*exp(qnorm(u,mean=r*T-sigma^2*T/2,sd=sigma*sqrt(T)))
v<-exp(-r*T)*pmax((x-strike),0)
v}

In the case of initial stock price $10, exercise price $10, annual volatility 0.20, r = 5%, T = .25 (three months), this is run as

u=rand(1,500000); mean(fn(u))

and in R,

mean(fn(runif(500000),So=10,strike=10,r=.05,sigma=.2,T=.25))

and this provides an approximate value of the option of θ̂CR = 0.4620. The variance of this estimator, computed using the formula (4.3) below, is around 8.7 × 10^(−7), a standard error of about 0.0009. We may confirm with the Black-Scholes formula, again in Matlab,

[CALL,PUT] = BLSPRICE(10,10,0.05,0.25,0.2,0).

The arguments are, in order, (S0, K, r, T, σ, q), where the last argument (here q = 0) is the annual dividend yield, which we assume here to be zero. Provided that no dividends are paid on the stock before the maturity of the option, this is reasonable. This Matlab command provides the result CALL = 0.4615 and PUT = 0.3373, indicating that our simulated call option price was reasonably accurate, off by about 0.1 percent. The put option is an option to sell the stock at the specified price $10 at the maturity date and is also priced by this same function.


One of the advantages of Monte Carlo methods over numerical techniques is that, because we are using a sample mean, we have a simple estimate of accuracy. In general, when n simulations are conducted, the accuracy is measured by the standard error of the sample mean. Since

var(θ̂CR) = var(f(U1))/n,

the standard error of the sample mean is the standard deviation, or

SE(θ̂CR) = σf/√n,    (4.3)

where σf² = var(f(U)). As usual we estimate σf² using the sample standard deviation. Since fn(u) provides a whole vector of values (f(U1), f(U2), ..., f(Un)), sqrt(var(fn(u))) is the sample estimator of σf, so the standard error SE(θ̂CR) is given by

Sf=sqrt(var(fn(u)));
Sf/sqrt(length(u))

giving an estimate 0.6603 of the standard deviation σf, or a standard error σf/√500000 of about 0.0009. Of course, parameters in statistical problems are usually estimated using an interval estimate or a confidence interval, an interval constructed using a method that guarantees capturing the true value of the parameter under similar circumstances with high probability (the confidence coefficient, often taken to be 95%). Formally,

Definition 35 A 95% confidence interval for a parameter θ is an interval [L, U] with random endpoints L, U such that the probability P[L ≤ θ ≤ U] = 0.95.

If we were to repeat the experiment 100 times, say by running 100 more similar independent simulations, and in each case use the results to construct a 95% confidence interval, then this definition implies that roughly 95% of the intervals constructed will contain the true value of the parameter (and of course


roughly 5% will not). For an approximately Normal(µX, σX²) random variable X, we can use the approximation

P[µX − 2σX ≤ X ≤ µX + 2σX] ≈ 0.95    (4.4)

(i.e. approximately normal variables are within 2 standard deviations of their mean with probability around 95%) to build a simple confidence interval. Strictly, the value 2σX should be replaced by 1.96σX, where 1.96 is taken from the Normal distribution tables. The value 2 is very close to correct for a t distribution with 60 degrees of freedom. In any case, these confidence intervals, which assume approximate normality, are typically too short (i.e. contain the true value of the parameter less frequently than stated) for most real data, and so a value marginally larger than 1.96 is warranted. Replacing σX above by the standard deviation of a sample mean, (4.4) results in the approximately 95% confidence interval

θ̂CR − 2σf/√n ≤ θ ≤ θ̂CR + 2σf/√n

for the true value θ. With confidence 95%, the true price of the option is within the interval 0.462 ± 2(0.0009). As it happens, in this case this interval does capture the true value 0.4615 of the option.
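The whole calculation (crude estimate, standard error, and the 2-standard-error interval) can be sketched compactly in Python as well, as a rough translation of the Matlab/R code above, with an arbitrary seed and 200,000 rather than 500,000 uniforms:

```python
import numpy as np
from math import exp, sqrt
from statistics import NormalDist

ninv = NormalDist().inv_cdf
S0, K, r, sigma, T = 10.0, 10.0, 0.05, 0.2, 0.25

def f(u):
    # discounted call payoff as a function of a uniform variate
    x = (r - sigma**2 / 2) * T + sigma * sqrt(T) * ninv(u)
    return exp(-r * T) * max(S0 * exp(x) - K, 0.0)

rng = np.random.default_rng(0)
vals = np.array([f(float(u)) for u in rng.random(200000)])
est = vals.mean()
se = vals.std(ddof=1) / sqrt(len(vals))
print(est - 2 * se, est + 2 * se)   # interval should cover the value 0.4615
```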

So far Monte Carlo has not told us anything we couldn't obtain from the Black-Scholes formula, but what if we used a distribution other than the normal to generate the returns? This is an easy modification of the above. For example, suppose we replace the standard normal by a logistic distribution which, as we have seen, has a density function very similar to the standard normal if we choose b = 0.625. Of course the Black-Scholes formula does not apply to a process with logistically distributed returns. We need only replace the standard normal inverse cumulative distribution function by the corresponding inverse for the logistic,

F^(−1)(U) = b ln(U/(1 − U)),

and thus replace the Matlab code norminv(u,T*(r-sigma^2/2),sigma*sqrt(T)) by T*(r-sigma^2/2)+sigma*sqrt(T)*.625*log(u./(1-u)). This results in a slight increase in option value (to 0.504) and about a 50% increase in the variance of the estimator.

We will look at the efficiency of various improvements to crude Monte Carlo, and to that end, we record the value of the variance of the estimator based on a single uniform variate in this case:

σ²crude = σf² = var(f(U)) ≈ 0.436.

Then the crude Monte Carlo estimator using n function evaluations or n uniform variates has variance approximately 0.436/n. If I were able to adjust the method so that the variance σf² based on a single evaluation of the function f in the numerator were halved, then I could achieve the same accuracy from a simulation using half the number of function evaluations. For this reason, when we compare two different methods for conducting a simulation, the ratio of variances corresponding to a fixed number of function evaluations can also be interpreted roughly as the ratio of computational effort required for a given predetermined accuracy. We will often compare various new methods of estimating the same quantity based on variance reduction schemes and quote the efficiency gain over crude Monte-Carlo sampling:

Efficiency = (variance of crude Monte Carlo estimator)/(variance of new estimator)    (4.5)

where both numerator and denominator correspond to estimators with the same number of function evaluations (since this is usually the more expensive part of the computation). An efficiency of 100 would indicate that the crude Monte Carlo estimator would require 100 times the number of function evaluations to achieve the same variance or standard error of estimator.

Consider a crude estimator obtained from five U[0,1] variates,

Ui = 0.1, 0.3, 0.5, 0.6, 0.8, i = 1, ..., 5.


Figure 4.2: Crude Monte Carlo Estimator based on 5 observations Ui = 0.1, 0.3, 0.5, 0.6, 0.8

The crude Monte Carlo estimator in the case n = 5 is displayed in Figure 4.2, the estimator being the sum of the areas of the marked rectangles. Only three of the five points actually contribute to this area, since for this particular function

f(u) = e^(−rT)(S0 exp{Φ^(−1)(u; rT − σ²T/2, σ²T)} − K)+    (4.6)

and the parameters chosen, f(0.1) = f(0.3) = 0. Since these two random numbers contributed 0 and the other three appear to be on average slightly too small, the sum of the areas of the rectangles appears to underestimate the integral. Of course another selection of five uniform random numbers may prove to be even more badly distributed and may result in an under- or an overestimate. There are various ways of improving the efficiency of this estimator, many of which partially emulate numerical integration techniques. First we should note that most numerical integrals, like θ̂CR, are weighted averages of the values of the function at certain points Ui. What if we evaluated the function at non-random points, chosen to attempt a reasonable balance between locations where the function is large and small? Numerical integration techniques and quadrature methods choose both the points at which we evaluate the function and


Figure 4.3: Graphical illustration of the trapezoidal rule (4.8)

weights that we attach to these points to provide accurate approximations for

polynomials of certain degree. For example, suppose we insist on evaluating the

function at equally spaced points, for example the points 0, 1/n, 2/n, ..., (n ’

1)/n, 1. In some sense these points are now “more uniform” than we are likely

to obtain from n+1 randomly and independently chosen points Ui , i = 1, 2, ..., n.

The trapezoidal rule corresponds to using such equally spaced points and equal

weights (except at the boundary) so that the “estimator” of the integral is

1 1

ˆ {f (0) + 2f (1/n) + . . . + 2f (1 ’ ) + f (1)} (4.7)

θT R =

2n n

or the simpler and very similar alternative in our case, with n = 5,

\hat{\theta}_{TR} = \frac{1}{5}\left\{f(0.1) + f(0.3) + f(0.5) + f(0.7) + f(0.9)\right\}    (4.8)

A reasonable balance between large and small values of the function is almost guaranteed by such a rule, as shown in Figure 4.3 with the observations equally spaced.

Simpson's rule uses equally spaced points with weights that (except for the endpoints) alternate 4/3n, 2/3n, 4/3n, .... In the case when n is even, the

216 CHAPTER 4. VARIANCE REDUCTION TECHNIQUES

integral is estimated with

\hat{\theta}_{SR} = \frac{1}{3n}\left\{f(0) + 4f(1/n) + 2f(2/n) + \cdots + 4f\left(\frac{n-1}{n}\right) + f(1)\right\}.    (4.9)

The trapezoidal rule is exact for linear functions and Simpson's rule is exact for quadratic functions.
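These weighted-average rules are easy to state in code. The following Python sketch (an illustration only; the text's own examples are in Matlab) implements the crude Monte Carlo estimator alongside the trapezoidal rule (4.7) and Simpson's rule (4.9) for an integral over [0, 1]:

```python
import numpy as np

def crude_mc(f, n, rng):
    # crude Monte Carlo: average f at n independent Uniform[0,1] points
    return np.mean(f(rng.random(n)))

def trapezoidal(f, n):
    # rule (4.7): equal spacing, interior weights 1/n, endpoint weights 1/(2n)
    x = np.linspace(0.0, 1.0, n + 1)
    w = np.full(n + 1, 1.0 / n)
    w[0] = w[-1] = 0.5 / n
    return np.sum(w * f(x))

def simpson(f, n):
    # rule (4.9), n even: endpoints get 1/(3n), interior weights
    # alternate 4/(3n), 2/(3n), 4/(3n), ...
    assert n % 2 == 0
    x = np.linspace(0.0, 1.0, n + 1)
    w = np.empty(n + 1)
    w[0] = w[-1] = 1.0 / (3 * n)
    w[1:-1:2] = 4.0 / (3 * n)   # odd interior indices
    w[2:-1:2] = 2.0 / (3 * n)   # even interior indices
    return np.sum(w * f(x))
```

For instance, trapezoidal(lambda x: 2*x + 1, 10) recovers the exact integral 2, and simpson(lambda x: 3*x**2, 10) recovers the exact integral 1, up to rounding, consistent with the exactness claims above.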

These one-dimensional numerical integration rules provide some insight into

how to achieve lower variance in Monte Carlo integration. They illustrate some options for increasing accuracy over simple random sampling. We may either

vary the weights attached to the individual points or vary the points (the Ui )

themselves or both. Notice that as long as the U_i individually have distributions that are Uniform[0, 1], we can introduce any degree of dependence among them in order to come closer to the equal spacings characteristic of numerical integrals. Even if the U_i are dependent U[0, 1], an estimator of the form

\frac{1}{n}\sum_{i=1}^{n} f(U_i)

will continue to be an unbiased estimator because each of the summands continues to satisfy E(f(U_i)) = θ. Ideally, if we introduce dependence among the various U_i and the expected value remains unchanged, we would wish that the variance

var\left(\frac{1}{n}\sum_{i=1}^{n} f(U_i)\right)

is reduced below its value for independent uniforms. The simplest case of this idea is the use of antithetic random variables.

Antithetic Random Numbers.

Consider first the simple case of n = 2 function evaluations at possibly dependent points. Then the estimator is

\hat{\theta} = \frac{1}{2}\{f(U_1) + f(U_2)\}


with expected value θ = \int_0^1 f(u)\,du and variance given by

var(\hat{\theta}) = \frac{1}{2}\{var(f(U_1)) + cov[f(U_1), f(U_2)]\}

assuming both U_1, U_2 are Uniform[0, 1]. In the independent case the covariance term disappears and we obtain the variance of the crude Monte Carlo estimator

\frac{1}{2} var(f(U_1)).

Notice, however, that if we are able to introduce a negative covariance, the resulting variance of \hat{\theta} will be smaller than that of the corresponding crude Monte Carlo estimator, so the question is how to generate this negative covariance.

Suppose for example that f is monotone (increasing or decreasing). Then f(1 − U_1) decreases whenever f(U_1) increases, so that substituting U_2 = 1 − U_1 has the desired effect and produces a negative covariance (in fact we will show later that we cannot do any better when the function f is monotone). Such a choice of U_2 = 1 − U_1, which helps reduce the variability in f(U_1), is termed an antithetic variate. In our example, because the function to be integrated is monotone, there is a negative correlation between f(U_1) and f(1 − U_1) and

\frac{1}{2}\{var(f(U_1)) + cov[f(U_1), f(U_2)]\} < \frac{1}{2} var(f(U_1)),

that is, the variance is decreased over simple random sampling. Of course in

practice our sample size is much greater than n = 2, but we still enjoy the benefits of this argument if we generate the points in antithetic pairs. For example, to determine the extent of the variance reduction using antithetic random numbers, suppose we generate 500,000 uniform variates U and use as well the values of 1 − U (for a total of 1,000,000 function evaluations as before).

F=(fn(u)+fn(1-u))/2;
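The same pairing is easy to reproduce outside Matlab. Here is a Python sketch (illustrative only; fn, the option integrand, is defined elsewhere in the chapter, so a generic monotone function stands in for it):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.exp                          # monotone stand-in for the integrand fn

u = rng.random(500_000)
F = (f(u) + f(1 - u)) / 2           # each antithetic pair costs two evaluations

crude = f(rng.random(1_000_000))    # crude MC with the same evaluation budget
var_anti = F.var(ddof=1) / len(F)            # variance of the antithetic estimator
var_crude = crude.var(ddof=1) / len(crude)   # variance of the crude estimator
```

Because f is monotone, cov(f(U), f(1 − U)) is negative and var_anti comes out below var_crude, mirroring the efficiency gain discussed below.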


This results in mean(F)=0.46186 and var(F)=0.1121. The variance of the estimator is

var(F)/length(F) = 0.1121/500000 = 2.24 × 10^{−7}.

Since each of the 500,000 components of F obtains from two function evaluations, the variance should be compared with a crude Monte Carlo estimator with the same number, 1,000,000, of function evaluations: σ²_{crude}/1000000 = 4.35 × 10^{−7}. The efficiency gain due to the use of antithetic random numbers is 4.35/2.24, or about two, so roughly half as many function evaluations using antithetic random numbers provide the same precision as a crude Monte Carlo estimator. There is the additional advantage that only half as many uniform random variables are required. The introduction of antithetic variates has had the same effect on precision as increasing the sample size under crude Monte Carlo by a factor of approximately 2.

We have noted that antithetic random numbers improve the efficiency whenever the function being integrated is monotone in u. What if it is not? For example, suppose we use antithetic random numbers to integrate the function f(u) = u(1 − u) on the interval 0 < u < 1. Rather than balance large values with small values and so reduce the variance of the estimator, in this case notice that f(U) and f(1 − U) are strongly positively correlated, in fact are equal, and so the argument supporting the use of antithetic random numbers for monotone functions will show that in this case they increase the variance over a crude estimator with the same number of function evaluations. Of course this problem

can be remedied if we can identify intervals in which the function is monotone, e.g. in this case use antithetic random numbers in the two intervals [0, 1/2] and [1/2, 1], so for example we might estimate \int_0^1 f(u)\,du by an average of terms like

\frac{1}{4}\left\{f\left(\frac{U_1}{2}\right) + f\left(\frac{1 - U_1}{2}\right) + f\left(\frac{1 + U_2}{2}\right) + f\left(\frac{2 - U_2}{2}\right)\right\}

for independent U[0, 1] random variables U_1, U_2.
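Both claims are easy to check numerically. This Python sketch (illustrative, not from the text) compares naive antithetic pairs over [0, 1] with within-interval pairs for f(u) = u(1 − u):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda u: u * (1 - u)    # symmetric about 1/2, so f(U) == f(1 - U)
n = 200_000
u1, u2 = rng.random(n), rng.random(n)

# naive antithetic pairs over [0,1]: no cancellation at all, since f(u) == f(1-u)
naive = (f(u1) + f(1 - u1)) / 2
# antithetic pairs within each half-interval, as in the displayed estimator
piecewise = (f(u1 / 2) + f((1 - u1) / 2) + f((1 + u2) / 2) + f((2 - u2) / 2)) / 4
```

Both arrays have mean near ∫₀¹ u(1 − u) du = 1/6, but the within-interval pairing has much smaller variance, while each naive pair carries the full variance of a single crude draw.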


Stratified Sample.

One of the reasons for the inaccuracy of the crude Monte Carlo estimator in the above example is the large interval, evident in Figure 4.1, in which the function is zero. Nevertheless, both crude and antithetic Monte Carlo methods sample in that region, this portion of the sample contributing nothing to our integral. Naturally, we would prefer to concentrate our sample in the region where the function is positive, and where the function is more variable, use larger sample sizes. One method designed to achieve this objective is the use of a stratified sample. Once again for a simple example we choose n = 2 function evaluations, and with V_1 ∼ U[0, a] and V_2 ∼ U[a, 1] define an estimator

\hat{\theta}_{st} = a f(V_1) + (1 − a) f(V_2).

Note that this is a weighted average of the two function values with weights a and 1 − a proportional to the lengths of the corresponding intervals. It is easy to show once again that the estimator \hat{\theta}_{st} is an unbiased estimator of θ, since

E(\hat{\theta}_{st}) = a\,Ef(V_1) + (1 − a)\,Ef(V_2)
= a \cdot \frac{1}{a}\int_0^a f(x)\,dx + (1 − a) \cdot \frac{1}{1 − a}\int_a^1 f(x)\,dx
= \int_0^1 f(x)\,dx.

Moreover,

var(\hat{\theta}_{st}) = a^2 var[f(V_1)] + (1 − a)^2 var[f(V_2)] + 2a(1 − a)\,cov[f(V_1), f(V_2)].    (4.10)

Even when V_1, V_2 are independent, so we obtain var(\hat{\theta}_{st}) = a^2 var[f(V_1)] + (1 − a)^2 var[f(V_2)], there may be a dramatic improvement in variance over crude Monte Carlo provided that the variability of f in each of the intervals [0, a] and [a, 1] is substantially less than in the whole interval [0, 1].

Let us return to the call option example above, with f defined by (4.6).


Suppose for simplicity we choose independent values of V_1, V_2. In this case

var(\hat{\theta}_{st}) = a^2 var[f(V_1)] + (1 − a)^2 var[f(V_2)].    (4.11)

For example, for a = .7, this results in a variance of about 0.046, obtained from the following

F=a*fn(a*rand(1,500000))+(1-a)*fn(a+(1-a)*rand(1,500000));

var(F)

and the variance of the sample mean of the components of the vector F is var(F)/length(F), or around 9.2 × 10^{−8}. Since each component of the vector above corresponds to two function evaluations we should compare this with a crude Monte Carlo estimator with n = 1000000 having variance σ²_f × 10^{−6} = 4.36 × 10^{−7}. This corresponds to an efficiency gain of 43.6/9.2, or around 5. We can afford to use one fifth the sample size by simply stratifying the sample into two strata. The improvement is somewhat limited by the fact that we are still sampling in a region in which the function is 0 (although now slightly less often).

A general stratified sample estimator is constructed as follows. We subdivide the interval [0, 1] into convenient subintervals 0 = x_0 < x_1 < ... < x_k = 1, and then select n_i random variables uniform on the corresponding interval, V_{ij} ∼ U[x_{i−1}, x_i], j = 1, 2, ..., n_i. Then the estimator of θ is

\hat{\theta}_{st} = \sum_{i=1}^{k} (x_i − x_{i−1}) \frac{1}{n_i}\sum_{j=1}^{n_i} f(V_{ij}).    (4.12)

Once again the weights (x_i − x_{i−1}) on the average of the function in the ith interval are proportional to the lengths of these intervals and the estimator \hat{\theta}_{st} is unbiased;

E(\hat{\theta}_{st}) = \sum_{i=1}^{k} (x_i − x_{i−1})\,E\left\{\frac{1}{n_i}\sum_{j=1}^{n_i} f(V_{ij})\right\}
= \sum_{i=1}^{k} (x_i − x_{i−1})\,Ef(V_{i1})
= \sum_{i=1}^{k} (x_i − x_{i−1})\,\frac{1}{x_i − x_{i−1}}\int_{x_{i−1}}^{x_i} f(x)\,dx
= \int_0^1 f(x)\,dx = θ.

In the case that all of the V_{ij} are independent, the variance is given by

var(\hat{\theta}_{st}) = \sum_{i=1}^{k} (x_i − x_{i−1})^2 \frac{1}{n_i} var[f(V_{i1})].    (4.13)

Once again, if we choose our intervals so that the variation within intervals var[f(V_{i1})] is small, this provides a substantial improvement over crude Monte Carlo. Suppose we wish to choose the sample sizes so as to minimize this variance. Obviously to avoid infinite sample sizes and to keep a ceiling on costs, we need to impose a constraint on the total sample size, say

\sum_{i=1}^{k} n_i = n.    (4.14)

If we treat the parameters n_i as continuous variables we can use the method of Lagrange multipliers to solve

\min_{\{n_i\}} \sum_{i=1}^{k} (x_i − x_{i−1})^2 \frac{1}{n_i} var[f(V_{i1})]

subject to constraint (4.14).

It is easy to show that the optimal choice of sample sizes within intervals satisfies

n_i ∝ (x_i − x_{i−1})\sqrt{var[f(V_{i1})]},

or more precisely that

n_i = n\,\frac{(x_i − x_{i−1})\sqrt{var[f(V_{i1})]}}{\sum_{j=1}^{k} (x_j − x_{j−1})\sqrt{var[f(V_{j1})]}}.    (4.15)
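The Lagrange computation behind (4.15), writing σ_i² = var[f(V_{i1})], is a short sketch:

```latex
\begin{aligned}
L &= \sum_{i=1}^{k} (x_i - x_{i-1})^2 \,\frac{\sigma_i^2}{n_i}
     + \lambda\Big(\sum_{i=1}^{k} n_i - n\Big),\\
\frac{\partial L}{\partial n_i}
  &= -\frac{(x_i - x_{i-1})^2 \sigma_i^2}{n_i^2} + \lambda = 0
\quad\Longrightarrow\quad
n_i = \frac{(x_i - x_{i-1})\,\sigma_i}{\sqrt{\lambda}},
\end{aligned}
```

so n_i ∝ (x_i − x_{i−1})σ_i, and substituting into the constraint (4.14) determines \sqrt{\lambda} and yields (4.15).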


In practice, of course, this will not necessarily produce an integer value of n_i and so we are forced to round to the nearest integer. For this optimal choice of sample sizes, the variance is now given by

var(\hat{\theta}_{st}) = \frac{1}{n}\left\{\sum_{j=1}^{k} (x_j − x_{j−1})\sqrt{var[f(V_{j1})]}\right\}^2.

The term \sum_{j=1}^{k} (x_j − x_{j−1})\sqrt{var[f(V_{j1})]} is a weighted average of the standard deviations of the function f within the intervals (x_{i−1}, x_i) and it is clear that, at least for a continuous function, these standard deviations can be made small simply by choosing k large with |x_i − x_{i−1}| small. In other words, if we ignore the fact that the sample sizes must be integers, at least for a continuous function f, we can achieve arbitrarily small var(\hat{\theta}_{st}) using a fixed sample size n simply by stratifying into a very large number of (small) strata. The intervals should be chosen so that the variances var[f(V_{i1})] are small, with n_i ∝ (x_i − x_{i−1})\sqrt{var[f(V_{i1})]}.

In summary, optimal sample sizes are proportional to the lengths of the intervals times the standard deviation of the function evaluated at a uniform random variable on the interval. For sufficiently small strata we can achieve arbitrarily small variances. The following function was designed to accept the strata x_1, x_2, ..., x_k and the desired sample size n as input, and then determine optimal sample sizes and the stratified sample estimator as follows:

1. Initially sample sizes of 1000 are chosen from each stratum and these are used to estimate \sqrt{var[f(V_{i1})]}.

2. Approximately optimal sample sizes n_i are then calculated from (4.15).

3. Samples of size n_i are then taken and the stratified sample estimator (4.12), its variance (4.13) and the sample sizes n_i are output.

function [est,v,n]=stratified(x,nsample)
% function for optimal sample size stratified estimator on call option price example
% [est,v,n]=stratified([0 .6 .85 1],100000) uses three strata (0,.6),(.6,.85),(.85,1)
% and total sample size 100000
est=0;
n=[];
m=length(x);
for i=1:m-1                % the preliminary sample of size 1000 in each stratum
   v=var(callopt2(unifrnd(x(i),x(i+1),1,1000),10,10,.05,.2,.25));
   n=[n (x(i+1)-x(i))*sqrt(v)];
end
n=floor(nsample*n/sum(n)); % calculation of the optimal sample sizes, rounded down
v=0;
for i=1:m-1
   F=callopt2(unifrnd(x(i),x(i+1),1,n(i)),10,10,.05,.2,.25); % evaluate f at n(i) uniform points in the interval
   est=est+(x(i+1)-x(i))*mean(F);
   v=v+var(F)*(x(i+1)-x(i))^2/n(i);
end
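Since callopt2 is defined elsewhere in the chapter, a self-contained Python sketch of the same two-pass procedure, applied here to a toy integrand, may be useful:

```python
import numpy as np

def stratified(f, x, nsample, pilot=1000, rng=None):
    # two-pass stratified estimator of the integral of f over [0,1]:
    # a pilot sample per stratum estimates the within-stratum standard
    # deviation, which sets the near-optimal allocation (4.15); then the
    # estimator (4.12) and its variance (4.13) are computed
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    widths = np.diff(x)
    sd = np.array([f(rng.uniform(a, b, pilot)).std(ddof=1)
                   for a, b in zip(x[:-1], x[1:])])
    alloc = widths * sd
    n = np.maximum((nsample * alloc / alloc.sum()).astype(int), 1)
    est, v = 0.0, 0.0
    for a, b, w, ni in zip(x[:-1], x[1:], widths, n):
        F = f(rng.uniform(a, b, ni))
        est += w * F.mean()                 # stratified estimate (4.12)
        v += w**2 * F.var(ddof=1) / ni      # its variance (4.13)
    return est, v, n
```

For example, stratified(lambda u: u**2, [0, .5, .85, 1], 100_000) returns an estimate near 1/3 with a variance well below the crude value var(U²)/100000.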

A call to [est,v,n]=stratified([0 .6 .85 1],100000), for example, generates a stratified sample with three strata [0, 0.6], (0.6, 0.85], and (0.85, 1] and outputs the estimate est = 0.4617, its variance v = 3.5 × 10^{−7} and the approximately optimal choice of sample sizes n = 26855, 31358, 41785. To compare this with a crude Monte Carlo estimator, note that a total of 99998 function evaluations are used, so the efficiency gain is σ²_f/(99998 × 3.5 × 10^{−7}) = 12.8. Evidently this stratified random sample can account for an improvement in efficiency of about a factor of 13. Of course there is a little setup cost here (a preliminary sample of size 3000) which we have not included in our calculation, but the results of that preliminary sample could have been combined with the main sample for a very slight decrease in variance as well. For comparison, the function call


[est,v,n]=stratified([.47 .62 .75 .87 .96 1],1000000)

uses five strata [.47, .62], [.62, .75], [.75, .87], [.87, .96], [.96, 1] and gives a variance of the estimator of 7.4 × 10^{−9}. Since a crude sample of the same size has variance around 4.36 × 10^{−7}, the efficiency is about 59. This stratified sample is as good as a crude Monte Carlo estimator with 59 million simulations! By introducing more strata, we can increase this efficiency as much as we wish.

Within a stratified random sample we may also introduce antithetic variates designed to provide negative covariance. For example we may use antithetic pairs within an interval if we believe that the function is monotone in the interval, or, if we believe that the function is increasing across adjacent strata, we can introduce antithetic pairs between two intervals. For example, we may generate U ∼ Uniform[0, 1] and then sample the point V_{ij} = x_{i−1} + (x_i − x_{i−1})U from the interval (x_{i−1}, x_i) as well as the point V_{(i+1)j} = x_{i+1} − (x_{i+1} − x_i)U from the interval (x_i, x_{i+1}) to obtain antithetic pairs between intervals. For a simple example of this applied to the above call option valuation, consider the estimator based on three strata [0, .47), [.47, .84), [.84, 1]. Here we have not bothered to sample to the left of 0.47 since the function is 0 there, so the sample size there is set to 0. Then using antithetic random numbers within each of the two strata [.47, .84) and [.84, 1], and U ∼ Uniform[0, 1], we obtain the estimator

\hat{\theta}_{str,ant} = \frac{0.37}{2}[f(.47 + .37U) + f(.84 − .37U)] + \frac{0.16}{2}[f(.84 + .16U) + f(1 − .16U)]

To assess this estimator, we evaluated, for U a vector of 1000000 uniforms,

U=rand(1,1000000);
F=.37*.5*(fn(.47+.37*U)+fn(.84-.37*U))+.16*.5*(fn(.84+.16*U)+fn(1-.16*U));
mean(F)          % gives 0.4615
var(F)/length(F) % gives 1.46 × 10^{−9}


This should be compared with the crude Monte Carlo estimator having the same number n = 4 × 10^6 of function evaluations as the components of the vector F: σ²_{crude}/(4 × 10^6) = 1.117 × 10^{−7}. The gain in efficiency is therefore 1.117/.0146, or approximately 77. The above stratified-antithetic simulation with 1,000,000 input variates and 4,000,000 function evaluations is equivalent to a crude Monte Carlo simulation with sample size 308 million! Variance reduction makes the difference between a simulation that is feasible on a laptop and one that would require a very long time on a mainframe computer. However on a Pentium IV 2.2GHz laptop it took approximately 58 seconds to run.

Control Variates.

There are two techniques that permit using knowledge about a function with shape similar to that of f. First, we consider the use of a control variate, based on the trivial identity

\int f(u)\,du = \int g(u)\,du + \int (f(u) − g(u))\,du    (4.16)

for an arbitrary function g(u). Assume that the integral of g is known, so we

can substitute its known value for the first term above. The second integral we assume is more difficult, and we estimate it by crude Monte Carlo, resulting in the estimator

\hat{\theta}_{cv} = \int g(u)\,du + \frac{1}{n}\sum_{i=1}^{n} [f(U_i) − g(U_i)].    (4.17)

This estimator is clearly unbiased and has variance

var(\hat{\theta}_{cv}) = var\left\{\frac{1}{n}\sum_{i=1}^{n} [f(U_i) − g(U_i)]\right\} = \frac{var[f(U) − g(U)]}{n},

so the variance is reduced over that of a crude Monte Carlo estimator having the same sample size n by a factor

\frac{var[f(U)]}{var[f(U) − g(U)]} \quad \text{for } U ∼ U[0, 1].    (4.18)
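The mechanics of (4.17) and (4.18) can be sketched in Python (a toy illustration, not the chapter's option example): we estimate ∫₀¹ e^u du using the control g(u) = 1 + u, whose integral 3/2 is known.

```python
import numpy as np

rng = np.random.default_rng(2)
f = np.exp                   # toy integrand with integral e - 1 over [0, 1]
g = lambda u: 1 + u          # control variate; its integral over [0, 1] is 3/2

u = rng.random(100_000)
theta_cv = 1.5 + np.mean(f(u) - g(u))   # control variate estimator (4.17)
theta_crude = np.mean(f(u))             # crude estimator on the same draws

# estimated variance-reduction factor (4.18)
gain = np.var(f(u)) / np.var(f(u) - g(u))
```

Since g tracks f closely on [0, 1], the residual f − g has small variance and gain comes out well above 1, illustrating why a control with shape similar to f pays off.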
