Chapter 4

Variance Reduction
Techniques

Introduction.

In this chapter we discuss techniques for improving on the speed and efficiency
of a simulation, usually called "variance reduction techniques".

Much of the simulation literature concerns discrete event simulations (DES),
simulations of systems that are assumed to change instantaneously in response
to sudden or discrete events. These are the most common in operations research,
and examples are simulations of processes such as networks or queues. Simulation
models in which the process is characterized by a state, with changes only
at discrete time points, are DES. In modeling an inventory system, for example,
the arrival of a batch of raw materials can be considered as an event which
precipitates a sudden change in the state of the system, followed by a demand some
discrete time later when the state of the system changes again. A system driven
by differential equations in continuous time is not an example of a DES because
the changes occur continuously in time. One approach to DES is future event


simulation which schedules one or more future events at a time, choosing the
event in the future event set which has minimum time, updating the state of
the system and the clock accordingly, and then repeating this whole procedure.
A stock price which moves by discrete amounts may be considered a DES. In
fact this approach is often used in valuing American options by Monte Carlo
methods with binomial or trinomial trees.
Often we identify one or more performance measures by which the system
is to be judged, and parameters which may be adjusted to improve the system
performance. Examples are the delay for an air traffic control system, customer
waiting times for a bank teller scheduling system, delays or throughput for
computer networks, response times for the location of fire stations or supply
depots, etc. Performance measures again are important in engineering examples
or in operations research, but less common in finance. They may be used to
calibrate a simulation model, however. For example our performance measure
might be the average distance between observed option prices on a given stock
and prices obtained by simulation from given model parameters. In all cases,
the performance measure is usually the expected value of a complicated function
of many variables, often expressible only by a computer program with some
simulated random variables as input. Whether these input random variables are
generated by inverse transform, or acceptance-rejection or some other method,
they are ultimately a function of uniform[0,1] random variables U1 , U2 , .... These
uniform random variables determine such quantities as the normally distributed
increments of the logarithm of the stock price. In summary, the simulation is
used simply to estimate a multidimensional integral of the form

E(g(U1 , ..., Ud )) = ∫∫ ··· ∫ g(u1 , u2 , ..., ud ) du1 du2 . . . dud          (4.1)

over the unit cube in d dimensions, where often d is large.
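In code, such an estimate is simply an average of g over random points of the cube. Here is a minimal Python sketch (the text's own examples use Matlab and R; Python is used for these illustrative sketches, and the test integrand is an arbitrary choice, not one from the text):

```python
import random

def mc_integral(g, d, n, seed=0):
    """Crude Monte Carlo estimate of E g(U1,...,Ud), an integral over the unit cube."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += g([rng.random() for _ in range(d)])  # g at a random point of [0,1]^d
    return total / n

# Illustrative test integrand: E(U1*U2*U3*U4*U5) = (1/2)^5 = 0.03125
est = mc_integral(lambda u: u[0]*u[1]*u[2]*u[3]*u[4], d=5, n=200_000)
```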
As an example in finance, suppose that we wish to price a European option
on a stock price under the following stochastic volatility model.

Example 33 Suppose the daily asset returns under a risk-neutral distribution
are assumed to follow a variance mixture of the Normal distribution, by which we
mean that the variance itself is random, independent of the normal variable, and
follows a distribution with moment generating function m(s). More specifically,
assume under the Q measure that the stock price at time (n+1)Δt is determined
from

S_{(n+1)Δt} = S_{nΔt} exp{rΔt + σ_{n+1} Z_{n+1}} / m(1/2)

where, under the risk-neutral distribution, the positive random variables σi²
are assumed to have a distribution with moment generating function m(s) =
E{exp(sσi²)}, Zi is standard normal independent of σi², and both (Zi , σi²) are
independent of the process up to time nΔt. We wish to determine the price of a
European call option with maturity T and strike price K.

It should be noted that the rather strange choice of m(1/2) in the denominator
above is such that the discounted process is a martingale, since

E[ exp{σ_{n+1} Z_{n+1}} / m(1/2) ] = E{ E[ exp{σ_{n+1} Z_{n+1}} / m(1/2) | σ_{n+1} ] }
                                   = E{ exp{σ_{n+1}²/2} / m(1/2) }
                                   = 1.
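This one-step martingale identity is easy to check by simulation. A Python sketch (Matlab and R are the languages used in the text's examples; here Python is used for illustration, assuming for concreteness that σ² ~ Gamma(a, β) with m(s) = (1 − βs)^{−a}, valid for s < 1/β; the parameter values are arbitrary):

```python
import math, random

# Check that E[exp(sigma*Z)] / m(1/2) = 1 when sigma^2 has mgf m(s).
# Here sigma^2 ~ Gamma(a, beta) (shape, scale), so m(s) = (1 - beta*s)^(-a).
a, beta = 2.0, 0.05
m_half = (1.0 - 0.5 * beta) ** (-a)       # m(1/2)

rng = random.Random(0)
N = 400_000
total = 0.0
for _ in range(N):
    sig2 = rng.gammavariate(a, beta)      # one variance draw
    total += math.exp(math.sqrt(sig2) * rng.gauss(0.0, 1.0))
ratio = (total / N) / m_half              # should be close to 1
```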

There are many ways of simulating an option price in the above example, some
much more efficient than others. We might, for example, simulate all of the 2n
random variables {σi , Zi , i = 1, ..., n = T/Δt} and use these to determine the
simulated value of ST , finally averaging the discounted payoff from the option
in this simulation, i.e. e^{−rT}(ST − K)⁺. The price of this option at time 0 is then
estimated by the average x̄ of the values x = e^{−rT}(ST − K)⁺ over many such
simulations (say we do this a total of N times), where x̄ denotes the average of
the x's observed over all simulations. This is

a description of a crude and inefficient method of conducting this simulation.
Roughly, the time required for the simulation is proportional to 2Nn, the total
number of random variables generated. This chapter discusses some of the many
improvements possible in problems like this. Since each simulation requires at
least d = 2n independent uniform random variables to generate the values
{σi , Zi , i = 1, ..., n}, we are trying to estimate a rather complicated integral
of the form (4.1) of high dimension d. In this case, however, we can immediately
see some obvious improvements. Notice that we can rewrite ST in the form

ST = S0 exp{rT + σZ} / mⁿ(1/2)          (4.2)

where the random variable σ² = Σᵢ₌₁ⁿ σi² has moment generating function mⁿ(s)
and Z is an independent standard normal. Obviously, if we can simulate σ directly,
we can avoid the computation involved in generating the individual σi. Further
savings are possible in the light of the Black-Scholes formula, which provides the
price of a call option when a stock price is given by (4.2) and the volatility
parameter σ is non-random. Since the expected return from the call under the
risk-neutral distribution can be written, using the Black-Scholes formula,

E(e^{−rT}(ST − K)⁺) = E{ E[ e^{−rT}(ST − K)⁺ | σ ] }
                    = E{ S0 Φ(d1(σ)) − K e^{−rT} Φ(d2(σ)) }

where

d1(σ) = [log(S0/K) + (r + σ²/2)T] / (σ√T),   d2(σ) = [log(S0/K) + (r − σ²/2)T] / (σ√T),

which is now a one-dimensional integral over the distribution of σ. This can now
be evaluated either by a one-dimensional numerical integration or by repeatedly
simulating the value of σ and averaging the values of

S0 Φ(d1(σ)) − K e^{−rT} Φ(d2(σ))

obtained from these simulations. As a special case we might take the distribution
of σi² to be Gamma(αΔt, β) with moment generating function

m(s) = (1 − βs)^{−αΔt}

in which case the distribution of σ² is Gamma(αT, β). This is the so-called
"variance-gamma" distribution investigated extensively by ....... and originally
suggested as a model for stock prices by ...... Alternatively, many other wider-tailed
alternatives to the normal returns model can be written as a variance
mixture of the normal distribution, and option prices can be simulated in this
way. For example, when the variance is generated having the distribution of the
reciprocal of a gamma random variable, the returns have a Student's t distribution.
Similarly, the stable distributions and the Laplace distribution all have a
representation as a variance mixture of the normal.
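The conditioning idea above can be sketched in a few lines of Python (again, Python rather than the text's Matlab/R, purely for illustration): draw the total variance σ² ~ Gamma(αT, β), convert it to an annualized volatility, and average the resulting Black-Scholes prices. The function names and parameter values below are illustrative choices, and the conditional price is computed with the standard Black-Scholes formula as in the displayed equation.

```python
import math, random
from statistics import NormalDist

cdf = NormalDist().cdf   # standard normal cdf

def bs_call(S0, K, r, sigma, T):
    """Black-Scholes call price for a non-random volatility sigma."""
    d1 = (math.log(S0/K) + (r + 0.5*sigma**2)*T) / (sigma*math.sqrt(T))
    d2 = d1 - sigma*math.sqrt(T)
    return S0*cdf(d1) - K*math.exp(-r*T)*cdf(d2)

def conditional_price(S0, K, r, T, alpha, beta, N, seed=0):
    """Average the Black-Scholes price over draws of sigma^2 ~ Gamma(alpha*T, beta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        sig2 = rng.gammavariate(alpha*T, beta)   # total variance over [0, T]
        sigma = math.sqrt(sig2 / T)              # annualized volatility
        total += bs_call(S0, K, r, sigma, T)
    return total / N

# Illustrative parameters: mean total variance alpha*T*beta = 0.01 (about 20%/year)
price = conditional_price(S0=10, K=10, r=0.05, T=0.25,
                          alpha=16.0, beta=0.0025, N=5_000)
```

Because each simulation now requires only one random variable rather than 2n, and the averaged quantity is far less variable than a raw discounted payoff, this needs many fewer draws than the crude method.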

The rest of this chapter discusses "variance reduction techniques" such as
the one employed above for evaluating integrals like (4.1), beginning with the
much simpler case of an integral in one dimension.

Variance reduction for one-dimensional Monte-
Carlo Integration.
We wish to evaluate a one-dimensional integral ∫₀¹ f(u) du, which we will denote
by θ, using Monte Carlo methods. We have seen before that whatever random
variables are input to our simulation program, they are usually
generated using uniform[0,1] random variables U, so without loss of generality
we can assume that the integral is with respect to the uniform[0,1] probability
density function, i.e. we wish to estimate

θ = E{f(U)} = ∫₀¹ f(u) du.

One simple approach, called crude Monte Carlo, is to randomly sample Ui ∼
Uniform[0, 1] and then average the values of f(Ui) to obtain

θ̂_CR = (1/n) Σᵢ₌₁ⁿ f(Ui).

It is easy to see that E(θ̂_CR) = θ, so that this average is an unbiased estimator
of the integral, and the variance of the estimator is

var(θ̂_CR) = var(f(U1))/n.

Example 34 A crude simulation of a call option price under the Black-Scholes
model:
For a simple example that we will use throughout, consider an integral used to
price a call option. We saw in Section 3.8 that if a European option has payoff
V(ST), where ST is the value of the stock at maturity T, then the option can
be valued at present (t = 0) using the discounted future payoff from the option
under the risk-neutral measure;

e^{−rT} E[V(ST)] = e^{−rT} E[V(S0 e^X)]

where, in the Black-Scholes model, the random variable X = ln(ST/S0) has a
normal distribution with mean rT − σ²T/2 and variance σ²T. A normally
distributed random variable X can be generated by inverse transform, and so we
can assume that X = Φ⁻¹(U; rT − σ²T/2, σ²T) is a function of a uniform[0, 1]
random variable U, where Φ⁻¹(·; rT − σ²T/2, σ²T) is the inverse of the
Normal(rT − σ²T/2, σ²T) cumulative distribution function. Then the value of the
option can be written as an expectation over the distribution of the uniform
random variable U,
E{f(U)} = ∫₀¹ f(u) du

where f(u) = e^{−rT} V(S0 exp{Φ⁻¹(u; rT − σ²T/2, σ²T)}).
This function is graphed in Figure 4.1 in the case of a simple call option with
strike price K, with payoff at maturity V(ST) = (ST − K)⁺, the current stock
price S0 = $10, the exercise price K = $10, the annual interest rate r = 5%, the
maturity three months or one quarter of a year, T = 0.25, and the annual
volatility σ = 0.20.

Figure 4.1: The function f (u) whose integral provides the value of a call option

A simple crude Monte Carlo estimator corresponds to evaluating this function
at a large number of randomly selected values Ui ∼ U[0, 1] and then
averaging the results. For example, the following function in Matlab accepts a
vector of inputs u = (U1 , ..., Un), assumed to be Uniform[0,1], and outputs the
values f(U1), ..., f(Un), which can be averaged to give θ̂_CR = (1/n) Σᵢ₌₁ⁿ f(Ui).

function v=fn(u)
% value of the integrand for a call option with exercise price K, r=annual interest rate,
% sigma=annual vol, S0=current stock price.
% u=vector of uniform (0,1) inputs used to generate normal variates by inverse transform.
% T=maturity
S0=10; K=10; r=.05; sigma=.2; T=.25; % values of parameters
ST=S0*exp(norminv(u,r*T-sigma^2*T/2,sigma*sqrt(T)));
% ST = S0 exp{Phi^(-1)(u; rT - sigma^2 T/2, sigma^2 T)} is the stock price at time T
v=exp(-r*T)*max(ST-K,0); % v is the discounted-to-present payoff from the call option

and the analogous function in R,

fn<-function(u,So,strike,r,sigma,T){
# value of the integrand for a call option with exercise price=strike, r=annual interest rate,
# sigma=annual volatility, So=current stock price, u=uniform (0,1) input to generate normal
# variates by inverse transform. T=time to maturity. For Black-Scholes price, integrate over (0,1).
x<-So*exp(qnorm(u,mean=r*T-sigma^2*T/2,sd=sigma*sqrt(T)))
v<-exp(-r*T)*pmax((x-strike),0)
v}
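For readers working in neither Matlab nor R, here is an equivalent sketch in Python (using the standard library's NormalDist for the inverse normal cdf; the sample size and seed are arbitrary):

```python
import math, random
from statistics import NormalDist

def fn(u, S0=10.0, K=10.0, r=0.05, sigma=0.2, T=0.25):
    """Discounted call payoff as a function of a Uniform(0,1) variate u,
    with the normal return generated by inverse transform, as in fn above."""
    dist = NormalDist(mu=r*T - 0.5*sigma**2*T, sigma=sigma*math.sqrt(T))
    ST = S0 * math.exp(dist.inv_cdf(u))      # stock price at maturity
    return math.exp(-r*T) * max(ST - K, 0.0)

rng = random.Random(1)
n = 200_000
est = sum(fn(rng.random()) for _ in range(n)) / n   # crude Monte Carlo estimate
```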

In the case of initial stock price $10, exercise price $10, annual volatility 0.20,
r = 5%, T = 0.25 (three months), this is run as

u=rand(1,500000); mean(fn(u))

and in R,

mean(fn(runif(500000),So=10,strike=10,r=.05,sigma=.2,T=.25))

and this provides an approximate value of the option of θ̂_CR = 0.4620. The
standard error of this estimator, computed using formula (4.3) below, is around
√(8.7 × 10⁻⁷) ≈ 0.0009. We may confirm with the Black-Scholes formula, again
in Matlab,

[CALL,PUT] = BLSPRICE(10,10,0.05,0.25,0.2,0).

The arguments are, in order, (S0, K, r, T, σ, q), where the last argument (here
q = 0) is the annual dividend yield, which we assume here to be zero. Provided
that no dividends are paid on the stock before the maturity of the option, this
is reasonable. This Matlab command provides the result CALL = 0.4615 and
PUT = 0.3373, indicating that our simulated call option price was reasonably
accurate, off by 1 percent or so. The put option is an option to sell the stock
at the specified price $10 at the maturity date and is also priced by this same
function.
One of the advantages of Monte Carlo methods over numerical techniques is
that, because we are using a sample mean, we have a simple estimator of accu-
racy. In general, when n simulations are conducted, the accuracy is measured

by the standard error of the sample mean. Since

var(θ̂_CR) = var(f(U1))/n,

the standard error of the sample mean is the standard deviation, or

SE(θ̂_CR) = σ_f/√n          (4.3)

where σ_f² = var(f(U)). As usual we estimate σ_f² using the sample standard
deviation. Since fn(u) provides a whole vector of values (f(U1), f(U2), ..., f(Un)),
sqrt(var(fn(u))) is the sample estimator of σ_f, so the standard error
SE(θ̂_CR) is given by

Sf=sqrt(var(fn(u)));

Sf/sqrt(length(u))
giving an estimate 0.6603 of the standard deviation σ_f, for a standard error
σ_f/√500000 ≈ 0.0009. Of course, parameters in statistical problems are usually
estimated using an interval estimate or confidence interval, an interval constructed
using a method that guarantees capturing the true value of the parameter under
similar circumstances with high probability (the confidence coefficient, often taken
to be 95%). Formally,

Definition 35 A 95% confidence interval for a parameter θ is an interval [L, U]
with random endpoints L, U such that P[L ≤ θ ≤ U] = 0.95.

If we were to repeat the experiment 100 times, say by running 100 more
similar independent simulations, and in each case use the results to construct
a 95% confidence interval, then this definition implies that roughly 95% of the
intervals constructed will contain the true value of the parameter (and of course
roughly 5% will not). For an approximately Normal(μ_X, σ_X²) random variable
X, we can use the approximation

P[μ_X − 2σ_X ≤ X ≤ μ_X + 2σ_X] ≈ 0.95          (4.4)

(i.e. approximately normal variables are within 2 standard deviations of their
mean with probability around 95%) to build a simple confidence interval. Strictly,
the value 2σ_X should be replaced by 1.96σ_X, where 1.96 is taken from the Normal
distribution tables. The value 2 is very close to correct for a t distribution
with 60 degrees of freedom. In any case, these confidence intervals which assume
approximate normality are typically too short (i.e. contain the true value
of the parameter less frequently than stated) for most real data, and so a value
marginally larger than 1.96 is warranted. Replacing σ_X above by the standard
error of a sample mean, (4.4) results in the approximately 95% confidence
interval

θ̂_CR − 2σ_f/√n ≤ θ ≤ θ̂_CR + 2σ_f/√n

for the true value θ. With confidence 95%, the true price of the option is
within the interval 0.462 ± 2(0.0009). As it happens, in this case this interval
does capture the true value 0.4615 of the option.
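The interval computation reads directly as code. A Python sketch (the integrand is restated from the example; sample size and seed are arbitrary):

```python
import math, random
from statistics import NormalDist

def fn(u, S0=10.0, K=10.0, r=0.05, sigma=0.2, T=0.25):
    """Discounted call payoff as a function of a Uniform(0,1) variate."""
    dist = NormalDist(mu=r*T - 0.5*sigma**2*T, sigma=sigma*math.sqrt(T))
    return math.exp(-r*T) * max(S0*math.exp(dist.inv_cdf(u)) - K, 0.0)

def crude_ci(n, seed=0):
    """Crude estimate with an approximate 95% interval: mean +/- 2 SE, as in (4.3)-(4.4)."""
    rng = random.Random(seed)
    vals = [fn(rng.random()) for _ in range(n)]
    mean = sum(vals) / n
    svar = sum((v - mean)**2 for v in vals) / (n - 1)   # sample variance of f(U)
    se = math.sqrt(svar / n)                            # standard error of the mean
    return mean - 2*se, mean + 2*se

lo, hi = crude_ci(100_000)
```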
So far Monte Carlo has not told us anything we couldn't obtain from the
Black-Scholes formula, but what if we used a distribution other than the normal
to generate the returns? This is an easy modification of the above. For example,
suppose we replace the standard normal by a logistic distribution which, as
we have seen, has a density function very similar to the standard normal if
we choose b = 0.625. Of course the Black-Scholes formula does not apply to a
process with logistically distributed returns. We need only replace the standard
normal inverse cumulative distribution function by the corresponding inverse
for the logistic,

F⁻¹(U) = b ln( U / (1 − U) )

and thus replace the Matlab code "norminv(u,T*(r-sigma^2/2),sigma*sqrt(T))"
by "T*(r-sigma^2/2)+sigma*sqrt(T)*.625*log(u./(1-u))". This results
in a slight increase in option value (to 0.504) and about a 50% increase in the
variance of the estimator.
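The same change in Python (a sketch, with b = 0.625 as in the text and the other values illustrative):

```python
import math, random

def fn_logistic(u, S0=10.0, K=10.0, r=0.05, sigma=0.2, T=0.25, b=0.625):
    """Integrand with logistic rather than normal returns, generated by the
    inverse cdf F^(-1)(u) = b*ln(u/(1-u)), shifted and scaled as in the text."""
    x = (r*T - 0.5*sigma**2*T) + sigma*math.sqrt(T) * b * math.log(u / (1.0 - u))
    return math.exp(-r*T) * max(S0*math.exp(x) - K, 0.0)

rng = random.Random(2)
n = 200_000
est = sum(fn_logistic(rng.random()) for _ in range(n)) / n   # near 0.504 per the text
```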

We will look at the efficiency of various improvements to crude Monte Carlo,
and to that end, we record the value of the variance of the estimator based on
a single uniform variate in this case;

σ²_crude = σ_f² = var(f(U)) ≈ 0.436.

Then the crude Monte Carlo estimator using n function evaluations or n
uniform variates has variance approximately 0.436/n. If I were able to adjust
the method so that the variance σ_f² based on a single evaluation of the function
f in the numerator were halved, then I could achieve the same accuracy
from a simulation using half the number of function evaluations. For this reason,
when we compare two different methods for conducting a simulation, the
ratio of variances corresponding to a fixed number of function evaluations can
also be interpreted roughly as the ratio of computational effort required for a
given predetermined accuracy. We will often compare various new methods of
estimating the same function based on variance reduction schemes and quote
the efficiency gain over crude Monte Carlo sampling:

Efficiency = (variance of crude Monte Carlo estimator) / (variance of new estimator)          (4.5)

where both numerator and denominator correspond to estimators with the same
number of function evaluations (since this is usually the more expensive part
of the computation). An efficiency of 100 would indicate that the crude Monte
Carlo estimator would require 100 times the number of function evaluations to
achieve the same variance or standard error of estimator.
Consider a crude estimator obtained from five U[0, 1] variates,

Ui = 0.1, 0.3, 0.5, 0.6, 0.8,  i = 1, ..., 5.

The crude Monte Carlo estimator in the case n = 5 is displayed in Figure 4.2,
the estimator being the sum of the areas of the marked rectangles. Only three of

Figure 4.2: Crude Monte Carlo Estimator based on 5 observations Ui =
0.1, 0.3, 0.5, 0.6, 0.8

the five points actually contribute to this area, since for this particular function

f(u) = e^{−rT} (S0 exp{Φ⁻¹(u; rT − σ²T/2, σ²T)} − K)⁺          (4.6)

and the parameters chosen, f(0.1) = f(0.3) = 0. Since these two random numbers
contributed 0 and the other three appear to be on average slightly too small,
the sum of the areas of the rectangles appears to underestimate the integral.
Of course, another selection of five uniform random numbers may prove to be
even more badly distributed and may result in an under- or an overestimate.
There are various ways of improving the efficiency of this estimator, many of
which partially emulate numerical integration techniques. First we should note
that most numerical integrals, like θ̂_CR, are weighted averages of the values
of the function at certain points Ui. What if we evaluated the function at
non-random points, chosen to attempt a reasonable balance between locations
where the function is large and small? Numerical integration techniques and
quadrature methods choose both the points at which we evaluate the function and
the weights that we attach to these points so as to provide accurate approximations
for polynomials of a certain degree. For example, suppose we insist on evaluating the

Figure 4.3: Graphical illustration of the trapezoidal rule (4.8)

function at equally spaced points, for example the points 0, 1/n, 2/n, ..., (n −
1)/n, 1. In some sense these points are now "more uniform" than we are likely
to obtain from n + 1 randomly and independently chosen points Ui , i = 1, 2, ..., n.
The trapezoidal rule corresponds to using such equally spaced points and equal
weights (except at the boundary), so that the "estimator" of the integral is

θ̂_TR = (1/2n) {f(0) + 2f(1/n) + . . . + 2f(1 − 1/n) + f(1)}          (4.7)

or the simpler and very similar alternative in our case, with n = 5,

θ̂_TR = (1/5) {f(0.1) + f(0.3) + f(0.5) + f(0.7) + f(0.9)}.          (4.8)

A reasonable balance between large and small values of the function is almost
guaranteed by such a rule, as shown in Figure 4.3, with the observations equally
spaced.
Simpson's rule is to generate equally spaced points with weights that (except
for the endpoints) alternate 4/3n, 2/3n, 4/3n, .... In the case when n is even, the
integral is estimated with

θ̂_SR = (1/3n) {f(0) + 4f(1/n) + 2f(2/n) + . . . + 4f((n − 1)/n) + f(1)}.          (4.9)

The trapezoidal rule is exact for linear functions and Simpson's rule is exact for
cubic polynomials. These one-dimensional numerical integration rules provide some
insight into how to achieve lower variance in Monte Carlo integration, illustrating
some options for increasing accuracy over simple random sampling. We may either
vary the weights attached to the individual points or vary the points (the Ui)
themselves, or both. Notice that as long as the Ui individually have distributions
that are Uniform[0, 1], we can introduce any degree of dependence among them
in order to come closer to the equal spacings characteristic of numerical integrals.
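Both rules translate directly into code. A short Python sketch (with f(u) = u³, an arbitrary cubic test integrand whose exact integral is 1/4, chosen to make the exactness claims visible):

```python
def trapezoid(f, n):
    """Trapezoidal rule (4.7) on [0,1] with points k/n."""
    return (f(0.0) + 2.0*sum(f(k/n) for k in range(1, n)) + f(1.0)) / (2.0*n)

def simpson(f, n):
    """Simpson's rule (4.9) on [0,1]; n must be even."""
    assert n % 2 == 0
    s = f(0.0) + f(1.0)
    s += 4.0 * sum(f(k/n) for k in range(1, n, 2))   # odd interior points
    s += 2.0 * sum(f(k/n) for k in range(2, n, 2))   # even interior points
    return s / (3.0*n)

t = trapezoid(lambda u: u**3, 10)   # close to 1/4, but not exact
s = simpson(lambda u: u**3, 10)     # exact for cubics
```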
Even if the Ui are dependent U[0,1], an estimator of the form

(1/n) Σᵢ₌₁ⁿ f(Ui)

will continue to be an unbiased estimator, because each of the summands
continues to satisfy E(f(Ui)) = θ. Ideally, if we introduce dependence among the
various Ui and the expected value remains unchanged, we would wish that the
variance

var( (1/n) Σᵢ₌₁ⁿ f(Ui) )

is reduced below that for independent uniforms. The simplest case of this idea is
the use of antithetic random variables.

Antithetic Random Numbers.

Consider first the simple case of n = 2 function evaluations at possibly dependent
points. Then the estimator is

θ̂ = (1/2) {f(U1) + f(U2)}

with expected value θ = ∫₀¹ f(u) du and variance given by

var(θ̂) = (1/2) {var(f(U1)) + cov[f(U1), f(U2)]}
2

assuming both U1 , U2 are uniform[0,1]. In the independent case the covariance
term disappears and we obtain the variance of the crude Monte Carlo estimator,

(1/2) var(f(U1)).

Notice, however, that if we are able to introduce a negative covariance, the
resulting variance of θ̂ will be smaller than that of the corresponding crude Monte
Carlo estimator, so the question is how to generate this negative covariance.
Suppose for example that f is monotone (increasing or decreasing). Then
f(1 − U1) decreases whenever f(U1) increases, so that substituting U2 = 1 − U1
has the desired effect and produces a negative covariance (in fact we will show
later that we cannot do any better when the function f is monotone). Such
a choice of U2 = 1 − U1, which helps reduce the variability in f(U1), is termed
an antithetic variate. In our example, because the function to be integrated is
monotone, there is a negative correlation between f(U1) and f(1 − U1) and

(1/2) {var(f(U1)) + cov[f(U1), f(U2)]} < (1/2) var(f(U1)),

that is, the variance is decreased compared with simple random sampling. Of course
in practice our sample size is much greater than n = 2, but we still enjoy the
benefits of this argument if we generate the points in antithetic pairs. For
example, to determine the extent of the variance reduction using antithetic
random numbers, suppose we generate 500,000 uniform variates U and use as
well the values of 1 − U (for a total of 1,000,000 function evaluations as
before).

F=(fn(u)+fn(1-u))/2;

This results in mean(F)=0.46186 and var(F)=0.1121. The variance of the
estimator mean(F) is

0.1121/length(F) = 2.24 × 10⁻⁷.

Since each of the 500,000 components of F obtains from two function evaluations,
this variance should be compared with that of a crude Monte Carlo estimator with
the same number, 1,000,000, of function evaluations: σ²_crude/1000000 = 4.35 × 10⁻⁷.
The efficiency gain due to the use of antithetic random numbers is 4.35/2.24, or
about two, so roughly half as many function evaluations using antithetic random
numbers provide the same precision as a crude Monte Carlo estimator. There
is the additional advantage that only half as many uniform random variables
are required. The introduction of antithetic variates has had the same effect on
precision as increasing the sample size under crude Monte Carlo by a factor of
approximately 2.
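This comparison can be reproduced with the following Python sketch (the integrand is restated; sample size and seed are arbitrary; the factor 2 in the efficiency accounts for the two function evaluations in each antithetic pair):

```python
import math, random
from statistics import NormalDist

def fn(u, S0=10.0, K=10.0, r=0.05, sigma=0.2, T=0.25):
    """Discounted call payoff as a function of a Uniform(0,1) variate."""
    dist = NormalDist(mu=r*T - 0.5*sigma**2*T, sigma=sigma*math.sqrt(T))
    return math.exp(-r*T) * max(S0*math.exp(dist.inv_cdf(u)) - K, 0.0)

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)
us = [rng.random() for _ in range(50_000)]
crude = [fn(u) for u in us]                        # one evaluation per point
anti = [(fn(u) + fn(1.0 - u)) / 2.0 for u in us]   # two evaluations per pair

vc, va = sample_var(crude), sample_var(anti)
efficiency = vc / (2.0 * va)    # factor 2: an antithetic pair costs two evaluations
```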
We have noted that antithetic random numbers improve the efficiency
whenever the function being integrated is monotone in u. What if it is not?
For example, suppose we use antithetic random numbers to integrate the function
f(u) = u(1 − u) on the interval 0 < u < 1. Rather than balance large values
with small values and so reduce the variance of the estimator, in this case notice
that f(U) and f(1 − U) are strongly positively correlated, in fact are equal, and
so the argument supporting the use of antithetic random numbers for monotone
functions will show that in this case they increase the variance over a crude
estimator with the same number of function evaluations. Of course this problem
can be remedied if we can identify intervals in which the function is monotone,
e.g. in this case use antithetic random numbers in the two intervals [0, 1/2] and
[1/2, 1], so for example we might estimate ∫₀¹ f(u) du by an average of terms like

(1/4) {f(U1/2) + f((1 − U1)/2) + f((1 + U2)/2) + f((2 − U2)/2)}

for independent U[0, 1] random variables U1 , U2.
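A Python sketch of this non-monotone example (f(u) = u(1 − u), whose integral is 1/6): the naive antithetic pairs gain nothing here since f(U) = f(1 − U), while the piecewise scheme above, antithetic within each half, reduces the variance substantially.

```python
import random

def f(u):
    """Non-monotone integrand on [0,1]; note f(u) == f(1-u)."""
    return u * (1.0 - u)

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs) / (len(xs) - 1)

rng = random.Random(3)
N = 100_000
us = [rng.random() for _ in range(N)]
vs = [rng.random() for _ in range(N)]

crude = [f(u) for u in us]                         # 1 evaluation each
naive = [(f(u) + f(1.0 - u)) / 2.0 for u in us]    # 2 evaluations, no variance gain
piecewise = [(f(u/2) + f((1-u)/2) + f((1+v)/2) + f((2-v)/2)) / 4.0
             for u, v in zip(us, vs)]              # antithetic within each half

mean_piece = sum(piecewise) / N                    # unbiased for 1/6
```

Per function evaluation, the naive pairs are twice as costly as crude sampling for the same variance, while the piecewise estimator's variance falls well below a quarter of the crude variance, despite its four evaluations per term.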

Stratified Sample.

One of the reasons for the inaccuracy of the crude Monte Carlo estimator in the
above example is the large interval, evident in Figure 4.1, in which the function
is zero. Nevertheless, both crude and antithetic Monte Carlo methods sample
in that region, this portion of the sample contributing nothing to our integral.
Naturally, we would prefer to concentrate our sample in the region where the
function is positive and, where the function is more variable, use larger sample
sizes. One method designed to achieve this objective is the use of a stratified
sample. Once again, for a simple example, we choose n = 2 function evaluations,
and with V1 ∼ U[0, a] and V2 ∼ U[a, 1] define an estimator

θ̂_st = a f(V1) + (1 − a) f(V2).

Note that this is a weighted average of the two function values with weights a
and 1 − a proportional to the lengths of the corresponding intervals. It is easy
to show once again that the estimator θ̂_st is an unbiased estimator of θ, since

E(θ̂_st) = a E f(V1) + (1 − a) E f(V2)
        = a (1/a) ∫₀ᵃ f(x) dx + (1 − a) (1/(1 − a)) ∫ₐ¹ f(x) dx
        = ∫₀¹ f(x) dx.

Moreover,

var(θ̂_st) = a² var[f(V1)] + (1 − a)² var[f(V2)] + 2a(1 − a) cov[f(V1), f(V2)].          (4.10)

Even when V1 , V2 are independent, so we obtain var(θ̂_st) = a² var[f(V1)] + (1 −
a)² var[f(V2)], there may be a dramatic improvement in variance over crude
Monte Carlo provided that the variability of f in each of the intervals [0, a] and
[a, 1] is substantially less than in the whole interval [0, 1].
Let us return to the call option example above, with f defined by (4.6).
Suppose for simplicity we choose independent values of V1 , V2. In this case

var(θ̂_st) = a² var[f(V1)] + (1 − a)² var[f(V2)].          (4.11)

For example, for a = 0.7 this results in a variance of about 0.046, obtained
from the following

F=a*fn(a*rand(1,500000))+(1-a)*fn(a+(1-a)*rand(1,500000));

var(F)

and the variance of the sample mean of the components of the vector F is
var(F)/length(F), or around 9.2 × 10⁻⁸. Since each component of the vector
above corresponds to two function evaluations, we should compare this with a
crude Monte Carlo estimator with n = 1000000, having variance σ_f² × 10⁻⁶ =
4.36 × 10⁻⁷. This corresponds to an efficiency gain of 43.6/9.2, or around 5.
We can afford to use one fifth the sample size by simply stratifying the sample
into two strata. The improvement is somewhat limited by the fact that we are
still sampling in a region in which the function is 0 (although now slightly less
often).
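The two-stratum estimator reads directly as code. The Python sketch below (a = 0.7 as in the text, with an illustrative sample size) restates the integrand and reports the per-pair variance, which should be near the 0.046 quoted above:

```python
import math, random
from statistics import NormalDist

def fn(u, S0=10.0, K=10.0, r=0.05, sigma=0.2, T=0.25):
    """Discounted call payoff as a function of a Uniform(0,1) variate."""
    dist = NormalDist(mu=r*T - 0.5*sigma**2*T, sigma=sigma*math.sqrt(T))
    return math.exp(-r*T) * max(S0*math.exp(dist.inv_cdf(u)) - K, 0.0)

def stratified_two(a, n, seed=0):
    """Two-stratum estimator a*f(V1) + (1-a)*f(V2), V1 ~ U[0,a], V2 ~ U[a,1]."""
    rng = random.Random(seed)
    vals = [a * fn(a * rng.random()) + (1 - a) * fn(a + (1 - a) * rng.random())
            for _ in range(n)]
    mean = sum(vals) / n
    svar = sum((v - mean)**2 for v in vals) / (n - 1)   # variance per pair
    return mean, svar

est, v = stratified_two(0.7, 50_000)
```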

A general stratified sample estimator is constructed as follows. We subdivide
the interval [0, 1] into convenient subintervals 0 = x0 < x1 < ... < xk = 1, and
then select ni random variables uniform on the corresponding interval, Vij ∼
U[xi−1 , xi ], j = 1, 2, ..., ni. Then the estimator of θ is

θ̂_st = Σᵢ₌₁ᵏ (xi − xi−1) (1/ni) Σⱼ₌₁^{ni} f(Vij).          (4.12)

Once again the weights (xi − xi−1) on the average of the function in the i'th
interval are proportional to the lengths of these intervals, and the estimator θ̂_st
is unbiased;

E(θ̂_st) = Σᵢ₌₁ᵏ (xi − xi−1) E{ (1/ni) Σⱼ₌₁^{ni} f(Vij) }
        = Σᵢ₌₁ᵏ (xi − xi−1) E f(Vi1)
        = Σᵢ₌₁ᵏ (xi − xi−1) (1/(xi − xi−1)) ∫_{xi−1}^{xi} f(x) dx
        = ∫₀¹ f(x) dx = θ.
0

In the case that all of the Vij are independent, the variance is given by

var(θ̂_st) = Σᵢ₌₁ᵏ (xi − xi−1)² (1/ni) var[f(Vi1)].          (4.13)

Once again, if we choose our intervals so that the variation within intervals
var[f(Vi1)] is small, this provides a substantial improvement over crude Monte
Carlo. Suppose we wish to choose the sample sizes so as to minimize this
variance. Obviously, to avoid infinite sample sizes and to keep a ceiling on
costs, we need to impose a constraint on the total sample size, say

Σᵢ ni = n.          (4.14)

If we treat the parameters ni as continuous variables, we can use the method of
Lagrange multipliers to solve

min_{ni} Σᵢ₌₁ᵏ (xi − xi−1)² (1/ni) var[f(Vi1)]

subject to constraint (4.14).

It is easy to show that the optimal choices of sample sizes within intervals are

ni ∝ (xi − xi−1) √var[f(Vi1)]

or, more precisely, that

ni = n (xi − xi−1) √var[f(Vi1)] / Σⱼ₌₁ᵏ (xj − xj−1) √var[f(Vj1)].          (4.15)

In practice, of course, this will not necessarily produce an integer value of ni,
and so we are forced to round to the nearest integer. For this optimal choice of
sample sizes, the variance is now given by

var(θ̂_st) = (1/n) { Σⱼ₌₁ᵏ (xj − xj−1) √var[f(Vj1)] }².

The term Σⱼ₌₁ᵏ (xj − xj−1) √var[f(Vj1)] is a weighted average of the standard

deviation of the function f within the intervals (xi−1 , xi), and it is clear that,
at least for a continuous function, these standard deviations can be made small
simply by choosing k large with |xi − xi−1| small. In other words, if we ignore the
fact that the sample sizes must be integers, at least for a continuous function f
we can achieve arbitrarily small var(θ̂_st) using a fixed sample size n simply by
stratifying into a very large number of (small) strata. In summary, the optimal
sample sizes ni ∝ (xi − xi−1) √var[f(Vi1)] are proportional to the lengths of the
intervals times the standard deviation of the function evaluated at a uniform
random variable on the interval, and for sufficiently small strata we can achieve
arbitrarily small variances. The following function was designed to accept the
strata boundaries x1 , x2 , ..., xk and the desired sample size n as input, and then
determine optimal sample sizes and the stratified sample estimator as follows:

1. Initially, sample sizes of 1000 are chosen from each stratum and these are
used to estimate \sqrt{var[f(V_{i1})]}.

2. Approximately optimal sample sizes n_i are then calculated from (4.15).

3. Samples of size n_i are then taken, and the stratified sample estimator
(4.12), its variance (4.13) and the sample sizes n_i are output.

function [est,v,n]=stratified(x,nsample)
% function for optimal sample size stratified estimator on call option price example
% [est,v,n]=stratified([0 .6 .85 1],100000) uses three strata (0,.6),(.6,.85),(.85,1)
% and total sample size 100000
est=0;
n=[];
m=length(x);
for i=1:m-1   % the preliminary sample of size 1000
   v=var(callopt2(unifrnd(x(i),x(i+1),1,1000),10,10,.05,.2,.25));
   n=[n (x(i+1)-x(i))*sqrt(v)];
end
n=floor(nsample*n/sum(n));   % calculation of the optimal sample sizes, rounded down
v=0;
for i=1:m-1
   F=callopt2(unifrnd(x(i),x(i+1),1,n(i)),10,10,.05,.2,.25);   % evaluate f at n(i) uniform points in the interval
   est=est+(x(i+1)-x(i))*mean(F);
   v=v+var(F)*(x(i+1)-x(i))^2/n(i);
end

A call to [est,v,n]=stratified([0 .6 .85 1],100000), for example, generates a
stratified sample with three strata [0, 0.6], (0.6, 0.85], and (0.85, 1] and outputs
the estimate est = 0.4617, its variance v = 3.5 × 10^{-7} and the approximately
optimal choice of sample sizes n = 26855, 31358, 41785. To compare this with
a crude Monte Carlo estimator, note that a total of 99998 function evaluations
are used, so the efficiency gain is \sigma_f^2/(99998 × 3.5 × 10^{-7}) = 12.8. Evidently this
stratified random sample can account for an improvement in efficiency of about
a factor of 13. Of course there is a little setup cost here (a preliminary sample
of size 3000) which we have not included in our calculation, but the results of
that preliminary sample could have been combined with the main sample for a
very slight further decrease in variance. For comparison, the function call

[est,v,n]=stratified([.47 .62 .75 .87 .96 1],1000000)

uses five strata [.47, .62], [.62, .75], [.75, .87], [.87, .96], [.96, 1] and gives a
variance of the estimator of 7.4 × 10^{-9}. Since a crude sample of the same size has
variance around 4.36 × 10^{-7}, the efficiency is about 170. This stratified sample
is as good as a crude Monte Carlo estimator with 170 million simulations! By
introducing more strata, we can increase this efficiency as much as we wish.
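The same two-stage procedure (a pilot sample to estimate the within-stratum standard deviations, then allocation by (4.15)) can be sketched in Python for a generic integrand. Since callopt2 is not reproduced here, the usage below substitutes a simple stand-in integrand f(u) = u^2, whose integral over [0,1] is 1/3; the numbers are illustrative only.

```python
import numpy as np

def stratified(f, x, nsample, pilot=1000, rng=None):
    # Two-stage stratified estimate of the integral of f over [0,1].
    # Stage 1: pilot samples estimate within-stratum standard deviations.
    # Stage 2: allocate n_i proportional to (x_i - x_{i-1}) * sd_i, as in (4.15).
    rng = rng or np.random.default_rng(0)
    widths = np.diff(x)
    sd = np.array([f(rng.uniform(a, b, pilot)).std(ddof=1)
                   for a, b in zip(x[:-1], x[1:])])
    weights = widths * sd
    n = np.maximum(np.floor(nsample * weights / weights.sum()).astype(int), 2)
    est = var = 0.0
    for a, b, w, ni in zip(x[:-1], x[1:], widths, n):
        F = f(rng.uniform(a, b, ni))
        est += w * F.mean()                  # stratified estimator, as in (4.12)
        var += w**2 * F.var(ddof=1) / ni     # its variance, as in (4.13)
    return est, var, n
```

The floor in the allocation step mirrors the rounding discussed above; the lower bound of 2 per stratum simply guards against empty strata in this sketch.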
Within a stratified random sample we may also introduce antithetic variates
designed to provide negative covariance. For example, we may use antithetic
pairs within an interval if we believe that the function is monotone in the interval,
or, if we believe that the function is increasing across adjacent strata, we can
introduce antithetic pairs between two intervals. For example, we may generate
U ∼ Uniform[0, 1] and then sample the point V_{ij} = x_{i-1} + (x_i - x_{i-1})U from
the interval (x_{i-1}, x_i) as well as the point V_{(i+1)j} = x_{i+1} - (x_{i+1} - x_i)U from
the interval (x_i, x_{i+1}) to obtain antithetic pairs between intervals. For a simple
example of this applied to the above call option valuation, consider the estimator
based on three strata [0, .47), [.47, .84), [.84, 1]. Here we have not bothered
to sample to the left of 0.47 since the function is 0 there, so the sample size on
that stratum is set to 0. Then, using antithetic random numbers within each of the
two strata [.47, .84), [.84, 1], and U ∼ Uniform[0, 1], we obtain the estimator

\hat\theta_{str,ant} = \frac{0.37}{2}[f(.47+.37U) + f(.84-.37U)] + \frac{0.16}{2}[f(.84+.16U) + f(1-.16U)].

To assess this estimator, we evaluated, for U a vector of 1,000,000 uniform
random numbers,

U=rand(1,1000000);
F=.37*.5*(fn(.47+.37*U)+fn(.84-.37*U))+.16*.5*(fn(.84+.16*U)+fn(1-.16*U));
mean(F)           % gives 0.4615
var(F)/length(F)  % gives 1.46 × 10^{-9}

This should be compared with the crude Monte Carlo estimator having the same
number n = 4 × 10^6 of function evaluations as the components of the vector
F: \sigma^2_{crude}/(4 × 10^6) = 1.117 × 10^{-7}. The gain in efficiency is therefore 1.117/.0146,
or approximately 77. The above stratified-antithetic simulation with 1,000,000
input variates and 4,000,000 function evaluations is equivalent to a crude Monte
Carlo simulation with sample size 308 million! Variance reduction makes the
difference between a simulation that is feasible on a laptop and one that would
require a very long time on a mainframe computer. Even so, on a Pentium IV
2.2GHz laptop this simulation took approximately 58 seconds to run.
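The stratified-antithetic estimator above can be sketched in Python. The option integrand fn is not reproduced here, so the usage below substitutes the hypothetical stand-in f(u) = (u - 0.47)^+, which, being linear on each stratum, is cancelled exactly by the antithetic pairing (the variance is essentially zero and the estimate equals (0.53)^2/2).

```python
import numpy as np

def strat_antithetic(f, n=100000, rng=None):
    # Stratified + antithetic estimate of the integral of f over [0,1] with
    # strata [0,.47), [.47,.84), [.84,1], assuming f vanishes on [0,.47)
    # so that stratum receives no sample (as in the text).
    rng = rng or np.random.default_rng(1)
    U = rng.uniform(size=n)
    F = (0.37 * 0.5 * (f(0.47 + 0.37 * U) + f(0.84 - 0.37 * U))
         + 0.16 * 0.5 * (f(0.84 + 0.16 * U) + f(1 - 0.16 * U)))
    return F.mean(), F.var(ddof=1) / n
```

Each draw U supplies one point and its mirror image in both strata, so one input variate yields four function evaluations, exactly as in the 58-second run above.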

Control Variates.

There are two techniques that permit using knowledge about a function g with
shape similar to that of f. First, we consider the use of a control variate, based
on the trivial identity

\int f(u)du = \int g(u)du + \int (f(u) - g(u))du    (4.16)

for an arbitrary function g(u). Assume that the integral of g is known, so we
can substitute its known value for the first term above. The second integral we
assume is more difficult, and we estimate it by crude Monte Carlo, resulting in
the estimator

\hat\theta_{cv} = \int g(u)du + \frac{1}{n}\sum_{i=1}^{n}[f(U_i) - g(U_i)].    (4.17)
This estimator is clearly unbiased and has variance

\operatorname{var}(\hat\theta_{cv}) = \operatorname{var}\Big\{\frac{1}{n}\sum_{i=1}^{n}[f(U_i) - g(U_i)]\Big\} = \frac{\operatorname{var}[f(U) - g(U)]}{n},

so the variance is reduced over that of the crude Monte Carlo estimator having the
same sample size n by the factor

\frac{\operatorname{var}[f(U)]}{\operatorname{var}[f(U) - g(U)]} \quad \text{for } U \sim U[0, 1].    (4.18)
Let us return to the example of pricing a call option. By some experimentation,
which could involve a preliminary crude simulation or simply evaluating
the function at various points, we discovered that the function

g(u) = 6[(u - .47)^+]^2 + (u - .47)^+

provided a reasonable approximation to the function f(u). The two functions
are compared in Figure 4.4. Moreover, the integral 2 × (0.53)^3 + (0.53)^2/2 of the
function g(·) is easy to obtain.

Figure 4.4: Comparison of the function f (u) and the control variate g(u)

It is obvious from the figure that since f(u) - g(u) is generally much smaller
and less variable than f(u), var[f(U) - g(U)] < var[f(U)]. The variance of
the crude Monte Carlo estimator is determined by the variability of the function
f(u) over its full range. The variance of the control variate estimator is
determined by the variance of the difference between the two functions, which
in this case is quite small. We used the following Matlab functions, the first to
generate the function g(u) and the second to determine the efficiency gain of
the control variate estimator:

% this is the function g(u), a control variate for fn(u)
function g=GG(u)
u=max(0,u-.47);
g=6*u.^2+u;

function [est,var1,var2]=control(f,g,intg,n)
% run using a statement like [est,var1,var2]=control('fn','GG',intg,n)
% runs a simulation on the function f using control variate g (both character
% strings) n times.
% intg is the integral of g:  intg = int_0^1 g(u)du
% outputs the estimator est and the variances var1 (without the control
% variate) and var2 (with it).
U=unifrnd(0,1,1,n);
FN=eval(strcat(f,'(U)'));   % evaluates f(U) for the vector U
CN=eval(strcat(g,'(U)'));   % evaluates g(U)
est=intg+mean(FN-CN);
var1=var(FN);
var2=var(FN-CN);

Then the call [est,var1,var2]=control('fn','GG',2*(.53)^3+(.53)^2/2,1000000)
yields the estimate 0.4616 and variance 1.46 × 10^{-8}, for an efficiency gain over
crude Monte Carlo of around 30.
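The same recipe, the estimator (4.17) together with the efficiency ratio (4.18), can be sketched in Python. Since fn and GG are not reproduced here, the usage below substitutes a hypothetical pair: f(u) = e^u with control variate g(u) = 1 + u, whose integral over [0,1] is 3/2.

```python
import numpy as np

def control_variate(f, g, intg, n=100000, rng=None):
    # Control-variate estimator (4.17): the known integral of g plus a crude
    # Monte Carlo estimate of the integral of f - g.
    rng = rng or np.random.default_rng(2)
    U = rng.uniform(size=n)
    FN, GN = f(U), g(U)
    est = intg + (FN - GN).mean()
    gain = FN.var(ddof=1) / (FN - GN).var(ddof=1)   # efficiency ratio (4.18)
    return est, gain
```

With this pair the estimate is close to e - 1 and the variance ratio is roughly 5, since the linear g captures most of the variability of e^u; the better the fit of g, the larger the gain.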
This elementary form of control variate suggests using the estimator

\int g(u)du + \frac{1}{n}\sum_{i=1}^{n}[f(U_i) - g(U_i)],

but it may well be that g(U) is not the best estimator we can imagine for f(U).
We can often find a linear function of g(U) which is better by using regression.
Since elementary regression yields

f(U) - E(f(U)) = \beta(g(U) - E(g(U))) + \varepsilon    (4.19)

where

\beta = \frac{\operatorname{cov}(f(U), g(U))}{\operatorname{var}(g(U))}    (4.20)

and the errors \varepsilon have expectation 0, it follows that E(f(U)) + \varepsilon = f(U) -
\beta[g(U) - E(g(U))], and so f(U) - \beta[g(U) - E(g(U))] is an unbiased estimator
of E(f(U)). For a sample of N uniform random numbers this becomes

\hat\theta_{cv} = \beta E(g(U)) + \frac{1}{N}\sum_{i=1}^{N}[f(U_i) - \beta g(U_i)].    (4.21)

Moreover, this estimator has the smallest variance among all estimators formed
from linear combinations of f(U) and g(U). Note that when \beta = 1, (4.21) reduces
to the simpler form of the control variate technique (4.17) discussed above, but the
estimator with the optimal coefficient \beta is generally better in terms of efficiency.
Of course, in practice it is necessary to estimate the covariance and the variance
in the definition of \beta from the simulations themselves, by evaluating f and g at
many different uniform random variables U_i, i = 1, 2, ..., N and then estimating
\beta using the standard least squares estimator

\hat\beta = \frac{N\sum_{i=1}^{N} f(U_i)g(U_i) - \sum_{i=1}^{N} f(U_i)\sum_{i=1}^{N} g(U_i)}{N\sum_{i=1}^{N} g^2(U_i) - \big(\sum_{i=1}^{N} g(U_i)\big)^2}.

Although in theory the substitution of the estimator \hat\beta for the true value \beta
results in a small bias in the estimator, for large numbers of simulations N the
estimator \hat\beta is so close to the true value that this bias can be disregarded.
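The regression-adjusted control variate (4.21), with \beta estimated by least squares, can be sketched in Python; the usage below again assumes a hypothetical pair, f(u) = e^u with g(u) = u and E[g(U)] = 1/2.

```python
import numpy as np

def regression_cv(f, g, Eg, n=100000, rng=None):
    # Estimator (4.21): beta*E[g(U)] plus the mean of f(U_i) - beta*g(U_i),
    # with beta = cov(f(U), g(U)) / var(g(U)) estimated by least squares
    # from the same simulations.
    rng = rng or np.random.default_rng(3)
    U = rng.uniform(size=n)
    FN, GN = f(U), g(U)
    beta = np.cov(FN, GN)[0, 1] / GN.var(ddof=1)
    est = beta * Eg + (FN - beta * GN).mean()
    return est, beta
```

For this pair the fitted coefficient comes out near 1.7 rather than 1, illustrating that the optimal linear adjustment can differ substantially from the elementary choice \beta = 1.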

Importance Sampling.

A second technique that is similar in spirit is importance sampling. Again we
depend on having a reasonably simple function g that, after multiplication by
some constant, is similar to f. However, rather than attempt to minimize the
difference f(u) - g(u) between the two functions, we try to find g(u) such
that f(u)/g(u) is nearly constant. We also require that g is non-negative
and integrable so that, after rescaling the function, it integrates to one,
i.e. it is a probability density function. Assume we can easily generate random
variables from the probability density function g(z). The distribution whose
probability density function is g(z), z ∈ [0, 1], is called the importance distribution.
Note that if we generate a random variable Z having the probability density
function g(z), z ∈ [0, 1], then

\int_0^1 f(u)du = \int_0^1 \frac{f(z)}{g(z)}\, g(z)dz = E\Big[\frac{f(Z)}{g(Z)}\Big].    (4.22)

This can therefore be estimated by generating independent random variables Z_i
with probability density function g(z) and then setting

\hat\theta_{im} = \frac{1}{n}\sum_{i=1}^{n}\frac{f(Z_i)}{g(Z_i)}.    (4.23)

Once again, according to (4.22), this is an unbiased estimator, and its variance
is

\operatorname{var}(\hat\theta_{im}) = \frac{1}{n}\operatorname{var}\Big\{\frac{f(Z_1)}{g(Z_1)}\Big\}.    (4.24)
Returning to our example, we might consider using the same function as
before for g(u). However, it is not easy to generate variates from a density
proportional to this function g by inverse transform, since this would require
solving a cubic equation. Instead, let us consider something much simpler: the
density function g(u) = 2(0.53)^{-2}(u - .47)^+ having cumulative distribution
function G(u) = (0.53)^{-2}[(u - .47)^+]^2 and inverse cumulative distribution
function G^{-1}(u) = 0.47 + 0.53\sqrt{u}. In this case we generate Z_i using Z_i = G^{-1}(U_i)
for U_i ∼ Uniform[0, 1]. The following function simulates an importance sampling
estimator:

function [est,v]=importance(f,g,Ginv,u)
% runs a simulation on the function 'f' using importance density 'g' (both
% character strings) and inverse c.d.f. 'Ginv'
% outputs all estimators (should be averaged) and the variance per simulation
% IM is the inverse c.d.f. of the importance distribution evaluated at u
% run e.g.
% [est,v]=importance('fn','2*(IM-.47)/(.53)^2;','.47+.53*sqrt(u);',rand(1,1000));
IM=eval(Ginv);    % =.47+.53*sqrt(u);
IMdens=eval(g);   % the density of the importance sampling distribution at IM; =2*(IM-.47)/(.53)^2;
FN=eval(strcat(f,'(IM)'));
est=FN./IMdens;   % mean(est) provides the estimator
v=var(FN./IMdens)/length(IM);   % the variance of the estimator per simulation

The function was called with [est,v]=importance('fn','2*(IM-.47)/(.53)^2;','.47+.53*sqrt(u);',rand(1,1000000)),
giving an estimate mean(est) = 0.4616 with variance 1.28 × 10^{-8}, for an
efficiency gain of around 35 over crude Monte Carlo.
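The same importance sampling estimator can be sketched in Python, with the triangular importance density and inverse c.d.f. above. Since fn is not reproduced here, the usage below substitutes the hypothetical integrand f(z) = [(z - .47)^+]^2, whose integral (0.53)^3/3 is known exactly.

```python
import numpy as np

def importance(f, n=100000, rng=None):
    # Importance-sampling estimator (4.23) with the triangular importance
    # density g(z) = 2(0.53)^{-2}(z - .47)^+ on [.47, 1], sampled by the
    # inverse transform Z = .47 + .53*sqrt(U).
    rng = rng or np.random.default_rng(4)
    Z = 0.47 + 0.53 * np.sqrt(rng.uniform(size=n))
    dens = 2 * (Z - 0.47) / 0.53**2
    ratio = f(Z) / dens
    return ratio.mean(), ratio.var(ddof=1) / n   # estimate, and (4.24) per (4.23)
```

For this test integrand the ratio f(Z)/g(Z) is proportional to \sqrt{U}, so it is nearly constant and the variance per simulation is small, which is exactly the design goal stated above.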

Example 36 (Estimating quantiles using importance sampling) Suppose we
are able to generate random variables X from a probability density function of
the form f_\theta(x), and we wish to estimate a quantile such as the value at risk
(VaR), i.e. estimate x_p such that

P_{\theta_0}(X \le x_p) = p

for a certain value \theta_0 of the parameter.

As a very simple example, suppose S is the sum of 10 independent random
variables having the exponential distribution with mean \theta, and f_\theta(x_1, ..., x_{10}) is
the joint probability density function of these 10 observations. Assume \theta_0 = 1
and p = .999, so that we seek an extreme quantile of the sum, i.e. we want to
determine x_p such that P_{\theta_0}(S \le x_p) = p. The equation that we wish to solve
for x_p is

E_{\theta_0}\{I(S \le x_p)\} = p.    (4.25)

The crudest estimator of this is obtained by generating a large number of
independent observations of S under the parameter value \theta_0 = 1 and finding
the p'th quantile, i.e. by defining the empirical c.d.f. We generate independent
random vectors X_i = (X_{i1}, ..., X_{i10}) from the probability density f_{\theta_0}(x_1, ..., x_{10})
and, with S_i = \sum_{j=1}^{10} X_{ij}, define

\hat{F}(x) = \frac{1}{n}\sum_{i=1}^{n} I(S_i \le x).    (4.26)

We then invert it (possibly with interpolation) to estimate the quantile

\hat{x}_p = \hat{F}^{-1}(p).    (4.27)

If the true cumulative distribution function is differentiable, the variance of this
quantile estimator is asymptotically related to the variance of our estimator of
the cumulative distribution function,

\operatorname{var}(\hat{x}_p) \simeq \frac{\operatorname{var}(\hat{F}(x_p))}{(F'(x_p))^2},

so any variance reduction in the estimator of the c.d.f. is reflected, at least
asymptotically, in a variance reduction in the estimator of the quantile. Using
importance sampling, (4.25) is estimated by the same technique but with

\hat{F}_I(x) = \frac{1}{n}\sum_{i=1}^{n} W_i I(S_i \le x), \quad \text{where} \quad W_i = \frac{f_{\theta_0}(X_{i1}, ..., X_{i10})}{f_{\theta}(X_{i1}, ..., X_{i10})}.    (4.28)
Ideally we should choose the value of \theta so that the variance of \hat{x}_p, or of

W_i I(S_i \le x_p),

is as small as possible. This requires a wise guess or experimentation with
various choices of \theta. For a given \theta we have another choice of empirical cumulative
distribution function,

\hat{F}_{I2}(x) = \frac{1}{\sum_{i=1}^{n} W_i}\sum_{i=1}^{n} W_i I(S_i \le x).    (4.29)

Both of these provide fairly crude estimates of the sample quantiles when observations
are weighted and, as one does with the sample median, one could easily
interpolate between adjacent values around the value of x_p.

The alternative (4.29) is motivated by the fact that the values W_i appear
as weights attached to the observations S_i, and it therefore seems reasonable to
divide by the sum of the weights. In fact, the expected value of the denominator
is

E_{\theta}\Big\{\sum_{i=1}^{n} W_i\Big\} = n,

so the two denominators are similar. In the example where the X_{ij} are
independent exponential(1), let us examine the weight on S_i determined by
X_i = (X_{i1}, ..., X_{i10}):

W_i = \frac{f_{\theta_0}(X_{i1}, ..., X_{i10})}{f_{\theta}(X_{i1}, ..., X_{i10})} = \prod_{j=1}^{10}\frac{\exp(-X_{ij})}{\theta^{-1}\exp(-X_{ij}/\theta)} = \theta^{10}\exp\{-S_i(1 - \theta^{-1})\}.

The renormalized alternative (4.29) might be necessary for estimating extreme
quantiles when the number of simulations is small, but only the first provides
a completely unbiased estimating function. In our case, using (4.28) with
\theta = 2.5, we obtained an estimator of F(x_{0.999}) with efficiency about 180 times
that of a crude Monte Carlo simulation. There is some discussion of various
renormalizations of the importance sampling weights in Hesterberg (1995).
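A Python sketch of this quantile estimator, using the renormalized weighted c.d.f. (4.29), which is the numerically safer of the two forms since it is guaranteed to reach one; the tilt parameter \theta = 2.5 and the weight formula are taken from the example above.

```python
import numpy as np

def quantile_is(p=0.999, theta=2.5, n=200000, rng=None):
    # Simulate S = sum of 10 iid exponential variables with mean theta,
    # attach the weights W_i = theta^10 * exp(-S_i*(1 - 1/theta)) derived
    # above, and invert the renormalized weighted empirical c.d.f. (4.29).
    rng = rng or np.random.default_rng(5)
    S = rng.exponential(scale=theta, size=(n, 10)).sum(axis=1)
    W = theta**10 * np.exp(-S * (1 - 1/theta))
    order = np.argsort(S)
    S, W = S[order], W[order]
    F = np.cumsum(W) / W.sum()                    # weighted c.d.f. (4.29)
    return S[min(np.searchsorted(F, p), n - 1)]   # smallest S with F >= p
```

For comparison, the true 0.999 quantile of the sum (a Gamma(10, 1) variable) is about 22.7, half the 0.999 point of a chi-squared distribution with 20 degrees of freedom.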

Importance Sampling, the Exponential Tilt and the Saddlepoint Approximation

When searching for a convenient importance distribution, particularly if we
wish to increase or decrease the frequency of observations in the tails, it is
quite common to embed a given density in an exponential family. For example,
suppose we wish to estimate an integral

\int g(x)f(x)dx

where f(x) is a probability density function. Suppose K(s) denotes the cumulant
generating function (the logarithm of the moment generating function) of
the density f(x), i.e.

\exp\{K(s)\} = \int e^{xs} f(x)dx.

The cumulant generating function is a useful summary of the moments of a
distribution, since the mean can be determined as K'(0) and the variance as
K''(0). From this single probability density function we can now produce a
whole (exponential) family of densities

f_{\theta}(x) = e^{\theta x - K(\theta)} f(x)    (4.30)

of which f(x) is the special case corresponding to \theta = 0. The density (4.30) is often
referred to as an exponential tilt of the original density function; it increases
the weight in the right tail for \theta > 0 and decreases it for \theta < 0.

This family of densities is closely related to the saddlepoint approximation.
If we wish to estimate the value of a probability density function f(x) at a
particular point x, then note that this could be obtained from (4.30) if we knew
the probability density function f_\theta(x). On the other hand, a normal approximation
to a density is often reasonable at or around its mode, particularly if
we are interested in the density of a sum or an average of independent random
variables. The cumulant generating function of the density f_\theta(x) is easily seen
to be K(\theta + s) - K(\theta), and its mean is therefore K'(\theta). If we choose the parameter
\theta = \theta(x) so that

K'(\theta) = x    (4.31)

then the density f_\theta has mean x and variance K''(\theta). How do we know that for a given
value of x there exists a solution to (4.31)? From the properties of cumulant
generating functions, K(t) is convex, increasing, and K(0) = 0. This implies
that as t increases, the slope K'(t) of the cumulant generating function is non-decreasing.
It therefore approaches a limit x_{max} (finite or infinite) as t → ∞, and
as long as we restrict the value of x in (4.31) to the interval x < x_{max} we can
find a solution. The value of the N(x, K''(\theta)) density at the point x is

f_{\theta}(x) \approx \frac{1}{\sqrt{2\pi K''(\theta)}}

and therefore the approximation to the density f(x) is

f(x) \approx \sqrt{\frac{1}{2\pi K''(\theta)}}\; e^{K(\theta) - \theta x}.    (4.32)
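As a quick check of (4.32), consider the sum of m independent Exp(1) variables, for which K(s) = -m log(1 - s); the saddlepoint solving (4.31) is \theta = 1 - m/x, with K''(\theta) = x^2/m. The Python sketch below compares the approximation with the exact Gamma(m, 1) density of the sum:

```python
import math

def saddlepoint_density(x, m=10):
    # Saddlepoint approximation (4.32) for the density of a sum of m iid
    # Exp(1) variables: K(s) = -m*log(1-s), so K'(theta) = x gives
    # theta = 1 - m/x and K''(theta) = x**2/m.
    theta = 1 - m / x
    K = -m * math.log(1 - theta)    # = m*log(x/m)
    return math.exp(K - theta * x) / math.sqrt(2 * math.pi * x**2 / m)

def exact_density(x, m=10):
    # Exact Gamma(m, 1) density of the sum, for comparison.
    return x**(m - 1) * math.exp(-x) / math.factorial(m - 1)
```

For this family the ratio of the approximation to the exact density does not depend on x at all; it is a pure Stirling-formula error, about 1.008 when m = 10, so the saddlepoint approximation is accurate far into the tail, precisely where a direct normal approximation would fail.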