Figure 4.4: Comparison of the function f (u) and the control variate g(u)

Let us return to the example of pricing a call option. By some experimen-
tation, which could involve a preliminary crude simulation or simply evaluating
the function at various points, we discovered that the function

g(u) = 6[(u − 0.47)+]² + (u − 0.47)+

provided a reasonable approximation to the function f(u). The two functions are compared in Figure 4.4. Moreover, the integral 2 × (0.53)³ + (0.53)²/2 of the function g(·) is easy to obtain.
It is obvious from the figure that since f(u) − g(u) is generally much smaller and less variable than f(u), var[f(U) − g(U)] < var[f(U)]. The variance of the crude Monte Carlo estimator is determined by the variability of the function f(u) over its full range. The variance of the control variate estimator is determined by the variance of the difference between the two functions, which in this case is quite small. We used the following Matlab functions, the first to generate the function g(u) and the second to determine the efficiency gain of the control variate estimator:
VARIANCE REDUCTION FOR ONE-DIMENSIONAL MONTE-CARLO INTEGRATION.227

% this is the function g(u), a control variate for fn(u)
function g=GG(u)

u=max(0,u-.47);

g=6*u.^2+u;

function [est,var1,var2]=control(f,g,intg,n)
% run using a statement like [est,var1,var2]=control('fn','GG',intg,n)
% runs a simulation on the function f using control variate g (both character strings) n times.
% intg is the integral of g over [0,1]
% outputs estimator est and variances var1, var2: the variances without and with the control variate.

U=unifrnd(0,1,1,n);

% evaluates f(u) for the vector u
FN=eval(strcat(f,'(U)'));

% evaluates g(u)
CN=eval(strcat(g,'(U)'));

est=intg+mean(FN-CN);

var1=var(FN);

var2=var(FN-CN);

Then the call [est,var1,var2]=control('fn','GG',2*(.53)^3+(.53)^2/2,1000000)
yields the estimate 0.4616 and variance 1.46 × 10⁻⁸, for an efficiency gain over
crude Monte Carlo of around 30.
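The same calculation is easy to sketch in a few lines of Python. The option-payoff function fn of the text is not reproduced here, so the demonstration below substitutes a hypothetical integrand f_demo (our name, not the book's) whose exact integral is known, which lets the estimate be checked:

```python
import numpy as np

def gg(u):
    """The control variate g(u) = 6[(u - 0.47)+]^2 + (u - 0.47)+."""
    up = np.maximum(0.0, u - 0.47)
    return 6.0 * up ** 2 + up

INTG = 2 * 0.53 ** 3 + 0.53 ** 2 / 2   # exact integral of g over [0,1]

def control(f, g, intg, n, seed=0):
    """Control variate estimate of the integral of f over [0,1]; also returns
    the variances without (var1) and with (var2) the control variate."""
    u = np.random.default_rng(seed).uniform(0.0, 1.0, n)
    fn, cn = f(u), g(u)
    return intg + np.mean(fn - cn), np.var(fn), np.var(fn - cn)

# hypothetical stand-in for the option-payoff integrand fn of the text;
# its exact integral is 6.5*(0.53)^3/3 + 1.1*(0.53)^2/2
def f_demo(u):
    up = np.maximum(0.0, u - 0.47)
    return 6.5 * up ** 2 + 1.1 * up

est, var1, var2 = control(f_demo, gg, INTG, 100000)
```

Because f_demo − gg is nearly constant, var2 comes out far smaller than var1, mirroring the efficiency gain reported for the option example.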
This elementary form of control variate suggests using the estimator

    ∫₀¹ g(u)du + (1/n) Σ_{i=1}^n [f(U_i) − g(U_i)],

but it may well be that g(U) is not the best estimator we can imagine for f(U).
We can often find a linear function of g(U) which is better by using regression.
Since elementary regression yields

    f(U) − E(f(U)) = β(g(U) − E(g(U))) + ε    (4.19)

where

    β = cov(f(U), g(U)) / var(g(U))    (4.20)

and the errors ε have expectation 0, it follows that E(f(U)) + ε = f(U) −
β[g(U) − E(g(U))], and so f(U) − β[g(U) − E(g(U))] is an unbiased estimator
of E(f(U)). For a sample of n uniform random numbers this becomes

    θ̂_cv = β E(g(U)) + (1/n) Σ_{i=1}^n [f(U_i) − β g(U_i)].    (4.21)

Moreover, this estimator has the smallest variance among all linear combinations of f(U) and g(U). Note that when β = 1, (4.21) reduces to the simpler form of the control variate technique (4.17) discussed above; the optimally weighted form (4.21), however, is generally better in terms of maximizing efficiency. Of course in practice it is necessary to estimate the covariance and the variance in the definition of β from the simulations themselves, by evaluating f and g at many different uniform random variables U_i, i = 1, 2, ..., n, and then estimating β using the standard least squares estimator
    β̂ = [n Σ_{i=1}^n f(U_i)g(U_i) − Σ_{i=1}^n f(U_i) Σ_{i=1}^n g(U_i)] / [n Σ_{i=1}^n g²(U_i) − (Σ_{i=1}^n g(U_i))²].

Although in theory the substitution of an estimator β̂ for the true value β
results in a small bias in the estimator, for large numbers of simulations n our
estimator β̂ is so close to the true value that this bias can be disregarded.
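A sketch of (4.21) with β replaced by the least squares estimate β̂, under the same hypothetical stand-in integrand as before (f_demo and gg are illustrative names, not the book's):

```python
import numpy as np

def gg(u):
    """The control variate g(u) = 6[(u - 0.47)+]^2 + (u - 0.47)+."""
    up = np.maximum(0.0, u - 0.47)
    return 6.0 * up ** 2 + up

def f_demo(u):
    """Hypothetical stand-in for the integrand fn of the text."""
    up = np.maximum(0.0, u - 0.47)
    return 6.5 * up ** 2 + 1.1 * up

INTG = 2 * 0.53 ** 3 + 0.53 ** 2 / 2   # E g(U), known exactly

def control_beta(f, g, intg, n, seed=1):
    """The estimator (4.21), with beta replaced by the least squares slope
    cov(f(U), g(U)) / var(g(U)) computed from the same sample."""
    u = np.random.default_rng(seed).uniform(0.0, 1.0, n)
    fn, gn = f(u), g(u)
    beta = np.cov(fn, gn)[0, 1] / np.var(gn, ddof=1)
    return beta * intg + np.mean(fn - beta * gn), beta

est, beta = control_beta(f_demo, gg, INTG, 100000)
```

For this pair of functions the fitted slope comes out slightly above 1, and the residual variance drops below that of the β = 1 version.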

Importance Sampling.

A second, similar technique is importance sampling. Again we depend on having a reasonably simple function g that, after multiplication by some constant, is similar to f. However, rather than attempt to minimize the difference f(u) − g(u) between the two functions, we try to find g(u) such that f(u)/g(u) is nearly constant. We also require that g is non-negative

and can be integrated so that, after rescaling the function, it integrates to one,
i.e. it is a probability density function. Assume we can easily generate random
variables from the probability density function g(z). The distribution whose
probability density function is g(z), z ∈ [0, 1], is the importance distribution.
Note that if we generate a random variable Z having the probability density
function g(z), z ∈ [0, 1], then

    ∫₀¹ f(u)du = ∫₀¹ [f(z)/g(z)] g(z)dz = E[f(Z)/g(Z)].    (4.22)

This can therefore be estimated by generating independent random variables Z_i
with probability density function g(z) and then setting

    θ̂_im = (1/n) Σ_{i=1}^n f(Z_i)/g(Z_i).    (4.23)

Once again, according to (4.22), this is an unbiased estimator, and its variance
is

    var(θ̂_im) = (1/n) var{f(Z₁)/g(Z₁)}.    (4.24)

Returning to our example, we might consider using the same function as
before for g(u). However, it is not easy to generate variates from a density
proportional to this function g by inverse transform, since this would require
solving a cubic equation. Instead, let us consider something much simpler: the
density function g(u) = 2(0.53)⁻²(u − 0.47)+, with cumulative distribution
function G(u) = (0.53)⁻²[(u − 0.47)+]² and inverse cumulative distribution
function G⁻¹(u) = 0.47 + 0.53√u. In this case we generate Z_i using Z_i = G⁻¹(U_i)
for U_i ∼ Uniform[0, 1]. The following function simulates an importance sampling
estimator:

function [est,v]=importance(f,g,Ginv,u)

% runs a simulation on the function 'f' using importance density 'g' (both character strings) and inverse c.d.f. 'Ginv'
% outputs all estimators (should be averaged) and variance.
% IM holds the inverse c.d.f. of the importance distribution evaluated at the inputs
% run e.g.
% [est,v]=importance('fn','2*(IM-.47)/(.53)^2;','.47+.53*sqrt(u);',rand(1,1000));

IM=eval(Ginv); %=.47+.53*sqrt(u);

% IMdens is the density of the importance sampling distribution at IM
IMdens=eval(g); %2*(IM-.47)/(.53)^2;

FN=eval(strcat(f,'(IM)'));

est=FN./IMdens; % mean(est) provides the estimator

v=var(FN./IMdens)/length(IM); % this is the variance of the estimator per simulation

Calling the function in this way gave an estimate mean(est) = 0.4616 with variance 1.28 × 10⁻⁸, for an efficiency gain of around 35 over crude Monte Carlo.
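The importance sampling estimator (4.23) can be sketched in Python as follows; again the book's payoff function fn is not reproduced, so a hypothetical integrand f_demo stands in for it:

```python
import numpy as np

def importance(f, g_dens, g_inv, n, seed=2):
    """Importance sampling estimator (4.23); returns the estimate and its
    variance per simulation, var(f(Z)/g(Z))/n."""
    u = np.random.default_rng(seed).uniform(0.0, 1.0, n)
    u = np.clip(u, 1e-12, None)        # guard against u = 0 exactly
    z = g_inv(u)                       # Z_i = G^{-1}(U_i)
    w = f(z) / g_dens(z)               # f(Z_i)/g(Z_i)
    return np.mean(w), np.var(w) / n

g_dens = lambda z: 2 * np.maximum(0.0, z - 0.47) / 0.53 ** 2   # importance density g
g_inv = lambda u: 0.47 + 0.53 * np.sqrt(u)                     # inverse c.d.f. G^{-1}
f_demo = lambda u: 6.5 * np.maximum(0.0, u - 0.47) ** 2 + 1.1 * np.maximum(0.0, u - 0.47)

est, v = importance(f_demo, g_dens, g_inv, 100000)
```

Because f_demo/g is close to a bounded, slowly varying function on the support of g, the ratio has a small variance and the estimator is far more efficient than crude Monte Carlo.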

Example 36 (Estimating quantiles using importance sampling) Suppose we
are able to generate random variables X from a probability density function of
the form f_θ(x), and we wish to estimate a quantile such as VaR, i.e. estimate x_p such that

    P_θ₀(X ≤ x_p) = p

for a certain value θ₀ of the parameter.

As a very simple example suppose S is the sum of 10 independent random
variables having the exponential distribution with mean θ, and f_θ(x₁, ..., x₁₀) is
the joint probability density function of these 10 observations. Assume θ₀ = 1

and p = .999 so that we seek an extreme quantile of the sum, i.e. we want to
determine xp such that PОё0 (S В· xp ) = p. The equation that we wish to solve
for xp is
EОё0 {I(S В· xp )} = p. (4.25)

The crudest estimator of this is obtained by generating a large number of
independent observations of S under the parameter value θ₀ = 1 and finding
the p'th quantile, i.e. by defining the empirical c.d.f. We generate independent
random vectors X_i = (X_i1, ..., X_i10) from the probability density f_θ₀(x₁, ..., x₁₀)
and, with S_i = Σ_{j=1}^{10} X_ij, define

    F̂(x) = (1/n) Σ_{i=1}^n I(S_i ≤ x).    (4.26)

Invert it (possibly with interpolation) to estimate the quantile

    x̂_p = F̂⁻¹(p).    (4.27)

If the true cumulative distribution function is differentiable, the variance of this
quantile estimator is asymptotically related to the variance of our estimator of
the cumulative distribution function,

    var(x̂_p) ≈ var(F̂(x_p)) / (F′(x_p))²,

so any variance reduction in the estimator of the c.d.f. is reflected, at least
asymptotically, in a variance reduction in the estimator of the quantile. Using
importance sampling, (4.25) leads to the same technique but with

    F̂_I(x) = (1/n) Σ_{i=1}^n W_i I(S_i ≤ x),  where    (4.28)
    W_i = f_θ₀(X_i1, ..., X_i10) / f_θ(X_i1, ..., X_i10).

Ideally we should choose the value of θ so that the variance of x̂_p, or of

    W_i I(S_i ≤ x_p),

is as small as possible. This requires a wise guess or experimentation with
various choices of θ. For a given θ we have another choice of empirical cumulative
distribution function

    F̂_I2(x) = (1/Σ_{i=1}^n W_i) Σ_{i=1}^n W_i I(S_i ≤ x).    (4.29)

Both of these provide fairly crude estimates of the sample quantiles when observations are weighted and, as one does with the sample median, one could easily
interpolate between adjacent values around the value of x_p.
The alternative (4.29) is motivated by the fact that the values W_i appear
as weights attached to the observations S_i, and it therefore seems reasonable to
divide by the sum of the weights. In fact the expected value of the denominator
is

    E_θ{Σ_{i=1}^n W_i} = n,

so the two denominators are similar. In the example where the X_ij are
independent exponential(1), let us examine the weight on S_i determined by
X_i = (X_i1, ..., X_i10):

    W_i = f_θ₀(X_i1, ..., X_i10) / f_θ(X_i1, ..., X_i10) = Π_{j=1}^{10} [exp(−X_ij) / (θ⁻¹ exp(−X_ij/θ))] = θ¹⁰ exp{−S_i(1 − θ⁻¹)}.

The renormalized alternative (4.29) might be necessary for estimating extreme
quantiles when the number of simulations is small, but only the first provides
a completely unbiased estimating function. In our case, using (4.28) with
θ = 2.5 we obtained an estimator of F(x_0.999) with efficiency about 180 times
that of a crude Monte Carlo simulation. There is some discussion of various
renormalizations of the importance sampling weights in Hesterberg (1995).
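A sketch of this example in Python, with one implementation choice worth flagging: rather than accumulating the weighted indicators I(S_i ≤ x) of (4.28) from the left, the code solves the equivalent tail equation (1/n) Σ W_i I(S_i > x) = 1 − p, which is much less sensitive to fluctuations in the total weight when p is extreme:

```python
import math
import random

def quantile_by_tilt(p=0.999, theta=2.5, n=100000, seed=4):
    """Estimate x_p with P(S <= x_p) = p for S = X_1 + ... + X_10,
    X_j ~ exponential(mean 1), by sampling the X_j as exponential(mean theta)
    and attaching the weights W_i = theta^10 exp{-S_i (1 - 1/theta)}.
    The quantile solves the tail form of (4.28): (1/n) sum W_i I(S_i > x) = 1 - p."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        s = sum(-theta * math.log(1.0 - rng.random()) for _ in range(10))
        w = theta ** 10 * math.exp(-s * (1.0 - 1.0 / theta))
        pairs.append((s, w))
    pairs.sort(reverse=True)               # largest S first
    target, cum = (1.0 - p) * n, 0.0
    for s, w in pairs:
        cum += w
        if cum >= target:
            return s
    return pairs[-1][0]

xhat = quantile_by_tilt()
```

With θ = 2.5 the tilted sums concentrate near the extreme quantile of interest, so the weighted indicator has far less variance than under θ₀ = 1.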

Importance Sampling, the Exponential Tilt and the Saddlepoint Approximation

When searching for a convenient importance distribution, particularly if we
wish to increase or decrease the frequency of observations in the tails, it is

quite common to embed a given density in an exponential family. For example
suppose we wish to estimate an integral

    ∫ g(x)f(x)dx

where f(x) is a probability density function. Suppose K(s) denotes the cumulant generating function (the logarithm of the moment generating function) of
the density f(x), i.e.

    exp{K(s)} = ∫ e^{xs} f(x)dx.

The cumulant generating function is a useful summary of the moments of a
distribution, since the mean can be determined as K′(0) and the variance as
K″(0). From this single probability density function, we can now produce a
whole (exponential) family of densities

    f_θ(x) = e^{θx − K(θ)} f(x)    (4.30)

of which f(x) is a special case corresponding to θ = 0. The density (4.30) is often
referred to as an exponential tilt of the original density function; it increases
the weight in the right tail for θ > 0 and decreases it for θ < 0.
This family of densities is closely related to the saddlepoint approximation.
If we wish to estimate the value of a probability density function f(x) at a particular point x, then note that this could be obtained from (4.30) if we knew
the probability density function f_θ(x). On the other hand a normal approximation to a density is often reasonable at or around its mode, particularly if
we are interested in the density of a sum or an average of independent random
variables. The cumulant generating function of the density f_θ(x) is easily seen
to be K(θ + s) − K(θ), and the mean is therefore K′(θ). If we choose the parameter
θ = θ(x) so that

    K′(θ) = x    (4.31)

then the density f_θ has mean x and variance K″(θ). How do we know that for a given
value of x there exists a solution to (4.31)? From the properties of cumulant

generating functions, K(t) is convex, increasing and K(0) = 0. This implies
that as t increases, the slope K′(t) of the cumulant generating function is non-decreasing. It therefore approaches a limit x_max (finite or infinite) as t → ∞, and
as long as we restrict the value of x in (4.31) to the interval x < x_max we can
find a solution. The value of the N(x, K″(θ)) density at the value x is

    f_θ(x) ≈ 1/√(2πK″(θ))

and therefore the approximation to the density f(x) is

    f(x) ≈ √(1/(2πK″(θ))) e^{K(θ) − θx},    (4.32)

where θ = θ(x) satisfies K′(θ) = x.

This is the saddlepoint approximation, discovered by Daniels (1954, 1980), and
usually applied to the distribution of sums or averages of independent random
variables, because then the normal approximation is better motivated. Indeed,
the saddlepoint approximation to the distribution of the sum of n independent
identically distributed random variables is accurate to order O(1/n), and if we
renormalize it to integrate to one, accuracy to order O(n^(−3/2)) is possible, substantially better than the order O(n^(−1/2)) of the usual normal approximation.
Consider, for example, the saddlepoint approximation to the Gamma(α, 1)
distribution. Because the moment generating function of the Gamma(α, 1) distribution is

    m(t) = 1/(1 − t)^α,  t < 1,

the cumulant generating function is

    K(t) = ln(m(t)) = −α ln(1 − t),

so that K′(t) = α/(1 − t), K′(θ) = x implies θ(x) = 1 − α/x, and

    K″(θ) = α/(1 − θ)²  so that  K″(θ(x)) = x²/α.

Therefore the saddlepoint approximation to the probability density function is

    f(x) ≈ √(α/(2πx²)) exp{−α ln(α/x) − x(1 − α/x)}
         = (1/√(2π)) α^(1/2 − α) e^α x^(α−1) exp(−x).

This is exactly the gamma density function with Stirling's approximation replacing Γ(α), and after renormalization this is exactly the Gamma density function.
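This identity is easy to verify numerically; the sketch below compares the saddlepoint formula (4.32), with the Gamma(α, 1) ingredients derived above, against the exact density. The ratio of the two is constant in x (it equals Γ(α) divided by its Stirling approximation), so renormalizing the saddlepoint approximation recovers the Gamma density exactly:

```python
import math

def saddlepoint_gamma(x, alpha):
    """Saddlepoint approximation (4.32) for the Gamma(alpha,1) density, using
    theta(x) = 1 - alpha/x, K(theta(x)) = alpha*ln(x/alpha), K''(theta(x)) = x^2/alpha."""
    theta = 1.0 - alpha / x
    K = alpha * math.log(x / alpha)
    Kpp = x * x / alpha
    return math.exp(K - theta * x) / math.sqrt(2.0 * math.pi * Kpp)

def gamma_density(x, alpha):
    """Exact Gamma(alpha,1) density."""
    return x ** (alpha - 1) * math.exp(-x) / math.gamma(alpha)

# the ratio is the same at any two points, and slightly above 1
r1 = saddlepoint_gamma(8.0, 10.0) / gamma_density(8.0, 10.0)
r2 = saddlepoint_gamma(15.0, 10.0) / gamma_density(15.0, 10.0)
```

For α = 10 the ratio is about 1 + 1/(12α) ≈ 1.008, the leading Stirling correction.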
Since it is often computationally expensive to generate random variables
whose distribution is a convolution of known densities, it is interesting to ask
whether (4.32) makes this any easier. In many cases the saddlepoint approximation can be used to generate, with high efficiency, a random variable whose
distribution is close to this convolution. For example suppose that we wish to
generate the random variable S_n = Σ_{i=1}^n X_i where each random variable X_i
has the non-central chi-squared distribution with cumulant generating function

    K(t) = 2λt/(1 − 2t) − (p/2) ln(1 − 2t).    (4.33)
The parameter λ is the non-centrality parameter of the distribution and p is
the degrees of freedom. Notice that the cumulant generating function of the sum
takes the same form but with (λ, p) replaced by (nλ, np), so in effect we wish
to generate a random variable with cumulant generating function (4.33) for
large values of the parameters (λ, p). Instead we generate from the saddlepoint
approximation (4.32) to this distribution, and in fact we do this indirectly. If we
change variables in (4.32) to determine the density of the new random variable
Θ which solves the equation

    K′(Θ) = X,

then the saddlepoint approximation (4.32) is equivalent to specifying a probability density for this variable,

    f_Θ(θ) = f(K′(θ)) (dx/dθ) = constant × √(K″(θ)) e^{K(θ) − θK′(θ)}.    (4.34)

In general, this probability density function can often be bounded above by
some density over the range of possible values of θ, allowing us to generate Θ
by acceptance-rejection. Then the value of the random variable is X = K′(Θ).
In the particular case of the non-central chi-squared example above, we may
take the dominating density to be the U[0, 1/2] density, since (4.34) is bounded.
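A minimal sketch of this acceptance-rejection step, with illustrative parameter values of our choosing. Two assumptions are worth flagging: the bound on (4.34) is found by a crude numerical grid search rather than analytically, and the U[0, 1/2) proposal covers only θ ≥ 0, i.e. the part of the distribution with X ≥ K′(0):

```python
import math
import random

# illustrative non-centrality and degrees-of-freedom values (our choice)
LAM, P = 3.0, 5.0

def K(t):   return 2 * LAM * t / (1 - 2 * t) - (P / 2) * math.log(1 - 2 * t)  # (4.33)
def Kp(t):  return 2 * LAM / (1 - 2 * t) ** 2 + P / (1 - 2 * t)               # K'(t)
def Kpp(t): return 8 * LAM / (1 - 2 * t) ** 3 + 2 * P / (1 - 2 * t) ** 2      # K''(t)

def h(t):
    """Un-normalized density (4.34) of Theta."""
    return math.sqrt(Kpp(t)) * math.exp(K(t) - t * Kp(t))

# numeric bound on h over [0, 1/2) for the rejection step (grid search, not analytic)
M = 1.05 * max(h(i / 2000.0) for i in range(1000))

def sample_X(rng=random.Random(5)):
    """Draw Theta from (4.34) restricted to [0, 1/2) by acceptance-rejection
    against the uniform proposal, then return X = K'(Theta)."""
    while True:
        t = 0.5 * rng.random()
        if rng.random() * M <= h(t):
            return Kp(t)
```

Since K′ is increasing, every value returned lies above K′(0) = 2λ + p; covering the full support would require extending the proposal to negative θ as well.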

Combining Monte Carlo Estimators.

We have now seen a number of different variance reduction techniques, and many
more are possible. With many of these methods, such as importance and
stratified sampling, are associated parameters which may be chosen in different
ways. The variance formula may be used as a basis for choosing a "best" method,
but these variances and efficiencies must also be estimated from the simulation,
and it is rarely clear a priori which sampling procedure and estimator is best. For
example if a function f is monotone on [0, 1] then an antithetic variate can be
introduced with an estimator of the form

    θ̂_a1 = (1/2)[f(U) + f(1 − U)],  U ∼ U[0, 1]    (4.35)

but if the function is increasing to a maximum somewhere around 1/2 and then
decreasing thereafter we might prefer

    θ̂_a2 = (1/4)[f(U/2) + f((1 − U)/2) + f((1 + U)/2) + f(1 − U/2)].    (4.36)

Notice that any weighted average of these two unbiased estimators of θ would
also provide an unbiased estimator of θ. The large number of potential variance
reduction techniques is an embarrassment of riches. Which variance reduction
methods should we use, and how will we know whether one is better than its
competitors? Fortunately, the answer is often to use "all of the methods" (within
reason, of course); choosing a single method is often neither necessary nor
desirable. Rather it is preferable to use a weighted average of the available
estimators, with the optimal choice of the weights provided by regression.

Suppose in general that we have k estimators or statistics θ̂_i, i = 1, ..., k, all
unbiased estimators of the same parameter θ, so that E(θ̂_i) = θ for all i. In
vector notation, letting Θᵀ = (θ̂₁, ..., θ̂_k), we write E(Θ) = 1θ where 1 is the
k-dimensional column vector of ones, so that 1ᵀ = (1, 1, ..., 1). Let us suppose for
the moment that we know the variance-covariance matrix V of the vector Θ,
defined by

    V_ij = cov(θ̂_i, θ̂_j).

Theorem 37 (best linear combinations of estimators)
The linear combination of the θ̂_i which provides an unbiased estimator of θ and
has minimum variance among all linear unbiased estimators is

    θ̂_blc = Σ_i b_i θ̂_i    (4.37)

where the vector b = (b₁, ..., b_k)ᵀ is given by

    b = (1ᵀV⁻¹1)⁻¹ V⁻¹1.

The variance of the resulting estimator is

    var(θ̂_blc) = bᵀVb = 1/(1ᵀV⁻¹1).

Proof. The proof is straightforward. It is easy to see that for any linear
combination (4.37) the variance of the estimator is

    bᵀVb

and we wish to minimize this quadratic form as a function of b subject to the
constraint that the coefficients add to one, or that

    bᵀ1 = 1.

Introducing a Lagrange multiplier, we set the derivatives with respect to the
components b_i equal to zero:

    (∂/∂b){bᵀVb + λ(bᵀ1 − 1)} = 0, or
    2Vb + λ1 = 0,
    b = constant × V⁻¹1,

and upon requiring that the coefficients add to one, we discover that the value of the
constant above is (1ᵀV⁻¹1)⁻¹.
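Theorem 37 translates directly into a few lines of linear algebra; the sketch below estimates V from replicated rows of estimates, as the text goes on to suggest:

```python
import numpy as np

def best_linear_combination(theta):
    """theta: (n, k) array; row i holds k unbiased estimates of the same
    parameter computed from the same random input.  Returns the combined
    estimate, the weights b = V^{-1}1 / (1'V^{-1}1) of Theorem 37, and the
    per-row variance 1/(1'V^{-1}1), with V replaced by the sample covariance."""
    V = np.cov(theta, rowvar=False)
    ones = np.ones(theta.shape[1])
    x = np.linalg.solve(V, ones)          # V^{-1} 1
    b = x / (ones @ x)                    # weights summing to one
    return theta.mean(axis=0) @ b, b, 1.0 / (ones @ x)
```

Negatively correlated component estimators are especially valuable here: the weighted average can cancel much of their common noise, exactly as the worked example below illustrates.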

This theorem indicates that the ideal linear combination of estimators has
coefficients proportional to the row sums of the inverse covariance matrix. Notably, the variance of a particular estimator θ̂_i is an ingredient in that sum,
but one of many. In practice, of course, we almost never know the variance-covariance matrix V of a vector of estimators Θ. However, when we do a simulation evaluating these estimators using the same uniform input to each, we obtain
independent replicated values of Θ. This permits us to estimate the covariance
matrix V, and since we typically conduct many simulations this estimate can be
very accurate. Let us suppose that we have n simulated values of the vector Θ,
and call these Θ₁, ..., Θ_n. As usual we estimate the covariance matrix V using
the sample covariance matrix

    V̂ = (1/(n − 1)) Σ_{i=1}^n (Θ_i − Θ̄)(Θ_i − Θ̄)ᵀ

where

    Θ̄ = (1/n) Σ_{i=1}^n Θ_i.

Let us return to the example and attempt to find the best combination of
the many estimators we have considered so far. To this end, let

    θ̂₁ = (0.53/2)[f(.47 + .53u) + f(1 − .53u)], an antithetic estimator,
    θ̂₂ = (0.37/2)[f(.47 + .37u) + f(.84 − .37u)] + (0.16/2)[f(.84 + .16u) + f(1 − .16u)],
    θ̂₃ = 0.37 f(.47 + .37u) + 0.16 f(1 − .16u), (stratified-antithetic)
    θ̂₄ = ∫ g(x)dx + [f(u) − g(u)], (control variate)
    θ̂₅ = θ̂_im, the importance sampling estimator (4.23).

Then θ̂₂ and θ̂₃ are both stratified-antithetic estimators, θ̂₄ is a control
variate estimator, and θ̂₅ is the importance sampling estimator discussed earlier, all
obtained from a single input uniform random variate U. In order to determine
the optimal linear combination we need to generate simulated values of all 5
estimators using the same uniform random numbers as inputs. We determine
the best linear combination of these estimators using

function [o,v,b,V]=optimal(U)
% generates optimal linear combination of five estimators and outputs
% average estimator, variance and weights
% input U a row vector of U[0,1] random numbers

T1=(.53/2)*(fn(.47+.53*U)+fn(1-.53*U));
T2=.37*.5*(fn(.47+.37*U)+fn(.84-.37*U))+.16*.5*(fn(.84+.16*U)+fn(1-.16*U));
T3=.37*fn(.47+.37*U)+.16*fn(1-.16*U);
intg=2*(.53)^3+.53^2/2;
T4=intg+fn(U)-GG(U);
T5=importance('fn',U);

X=[T1' T2' T3' T4' T5']; % each column holds replications of one estimator; each row holds the 5 estimators computed from the same U
mean(X)
V=cov(X); % this estimates the covariance matrix V

on=ones(5,1);
V1=inv(V); % the inverse of the covariance matrix
b=V1*on/(on'*V1*on); % vector of coefficients of the optimal linear combination
o=mean(X*b); % average of the optimal linear combination
v=1/(on'*V1*on); % variance of the optimal linear combination based on a single U

One run of this estimator, called with [o,v,b,V]=optimal(unifrnd(0,1,1,1000000)),
yields

    o = 0.4615
    b' = [−0.5499  1.4478  0.1011  0.0491  −0.0481].

The estimate 0.4615 is accurate to at least four decimals, which is not surprising
since the variance per uniform random number input is v = 1.13 × 10⁻⁵. In
other words, the variance of the mean based on 1,000,000 uniform inputs is
1.13 × 10⁻¹⁰; the standard error is around .00001, so we can expect accuracy to at
least 4 decimal places. Note that some of the weights are negative and others are
greater than one. Do these negative weights indicate estimators that are worse
than useless? The effect of some estimators may be, on subtraction, to render the
remaining function more linear and more easily estimated using another method,
and negative coefficients are quite common in regression generally. The efficiency
gain over crude Monte Carlo is an extraordinary 40,000. However, since there
are 10 function evaluations for each uniform variate input, the efficiency when
we adjust for the number of function evaluations is 4,000. This simulation,
using 1,000,000 uniform random numbers and taking 63 seconds on a Pentium
IV (2.4 GHz) (including the time required to generate all five estimators), is
equivalent to forty billion simulations by crude Monte Carlo, a major task on a
supercomputer!
If we intended to use this simulation method repeatedly, we might well wish
to see whether some of the estimators can be omitted without too much loss

of information. Since the variance of the optimal estimator is 1/(1ᵀV⁻¹1), we
might use this to attempt to select one of the estimators for deletion. Notice that
it is not so much the covariance matrix V of the estimators which enters into Theorem
37 but its inverse J = V⁻¹, which we can consider a type of information matrix
by analogy with maximum likelihood theory. For example we could choose to delete
the i'th estimator, i.e. delete the i'th row and column of V, where i is chosen
to have the smallest effect on 1/(1ᵀV⁻¹1) or its reciprocal 1ᵀJ1 = Σ_i Σ_j J_ij.
In particular, if we let V_(i) be the matrix V with the i'th row and column
deleted and J_(i) = V_(i)⁻¹, then we can identify 1ᵀJ1 − 1ᵀJ_(i)1 as the loss of
information when the i'th estimator is deleted. Since not all estimators require
the same number of function evaluations, we should adjust this information
by FE(i), the number of function evaluations required by the i'th estimator. In
other words, if an estimator i is to be deleted, it should be the one corresponding
to

    min_i { [1ᵀJ1 − 1ᵀJ_(i)1] / FE(i) }.

We should drop this i'th estimator if the minimum is less than the information
per function evaluation in the combined estimator, because this means we will
increase the information available in our simulation per function evaluation.
In the above example with all five estimators included, 1ᵀJ1 = 88,757 (with
10 function evaluations per uniform variate) so the information per function
evaluation is 8,876.

    i    1ᵀJ1 − 1ᵀJ_(i)1    FE(i)    [1ᵀJ1 − 1ᵀJ_(i)1]/FE(i)
    1    88,048             2        44,024
    2    87,989             4        21,997
    3    28,017             2        14,008
    4    55,725             1        55,725
    5    32,323             1        32,323

In this case, if we were to eliminate one of the estimators, our choice would
likely be number 3, since it contributes the least information per function evaluation. However, since all contribute more than 8,876 per function evaluation,
we should likely retain all five.
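The deletion criterion can be sketched as follows; the covariance matrix V and the evaluation counts FE(i) are supplied by the user:

```python
import numpy as np

def deletion_scores(V, fe):
    """Information lost per function evaluation, (1'J1 - 1'J_(i)1)/FE(i),
    for dropping each estimator i; V is the covariance matrix of the
    estimators and fe[i] the number of function evaluations estimator i needs."""
    J = np.linalg.inv(V)
    total = J.sum()                                   # 1'J1
    scores = []
    for i in range(len(fe)):
        Vi = np.delete(np.delete(V, i, axis=0), i, axis=1)
        scores.append((total - np.linalg.inv(Vi).sum()) / fe[i])
    return total, scores
```

An estimator is a candidate for deletion when its score falls below the information per function evaluation of the full combination, total/Σ FE(i).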

Common Random Numbers.

We now discuss another variance reduction technique, closely related to antithetic variates, called common random numbers, used for example whenever
we wish to estimate the difference in performance between two systems, or any
other quantity involving a difference, such as the slope of a function.

Example 38 For a simple example suppose we have two estimators θ̂₁, θ̂₂ of
the "center" of a symmetric distribution. We would like to know which of these
estimators is better in the sense that it has smaller variance when applied to
a sample from a specific distribution symmetric about its median. If both estimators are unbiased estimators of the median, then the first estimator is better
if

    var(θ̂₁) < var(θ̂₂)

and so we are interested in estimating a quantity like

    E h₁(X) − E h₂(X)

where X is a vector representing a sample from the distribution and h₁(X) =
θ̂₁², h₂(X) = θ̂₂². There are at least two ways of estimating these differences:

1. Generate samples and hence values of h₁(X_i), i = 1, ..., n and h₂(X_j), j =
1, 2, ..., m independently and use the estimator

    (1/n) Σ_{i=1}^n h₁(X_i) − (1/m) Σ_{j=1}^m h₂(X_j).

2. Generate samples and hence values of h₁(X_i), h₂(X_i), i = 1, ..., n independently and use the estimator

    (1/n) Σ_{i=1}^n (h₁(X_i) − h₂(X_i)).

It seems intuitive that the second method is preferable, since it removes
the variability due to the particular sample from the comparison. This is a
common type of problem in which we want to estimate the difference between
two expected values. For example we may be considering investing in a new
piece of equipment that will speed up processing at one node of a network, and
we wish to estimate the expected improvement in performance between the new
system and the old. In general, suppose that we wish to estimate the difference
between two expectations, say

    E h₁(X) − E h₂(Y)    (4.38)

where the random variable or vector X has cumulative distribution function F_X
and Y has cumulative distribution function F_Y. Notice that the variance of a
Monte Carlo estimator,

    var[h₁(X) − h₂(Y)] = var[h₁(X)] + var[h₂(Y)] − 2 cov{h₁(X), h₂(Y)},    (4.39)

is small if we can induce a high degree of positive correlation between the generated random variables X and Y. This is precisely the opposite of the problem that
led to antithetic random numbers, where we wished to induce a high degree
of negative correlation. The following lemma is due to Hoeffding (1940) and
provides a useful bound on the joint cumulative distribution function of two
random variables X and Y. Suppose X, Y have cumulative distribution functions F_X(x) and F_Y(y) respectively and joint cumulative distribution function
G(x, y) = P[X ≤ x, Y ≤ y].

Lemma 39 (a) The joint cumulative distribution function G of (X, Y) always
satisfies

    (F_X(x) + F_Y(y) − 1)+ ≤ G(x, y) ≤ min(F_X(x), F_Y(y))    (4.40)

for all x, y.

(b) Assume that F_X and F_Y are continuous functions. In the case that
X = F_X⁻¹(U) and Y = F_Y⁻¹(U) for U uniform on [0, 1], equality is achieved
on the right: G(x, y) = min(F_X(x), F_Y(y)). In the case that X = F_X⁻¹(U) and
Y = F_Y⁻¹(1 − U) there is equality on the left: (F_X(x) + F_Y(y) − 1)+ = G(x, y).

Proof. (a) Note that

    P[X ≤ x, Y ≤ y] ≤ P[X ≤ x]  and similarly  ≤ P[Y ≤ y].

This shows that

    G(x, y) ≤ min(F_X(x), F_Y(y)),

verifying the right side of (4.40). Similarly for the left side,

    P[X ≤ x, Y ≤ y] = P[X ≤ x] − P[X ≤ x, Y > y]
                    ≥ P[X ≤ x] − P[Y > y]
                    = F_X(x) − (1 − F_Y(y))
                    = F_X(x) + F_Y(y) − 1.

Since the left-hand side is also non-negative, the left side of (4.40) follows.
For (b) suppose X = F_X⁻¹(U) and Y = F_Y⁻¹(U); then

    P[X ≤ x, Y ≤ y] = P[F_X⁻¹(U) ≤ x, F_Y⁻¹(U) ≤ y]
                    = P[U ≤ F_X(x), U ≤ F_Y(y)]

since P[X = x] = 0 and P[Y = y] = 0. But

    P[U ≤ F_X(x), U ≤ F_Y(y)] = min(F_X(x), F_Y(y)),
verifying the equality on the right of (4.40) for common random numbers. By
a similar argument,

    P[F_X⁻¹(U) ≤ x, F_Y⁻¹(1 − U) ≤ y] = P[U ≤ F_X(x), 1 − U ≤ F_Y(y)]
                                       = P[U ≤ F_X(x), U ≥ 1 − F_Y(y)]
                                       = (F_X(x) − (1 − F_Y(y)))+,

verifying the equality on the left.

The following theorem supports the use of common random numbers to
maximize covariance and antithetic random numbers to minimize covariance.

Theorem 40 (maximum/minimum covariance)
Suppose h₁ and h₂ are both non-decreasing (or both non-increasing) functions.
Subject to the constraint that X, Y have cumulative distribution functions F_X, F_Y
respectively, the covariance

    cov[h₁(X), h₂(Y)]

is maximized when Y = F_Y⁻¹(U) and X = F_X⁻¹(U) (i.e. for common uniform[0, 1]
random numbers) and is minimized when Y = F_Y⁻¹(U) and X = F_X⁻¹(1 − U)
(i.e. for antithetic random numbers).

Proof. We will sketch a proof of the theorem when the distributions are
all continuous and h₁, h₂ are differentiable. Define G(x, y) = P[X ≤ x, Y ≤ y].
The following representation of covariance is useful: define

    H(x, y) = P(X > x, Y > y) − P(X > x)P(Y > y)    (4.41)
            = G(x, y) − F_X(x)F_Y(y).

Notice that, using integration by parts,

    ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(x, y) h₁′(x) h₂′(y) dx dy
        = −∫_{−∞}^{∞} ∫_{−∞}^{∞} (∂/∂x)H(x, y) h₁(x) h₂′(y) dx dy
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (∂²/∂x∂y)H(x, y) h₁(x) h₂(y) dx dy
        = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h₁(x) h₂(y) g(x, y) dx dy − ∫_{−∞}^{∞} h₁(x) f_X(x) dx ∫_{−∞}^{∞} h₂(y) f_Y(y) dy
        = cov(h₁(X), h₂(Y))    (4.42)

where g(x, y), f_X(x), f_Y(y) denote the joint probability density function, the
probability density function of X and that of Y respectively. In fact this result
holds in general even without the assumption that the distributions are continuous. The covariance between h₁(X) and h₂(Y), for h₁ and h₂ differentiable
functions, is

    cov(h₁(X), h₂(Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(x, y) h₁′(x) h₂′(y) dx dy.

The formula shows that to maximize the covariance, if h₁, h₂ are both increasing
or both decreasing functions, it is sufficient to maximize H(x, y) for each (x, y),
since the product h₁′(x)h₂′(y) is then non-negative. Since we are constraining the
marginal cumulative distribution functions F_X, F_Y, this is equivalent to maximizing
G(x, y) subject to the constraints

    lim_{y→∞} G(x, y) = F_X(x),
    lim_{x→∞} G(x, y) = F_Y(y).

Lemma 37 shows that the maximum is achieved when common random numbers
are used and the minimum is achieved when we use antithetic random numbers.

We can argue intuitively for the use of common random numbers in the case
of a discrete distribution with probability concentrated on the points indicated in Figure 4.5.

This figure corresponds to a joint distribution with the following probabilities,
say

    x                 0     0.25   0.25   0.75   0.75   1
    y                 0     0.25   0.75   0.25   0.75   1
    P[X = x, Y = y]   .1    .2     .2     .1     .2     .2
Suppose we wish to maximize $P[X > x, Y > y]$ subject to the constraint that
the probabilities $P[X > x]$ and $P[Y > y]$ are fixed. We have indicated
arbitrary fixed values of $(x, y)$ in the figure. Note that if there is any weight
attached to the point in the lower right quadrant (labelled $P_2$), some or all
of this weight can be reassigned to the point $P_3$ in the lower left quadrant
provided there is an equal movement of weight from the upper left $P_4$ to the
upper right $P_1$. Such a movement of weight will increase the value of $G(x, y)$
without affecting $P[X \le x]$ or $P[Y \le y]$. The weight that we are able to transfer
in this example is 0.1, the minimum of the weights on $P_4$ and $P_2$. In general,
this continues until, for every choice of $(x, y)$, there is no weight in one of the
off-diagonal quadrants. The resulting distribution in this example is given by

    x                 0     0.25   0.25   0.75   0.75   1
    y                 0     0.25   0.75   0.25   0.75   1
    P[X = x, Y = y]   .1    .3     .1     0      .3     .2

and it is easy to see that such a joint distribution can be generated from common
random numbers, $X = F_X^{-1}(U)$, $Y = F_Y^{-1}(U)$.
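Theorem 40 is easy to check numerically. The sketch below is in Python rather than the Matlab used elsewhere in the text, and the marginals (two exponential distributions) and the monotone functions $h_1, h_2$ are our own illustrative choices: it couples the same two marginals once by common and once by antithetic uniform random numbers and compares the resulting covariances.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.random(100_000)

# Fixed marginals via inverse transform: X ~ Exponential(rate 1), Y ~ Exponential(rate 2)
def Fx_inv(p):
    return -np.log(1 - p)

def Fy_inv(p):
    return -np.log(1 - p) / 2.0

h1, h2 = np.sqrt, np.square        # both non-decreasing on [0, infinity)

# Common random numbers: X = Fx_inv(U), Y = Fy_inv(U)
common = np.cov(h1(Fx_inv(u)), h2(Fy_inv(u)))[0, 1]
# Antithetic random numbers: X = Fx_inv(1 - U), same Y
antithetic = np.cov(h1(Fx_inv(1 - u)), h2(Fy_inv(u)))[0, 1]

print(common, antithetic)   # common coupling gives the larger covariance
```

With monotone $h_1, h_2$ the common coupling produces a strongly positive covariance and the antithetic coupling a negative one, consistent with the theorem.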

Conditioning

We now consider a simple but powerful generalization of control variates. Suppose
that we can decompose a random variable $T$ into two components $T_1$ and $\varepsilon$,

$$T = T_1 + \varepsilon, \qquad (4.43)$$

so that $T_1$ and $\varepsilon$ are uncorrelated:

$$\operatorname{cov}(T_1, \varepsilon) = 0.$$

Figure 4.5: Changing weights on points to maximize covariance

Assume as well that $E(\varepsilon) = 0$. Regression is one method for determining such
a decomposition, and the error term $\varepsilon$ in a regression satisfies these conditions.
Then $T_1$ has the same mean as $T$ and it is easy to see that

$$\operatorname{var}(T) = \operatorname{var}(T_1) + \operatorname{var}(\varepsilon),$$

so $T_1$ has smaller variance than $T$ (unless $\varepsilon = 0$ with probability 1). This means
that if we wish to estimate the common mean of $T$ and $T_1$, the estimator $T_1$ is
preferable, since it has the same mean with smaller variance.

One special case is variance reduction by conditioning. For the standard
definition and properties of conditional expectation see the appendix. One common
definition of $E[X|Y]$ is the unique (with probability one) function $g(Y)$
of $Y$ which minimizes $E[(X - g(Y))^2]$. This definition applies only to random
variables $X$ which have finite variance, and so it requires some modification
when $E(X^2) = \infty$, but we will assume here that all random variables,
say $X, Y, Z$, have finite variances. We can define conditional covariance using
conditional expectation as

$$\operatorname{cov}(X, Y|Z) = E[XY|Z] - E[X|Z]E[Y|Z]$$

and conditional variance as

$$\operatorname{var}(X|Z) = E(X^2|Z) - (E[X|Z])^2.$$

Variance reduction through conditioning is justified by the following well-known result:

Theorem 41
(a) $E(X) = E\{E[X|Y]\}$
(b) $\operatorname{cov}(X, Y) = E\{\operatorname{cov}(X, Y|Z)\} + \operatorname{cov}\{E[X|Z], E[Y|Z]\}$
(c) $\operatorname{var}(X) = E\{\operatorname{var}(X|Z)\} + \operatorname{var}\{E[X|Z]\}$
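Part (c) of Theorem 41 can be checked with a quick simulation. In this sketch (Python; the particular joint distribution of $X$ and $Z$ is invented for illustration), $Z$ is uniform on $\{0, 1, 2\}$ and $X \mid Z = z \sim$ Normal$(z, 1)$, so $E\{\operatorname{var}(X|Z)\} = 1$ and $\operatorname{var}\{E[X|Z]\} = \operatorname{var}(Z) = 2/3$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Z uniform on {0, 1, 2}; given Z = z, X ~ Normal(z, 1)
z = rng.integers(0, 3, size=n)
x = rng.normal(loc=z, scale=1.0)

# Theorem 41(c): var(X) = E{var(X|Z)} + var{E[X|Z]}.
# Here var(X|Z) = 1 and E[X|Z] = Z, so var(X) should be about 1 + 2/3.
lhs = x.var()
rhs = 1.0 + 2.0 / 3.0
print(lhs, rhs)
```

The sample variance of $X$ agrees with the decomposition to within simulation error.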

This theorem is used as follows. Suppose we are considering a candidate
estimator $\hat\theta$, an unbiased estimator of $\theta$, and we also have an arbitrary random
variable $Z$ which is somehow related to $\hat\theta$. Suppose that we have chosen $Z$
carefully so that we are able to calculate the conditional expectation $T_1 =
E[\hat\theta|Z]$. Then by part (a) of the above theorem, $T_1$ is also an unbiased estimator
of $\theta$. Define

$$\varepsilon = \hat\theta - T_1.$$

By part (c),

$$\operatorname{var}(\hat\theta) = \operatorname{var}(T_1) + \operatorname{var}(\varepsilon)$$

and so $\operatorname{var}(T_1) = \operatorname{var}(\hat\theta) - \operatorname{var}(\varepsilon) < \operatorname{var}(\hat\theta)$. In other words, for any variable $Z$,
$E[\hat\theta|Z]$ has the same expectation as $\hat\theta$ but smaller variance, and the decrease
in variance is largest when $Z$ and $\hat\theta$ are nearly independent, because in that case
$E[\hat\theta|Z]$ is close to a constant and its variance close to zero. In general, reducing
the variance of an estimator by conditioning requires searching for a random
variable $Z$ such that:

1. the conditional expectation $E[\hat\theta|Z]$ of the original estimator is computable;

2. $\operatorname{var}(E[\hat\theta|Z])$ is substantially smaller than $\operatorname{var}(\hat\theta)$.

Figure 4.6: Example of the Hit and Miss Method

Example 42 (hit or miss)

Suppose we wish to estimate the area under a certain graph $f(x)$ by the hit-and-miss
method. A crude method would involve determining a multiple $c$ of a
probability density function $g(x)$ which dominates $f(x)$, so that $cg(x) \ge f(x)$ for
all $x$. We can generate points $(X, Y)$ at random, uniformly distributed under
the graph of $cg(x)$, by generating $X$ by inverse transform, $X = G^{-1}(U_1)$, where
$G(x)$ is the cumulative distribution function corresponding to the density $g$, and
then generating $Y$ from the Uniform$[0, cg(X)]$ distribution, say $Y = cg(X)U_2$.
An example, with $g(x) = 2x$, $0 < x < 1$ and $c = 1/4$, is given in Figure 4.6.
The hit-and-miss estimator of the area under the graph of $f$ is obtained by
generating such random points $(X, Y)$ and counting the proportion that fall
under the graph of $f$, i.e. for which $Y \le f(X)$. This proportion estimates the
probability

$$P[Y \le f(X)] = \frac{\text{area under } f(x)}{\text{area under } cg(x)} = \frac{\text{area under } f(x)}{c}$$

since $g(x)$ is a probability density function (so the area under $cg(x)$ is $c$). Notice
that if we define

$$W = \begin{cases} c & \text{if } Y \le f(X) \\ 0 & \text{if } Y > f(X) \end{cases}$$

then

$$E(W) = c \times \frac{\text{area under } f(x)}{\text{area under } cg(x)} = \text{area under } f(x),$$

so $W$ is an unbiased estimator of the parameter that we wish to estimate. We
might therefore estimate the area under $f(x)$ using the Monte Carlo estimator
$\hat\theta_{HM} = \frac{1}{n}\sum_{i=1}^n W_i$ based on independent values of $W_i$. This is the "hit-or-miss"
estimator. However, in this case it is easy to find a random variable $Z$ such
that the conditional expectation $E[W|Z]$ can be determined in closed form. In
fact, choosing $Z = X$, we obtain

$$E[W|X] = \frac{f(X)}{g(X)}.$$

This is therefore an unbiased estimator of the same parameter and it has smaller
variance than does $W$. For a sample of size $n$ we should replace the hit-or-miss
estimator $\hat\theta_{HM}$ by the estimator

$$\hat\theta_{Cond} = \frac{1}{n}\sum_{i=1}^n \frac{f(X_i)}{g(X_i)} = \frac{1}{n}\sum_{i=1}^n \frac{f(X_i)}{2X_i}$$

with $X_i$ generated from $X = G^{-1}(U_i) = \sqrt{U_i}$, $i = 1, 2, ..., n$, and $U_i \sim$ Uniform[0, 1].
In this case the conditional expectation results in a familiar form for
the estimator $\hat\theta_{Cond}$: it is simply an importance sampling estimator with $g(x)$
the importance distribution. However, this derivation shows that the estimator
$\hat\theta_{Cond}$ has smaller variance than $\hat\theta_{HM}$.
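To make the comparison concrete, here is a small sketch in Python (rather than the text's Matlab) with a hypothetical integrand $f(x) = x^2$ on $(0, 1)$, the dominating density $g(x) = 2x$ and $c = 1/2$ chosen so that $cg(x) \ge f(x)$; the true area is $1/3$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

f = lambda x: x**2           # hypothetical integrand; true area = 1/3
g = lambda x: 2.0 * x        # dominating density on (0, 1)
c = 0.5                      # c*g(x) = x >= x^2 = f(x) on (0, 1)

x = np.sqrt(rng.random(n))          # X = G^{-1}(U) = sqrt(U)
y = c * g(x) * rng.random(n)        # Y | X ~ Uniform[0, c*g(X)]

w = np.where(y <= f(x), c, 0.0)     # hit-or-miss variable W
theta_hm = w.mean()                 # hit-or-miss estimate
theta_cond = (f(x) / g(x)).mean()   # conditioned estimate, E[W|X] = f(X)/g(X)

print(theta_hm, theta_cond)
```

Both estimates are close to $1/3$, but the per-observation variance of $f(X)/g(X)$ is well below that of $W$, as the Rao-Blackwell argument above predicts.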

Problems

1. Use both crude and antithetic random numbers to integrate the function

$$\int_0^1 \frac{e^u - 1}{e - 1}\,du.$$

(a) What is the efficiency gain attributable to the use of antithetic random
numbers?

(b) How large a sample size would be needed, using antithetic and crude
Monte Carlo respectively, in order to estimate the above integral, correct to four
decimal places, with probability at least 95%?
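As a starting point for this problem, a crude-versus-antithetic comparison can be sketched as follows (Python rather than the text's Matlab; counting two function evaluations per antithetic draw is one common convention for the efficiency gain):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

f = lambda u: (np.exp(u) - 1) / (np.e - 1)   # integrand of Problem 1

u = rng.random(n)
crude = f(u)                                       # n crude evaluations
anti = 0.5 * (f(u[: n // 2]) + f(1 - u[: n // 2]))  # n/2 antithetic pairs

est_crude, est_anti = crude.mean(), anti.mean()
# Each antithetic value costs two evaluations of f, hence the factor 2:
gain = np.var(crude) / (2.0 * np.var(anti))
print(est_crude, est_anti, gain)
```

Both estimators should agree with the exact value $(e - 2)/(e - 1)$, and for a monotone integrand like this one the antithetic gain is well above 1.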

2. Under what conditions on $f$ does the use of antithetic random numbers
completely correct for the variability of the Monte-Carlo estimator, i.e.
when is $\operatorname{var}(f(U) + f(1 - U)) = 0$?

3. Suppose that $F(x)$ is the Normal$(\mu, \sigma^2)$ cumulative distribution function.
Prove that $F^{-1}(1 - U) = 2\mu - F^{-1}(U)$ and therefore, if we use antithetic
random numbers to generate two normal random variables $X_1, X_2$ having
mean $\mu$ and variance $\sigma^2$, this is equivalent to setting $X_2 = 2\mu - X_1$.
In other words, if we wish to use antithetic random numbers for normal
variates, it is not necessary to generate the normal random variables using
the inverse transform method.

4. Show that the variance of a weighted average

$$\operatorname{var}(\alpha X + (1 - \alpha)W)$$

is minimized over $\alpha$ when

$$\alpha = \frac{\operatorname{var}(W) - \operatorname{cov}(X, W)}{\operatorname{var}(W) + \operatorname{var}(X) - 2\operatorname{cov}(X, W)}.$$

Determine the resulting minimum variance. What if the random variables
$X, W$ are independent?

5. Use a stratified random sample to integrate the function

$$\int_0^1 \frac{e^u - 1}{e - 1}\,du.$$

What do you recommend for intervals (two or three) and sample sizes?
What is the efficiency gain?

6. Use a combination of stratified random sampling and an antithetic random
number in the form

$$\frac{1}{2}[f(U/2) + f(1 - U/2)]$$

to integrate the function

$$\int_0^1 \frac{e^u - 1}{e - 1}\,du.$$

What is the efficiency gain?

7. In the case $f(x) = \frac{e^x - 1}{e - 1}$, use $g(x) = x$ as a control variate to integrate over
[0, 1]. Show that the variance is reduced by a factor of approximately 60.
Is there much additional improvement if we use a more general quadratic
function of $x$?

8. The second version of the control variate Monte-Carlo estimator,

$$\hat\theta_{cv} = \frac{1}{n}\sum_{i=1}^n \{f(U_i) - \beta[g(U_i) - E(g(U_i))]\},$$

is an improved control variate estimator and is equivalent to the first version
in the case $\beta = 1$. In the case $f(x) = \frac{e^x - 1}{e - 1}$, consider using $g(x) = x$ as a
control variate to integrate over [0, 1]. Determine by simulation how much better $\hat\theta_{cv}$ is
than the basic control variate estimator ($\beta = 1$). Show
that the variance is reduced by a factor of approximately 60 over crude
Monte Carlo. Is there much additional improvement if we use a more
general quadratic function of $x$ for $g(x)$?

9. It has been suggested that stocks are not log-normally distributed, but that the
distribution can be well approximated by replacing the normal distribution
with a Student t distribution. Suppose that the daily returns $X_i$ are
independent with probability density function $f(x) = c(1 + (x/b)^2)^{-2}$ (a
re-scaled Student distribution with 3 degrees of freedom). We wish to estimate
a weekly value at risk, $VaR_{0.95}$: a value $v$ such that

$$P\left[\sum_{i=1}^5 X_i < v\right] = 0.95.$$

If we wish to do this by simulation, suggest an appropriate
method involving importance sampling. Implement it and estimate the variance
reduction.

10. Suppose three different simulation estimators $Y_1, Y_2, Y_3$ have means which
depend on two unknown parameters $\theta_1, \theta_2$, so that $Y_1, Y_2, Y_3$ are unbiased
estimators of $\theta_1$, $\theta_1 + \theta_2$, $\theta_2$ respectively. Assume that $\operatorname{var}(Y_i) = 1$,
$\operatorname{cov}(Y_i, Y_j) = -1/2$, and we want to estimate the parameter $\theta_1$. Should
we use only the estimator $Y_1$, which is the unbiased estimator of $\theta_1$, or
some linear combination of $Y_1, Y_2, Y_3$? Compare the number of simulations
necessary for a given degree of accuracy.

11. In the case $f(x) = \frac{e^x - 1}{e - 1}$, use $g(x) = x$ as a control variate to integrate
over [0, 1]. Find the optimal linear combination using estimators (4.35) and
(4.36), an importance sampling estimator and the control variate estimator
above. What is the efficiency gain over crude Monte-Carlo?

12. Show that the Jacobian of the transformation used in the proof of Theorem
23, $(x, m) \to (x, y)$ where $y = \exp(-(2m - x)^2/2)$, is given by $\frac{1}{2y\sqrt{-2\ln(y)}}$.
Chapter 5

Simulating the Value of
Options

Asian Options

An Asian option, at expiration $T$, has a value determined not by the closing price
of the underlying asset, as for a European option, but by an average price of
the asset over an interval. For example, a discretely sampled Asian call option
on an asset with price process $S(t)$ pays an amount at maturity equal
to $\max(0, \bar S_k - K)$, where $\bar S_k = \frac{1}{k}\sum_{i=1}^k S(iT/k)$ is the average asset price at $k$
equally spaced time points in the time interval $(0, T)$. Here $k$ depends on the
frequency of sampling (e.g. if $T = .25$ (years) and sampling is weekly, then
$k = 13$). If $S(t)$ follows a geometric Brownian motion, then $\bar S_k$ is a sum of
lognormally distributed random variables, and the distribution of the sum or
average of lognormal random variables is very difficult to express analytically. For
this reason we will resort to pricing the Asian option using simulation. Notice,
however, that in contrast to the arithmetic average, the distribution of the geometric
average can easily be obtained. The geometric
mean of $n$ values $X_1, ..., X_n$ is $(X_1 X_2 \cdots X_n)^{1/n} = \exp\{\frac{1}{n}\sum_{i=1}^n \ln(X_i)\}$, and if the
random variables $X_i$ are each lognormally distributed, then this results in adding
the normally distributed random variables $\ln(X_i)$ in the exponent, a much more
familiar operation. In fact the sum in the exponent, $\frac{1}{n}\sum_{i=1}^n \ln(X_i)$, is normally
distributed, so the geometric average will have a lognormal distribution.

Our objective is to determine the value of the Asian option $E(V_1)$ with

$$V_1 = e^{-rT}\max(0, \bar S_k - K).$$

Since we expect geometric means to be close to arithmetic means, a reasonable
control variate is the random variable $V_2 = e^{-rT}\max(0, \tilde S_k - K)$, where $\tilde S_k =
\{\prod_{i=1}^k S(iT/k)\}^{1/k}$ is the geometric mean. Assume that $V_1$ and $V_2$ are obtained from
the same simulation and are therefore possibly correlated. Of course $V_2$ is only
useful as a control variate if its expected value can be determined analytically
or numerically more easily than that of $V_1$, but in view of the fact that $V_2$
has a known lognormal distribution, the prospects for this are excellent. Since
$S(t) = S_0 e^{Y(t)}$, where $Y(t)$ is a Brownian motion with $Y(0) = 0$, drift $r - \sigma^2/2$
and diffusion $\sigma$, it follows that $\tilde S_k$ has the same distribution as does

$$S_0 \exp\left\{\frac{1}{k}\sum_{i=1}^k Y(iT/k)\right\}. \qquad (5.1)$$
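A sketch of this control variate pairing, in Python rather than the Matlab used elsewhere in the text: all numerical parameters below are illustrative assumptions, and the (analytically available) mean $E(V_2)$ is not computed here since only the variance reduction is being examined.

```python
import numpy as np

rng = np.random.default_rng(4)
# Illustrative (assumed) parameters: spot, strike, rate, volatility, maturity
S0, K, r, sigma, T, k, n = 100.0, 100.0, 0.05, 0.2, 0.25, 13, 50_000

dt = T / k
# Y(iT/k) via cumulative sums of independent Normal((r - sigma^2/2)dt, sigma^2 dt) increments
incr = rng.normal((r - sigma**2 / 2) * dt, sigma * np.sqrt(dt), size=(n, k))
S = S0 * np.exp(np.cumsum(incr, axis=1))

arith = S.mean(axis=1)                    # arithmetic average, S-bar_k
geom = np.exp(np.log(S).mean(axis=1))     # geometric average, S-tilde_k

V1 = np.exp(-r * T) * np.maximum(arith - K, 0.0)   # Asian call payoff
V2 = np.exp(-r * T) * np.maximum(geom - K, 0.0)    # geometric control variate

# V1 and V2 come from the same paths, so they are highly correlated and
# var(V1 - V2) << var(V1); the control variate estimator is E(V2) + mean(V1 - V2).
print(np.corrcoef(V1, V2)[0, 1], np.var(V1 - V2) / np.var(V1))
```

The correlation between the two payoffs is very close to 1, so the variance of $V_1 - V_2$ is a small fraction of the variance of $V_1$.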

The exponent is a weighted average of the independent normal increments of
the process and is therefore normally distributed. In particular, if we set

$$
\begin{aligned}
\bar Y &= \frac{1}{k}\sum_{i=1}^k Y(iT/k) \\
&= \frac{1}{k}\,[k\,Y(T/k) + (k - 1)\{Y(2T/k) - Y(T/k)\} + (k - 2)\{Y(3T/k) - Y(2T/k)\} \\
&\qquad + \cdots + \{Y(T) - Y((k - 1)T/k)\}],
\end{aligned}
$$

then we can find the mean and variance of $\bar Y$:

$$E(\bar Y) = \frac{r - \sigma^2/2}{k}\sum_{i=1}^k \frac{iT}{k} = \left(r - \frac{\sigma^2}{2}\right)\frac{k + 1}{2k}\,T = \tilde\mu T, \text{ say},$$