Sensitivity Analysis, Estimating Derivatives and the Greeks

Estimating the sensitivity of a simulation with respect to changes in the parameter values is an important part of establishing the validity of the conclusions. If a simulation estimates an expected value at a certain value of the parameters as 0.32 ± 0.05, but the derivative with respect to one parameter, say the volatility parameter σ, is 5, this indicates that a change in the volatility of only 0.02, or 2 percent, would result in a change in the average of the order of 0.1. Since volatility typically changes rapidly by far more than one percent, the apparent precision of the estimate 0.32 ± 0.05 is very misleading.
Of particular importance in finance are certain derivatives of an option price or portfolio value with respect to the parameters underlying the Black-Scholes model. These are called the "Greeks", because many of them (not to mention many parameters and constants used in statistics, physics, mathematics, and the rest of science) are denoted by Greek letters. Suppose V = V(S(t), t, σ, r) is the value of a portfolio or a derivative based on an asset S(t), where σ is the volatility parameter and r is the current spot interest rate. For example, for a single European call option, from the Black-Scholes formula

    V(S, t, σ, r) = SΦ(d_1) − Ke^{−r(T−t)}Φ(d_2)    (7.61)

where

    d_1 = [ln(S/K) + (r + σ²/2)(T − t)] / (σ√(T − t)),

    d_2 = d_1 − σ√(T − t) = [ln(S/K) + (r − σ²/2)(T − t)] / (σ√(T − t)).

In this case there are analytic formulae for these quantities given in Table
8.1:


Name    Symbol    Definition    Value in the Black-Scholes model
delta   Δ         ∂V/∂S         Φ(d_1)
gamma   Γ         ∂²V/∂S²       φ(d_1)/(Sσ√(T − t))
rho     ρ         ∂V/∂r         K(T − t)e^{−r(T−t)}Φ(d_2)
theta   Θ         ∂V/∂t         −Sσφ(d_1)/(2√(T − t)) − rKe^{−r(T−t)}Φ(d_2)
vega    V         ∂V/∂σ         Sφ(d_1)√(T − t)

Table 8.1: The "Greeks" for a European call in the Black-Scholes model

One example of the above calculations is given below; the remaining ones are left as exercises.

    ∂V/∂S = ∂/∂S {SΦ(d_1) − Ke^{−r(T−t)}Φ(d_2)}
          = Φ(d_1) + Sφ(d_1) ∂d_1/∂S − Ke^{−r(T−t)}φ(d_2) ∂d_2/∂S
          = Φ(d_1) + (Sφ(d_1) − Ke^{−r(T−t)}φ(d_2)) · 1/(Sσ√(T − t))
          = Φ(d_1),  since Sφ(d_1) = Ke^{−r(T−t)}φ(d_2),

where φ, Φ are the standard normal probability density function and cumulative distribution function respectively. These derivatives are typically calculated not only because they are relevant to a hedging strategy (especially Δ and Γ) but also because they give an idea of how rapidly the value of our portfolio is affected when there is a (possibly adverse) change in one of the parameters.
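The formulas in Table 8.1 are straightforward to implement. A minimal sketch in Python (the function and variable names are ours, not from the text):

```python
from math import erf, exp, log, pi, sqrt

def norm_pdf(x):
    """Standard normal density phi(x)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi(x)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def call_greeks(S, K, r, sigma, tau):
    """Greeks of a European call under Black-Scholes; tau = T - t (Table 8.1)."""
    d1 = (log(S / K) + (r + sigma**2 / 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return {
        "delta": norm_cdf(d1),
        "gamma": norm_pdf(d1) / (S * sigma * sqrt(tau)),
        "rho": K * tau * exp(-r * tau) * norm_cdf(d2),
        "theta": -S * sigma * norm_pdf(d1) / (2 * sqrt(tau))
                 - r * K * exp(-r * tau) * norm_cdf(d2),
        "vega": S * norm_pdf(d1) * sqrt(tau),
    }
```

For an at-the-money call (S = K = 100, r = 0.05, σ = 0.3, T − t = 0.25) the delta is a little above one half, gamma and vega are positive, and theta is negative, as expected for a long call.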
As an example of the use of these derivatives, it is common to hedge a port-
folio against changes in one or more parameters. A simple hedge was introduced
when we developed the Black-Scholes formula originally. Suppose we wish to
hedge an investment in a call option whose value is given by V = V(S(t), t, σ, r) at time t with x_S units of the underlying stock. Then the value function for this new portfolio is

    V + x_S S
and setting the delta of this portfolio equal to zero results in the equation

    Δ_O + x_S = 0

or

    x_S = −Δ_O.
This means that we would need to short Δ_O units of the stock in order to hedge the investment in the option. The delta of an option is therefore the amount of the stock that we would sell short in order to hedge one unit of the option. In a sense, owning Δ_O units of the stock is locally equivalent to owning one unit of the option, at least from the perspective of small changes in the stock price. In Figure 7.15 below we plot the delta for a call option with strike price 100, interest rate r = 0.05 and T − t = 0.25 years. The second derivative,

Figure 7.15: Call option Delta in the Black-Scholes model, Exercise price=100,
r = 0.05, T в€’ t = 0.25 years.

Figure 7.16: Call option Gamma in the Black-Scholes model, T − t = 0.25 years, σ = 0.3, r = 0.05, K = 100.

the gamma of the call option, is φ(d_1)/(Sσ√(T − t)), which behaves roughly like a constant times the normal density, with mode at S = e^{−r(T−t)}K (see Figure 7.16).
With only one instrument (in this case the stock) available to hedge our investment in the call option, there is a single coefficient x_S that we are free to choose, and therefore only one constraint (portfolio delta equals zero) that we can arrange to satisfy. However, if there were other instruments available, such as different options on the same underlying, we could construct a hedge setting other Greeks of the hedged portfolio (e.g. gamma, rho) equal to zero as well.
For example, suppose I own a portfolio whose value P = P(S(t), t, σ, r) depends on the price of a stock or index S. I wish to immunize this portfolio against changes in S by investing directly in the stock S and in an option on this stock whose value is given by V = V(S(t), t, σ, r) at time t. We will need to assume that the value functions V, S, P are distinct in the sense that one is not a linear combination of the other two. Suppose I add to my portfolio x_S units of the stock and x_O units of the option, so that the new portfolio has value

    P + x_S S + x_O V.
In order to ensure that this value changes as little as possible when S changes, set the value of its delta and gamma (first and second derivative with respect to S) equal to zero. This gives two equations in the two unknown values of x_S, x_O:
    Δ_P + x_S + x_O Δ_O = 0
    Γ_P + x_O Γ_O = 0

where Δ_P, Δ_O are the deltas of the original portfolio and the option respectively and Γ_P, Γ_O are the gammas. The solution gives

    x_O = −Γ_P/Γ_O
    x_S = (Γ_P/Γ_O)Δ_O − Δ_P

and the hedged portfolio has value

    P + ((Γ_P/Γ_O)Δ_O − Δ_P)S − (Γ_P/Γ_O)V.
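Once the four sensitivities are known, the two equations above can be solved mechanically. A small sketch (the function name is ours):

```python
def delta_gamma_hedge(delta_P, gamma_P, delta_O, gamma_O):
    """Units of stock (x_S) and option (x_O) making the combined portfolio
    P + x_S*S + x_O*V both delta- and gamma-neutral."""
    x_O = -gamma_P / gamma_O            # from Gamma_P + x_O*Gamma_O = 0
    x_S = -(delta_P + x_O * delta_O)    # from Delta_P + x_S + x_O*Delta_O = 0
    return x_S, x_O
```

The solution is exact: substituting x_S, x_O back into the two constraints returns zero for both the delta and the gamma of the hedged position.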
As an example, suppose our portfolio consists of a digital option which pays $1 at time T = 1 provided that the stock price exceeds some threshold value K_D. Then the value function for this option is

    P = e^{−r} Q[S(1) > K_D] = e^{−r} Φ(d_2)

with

    d_2 = [ln(S/K_D) + r − σ²/2] / σ.

Therefore

    Δ_P = ∂/∂S e^{−r}Φ(d_2) = e^{−r}φ(d_2) ∂d_2/∂S
        = e^{−r}φ(d_2) · 1/(Sσ) = Sφ(d_1) · 1/(KSσ) = φ(d_1)/(Kσ)

and

    Γ_P = ∂²/∂S² e^{−r}Φ(d_2) = e^{−r} [φ(d_2)/(S²σ²)](d_2 + σ) = [φ(d_1)/(KSσ²)] d_1

where we again use the fact that

    Ke^{−r}φ(d_2) = Sφ(d_1).

Finally, if we define

    d_1^0 = [2 ln(S/K_O) + 0.5(r + σ²/2)] / σ

and d_2^0 for the values corresponding to the call option, the coefficients of the hedged portfolio are therefore

    x_O = −Γ_P/Γ_O = −[φ(d_1) d_1 / (KSσ²)] · [Sσ/2] / φ(d_1^0) = −d_1 φ(d_1) / (2Kσ φ(d_1^0))

    x_S = (Γ_P/Γ_O)Δ_O − Δ_P = −(x_O Δ_O + Δ_P).
In Figure 7.17 we plot the values of these coefficients for the case of the digital option above. Evidently the coefficients satisfy x_S < 0, x_O > 0 when S < 100 and both options are deep out of the money, indicating that we should short the stock and purchase the call option. When both options are deep in the money and S is large, the reverse holds and we short the call option and buy the stock. In the simplest delta hedge, it is often easy to recognize in advance whether a given coefficient is positive or negative, i.e. whether we are to purchase or sell short a given instrument. With several instruments, when we set several derivatives equal to zero, the signs of the coefficients are by no means obvious in advance, one of the reasons why mathematics is very useful in such cases.
The availability of two instruments, the stock and a single option on the underlying S, allows us to adjust a portfolio so that the first two derivatives of its value function with respect to S are both zero. The portfolio is therefore protected against reasonably small changes in S. Similarly, with more options on the same stock, one could arrange that the portfolio is immunized or protected against adverse movements in the other parameters as well, including the interest rate and the volatility parameter. This hedged portfolio clearly requires derivatives of the value function, and for models more complicated than the Black-Scholes, we require simulation methods not only for valuing options and portfolios, but also for determining these derivatives with respect to underlying parameters.

Figure 7.17: The number of units x_S and x_O required of the underlying and a call option (K = 100, T = 0.25) to hedge a digital option with K = 100 and T = 1. Other parameters: σ = 0.3, r = 0.05.

We now return to the question of estimating the sensitivity of simulations to
changes in underlying parameters, usually in a case in which no simple formula
exists for the function or its derivatives. For example in sensitivity analysis, we
wish to estimate an expected value at values of an underlying parameter in the
neighborhood of a given value. In stress testing, we might wish to estimate an
expected value under more extreme conditions, for example if the interest rate
doubled or an exchange rate took a dramatic downturn. Surprisingly, at least in
theory, information can be obtained about the behaviour of a function at values
of the parameter quite different from those used to conduct the simulation. In fact, importance sampling appears to permit us to conduct simulations at one value of a parameter θ and use these to estimate expected values corresponding to all possible values of the parameter. The estimation of an expectation under one value of a parameter using simulations conducted at another is sometimes called the "what if" problem. Denote the probability density function of a random variable or vector X under θ by f_θ(x) and assume these densities all have common support. An expectation calculated under this value of the parameter will be denoted E_θ(·). If we want to estimate the expected value of T(X) under a different value ψ of the parameter while our simulations are conducted at the parameter value θ, note that

    m(ψ) = E_ψ T(X) = E_θ[T(X) f_ψ(X)/f_θ(X)].    (7.62)
There may be many reasons for our interest in the function m(ψ). A derivative is priced using current values for the asset price, interest rate, volatility parameter, etc., and we may wish to graph the price over a range of (possible future) values of the parameters. The necessity of estimating derivatives in order to immunize or hedge a portfolio was discussed above. The likelihood ratio estimator

or importance sampling estimator T(X)f_ψ(X)/f_θ(X), where X ∼ f_θ, is an unbiased estimator of m(ψ), so a simulation at θ permits unbiased estimation of the whole function m(ψ) = E_ψ T(X), and thereby also its derivatives.
However, this simple result should be moderated by a study of the precision of this estimator. Suppose θ is the true value of the parameter and the values X_1, …, X_n are simulated as independent random variables from the joint probability density function Π_{i=1}^n f_θ(X_i). Consider the likelihood ratio

    Π_{i=1}^n f_ψ(X_i) / Π_{i=1}^n f_θ(X_i)

for ψ ≠ θ. Notice that upon taking the expectation under θ of the logarithm, we obtain

    E_θ{ Σ_{i=1}^n [ln(f_ψ(X_i)) − ln(f_θ(X_i))] } = −nH(f_θ, f_ψ)

where H(f_θ, f_ψ) > 0 is the cross-entropy. Therefore as n → ∞,

    Σ_{i=1}^n [ln(f_ψ(X_i)) − ln(f_θ(X_i))] → −∞

with probability one, and upon exponentiating,

    Π_{i=1}^n f_ψ(X_i) / Π_{i=1}^n f_θ(X_i) → 0.

This means that the likelihood is very much smaller for values of the parameter ψ far from the true value than it is at the true value θ. You might think that, since for large n the function

    Π_{i=1}^n f_ψ(X_i) / Π_{i=1}^n f_θ(X_i)    (7.63)

is very close to zero for values of X which have probability close to one,

    E_θ( T(X_1, …, X_n) Π_{i=1}^n f_ψ(X_i) / Π_{i=1}^n f_θ(X_i) )    (7.64)

should be close to zero for any function T(X_1, …, X_n). Somewhat paradoxically,
however, if we substitute T(X_1, …, X_n) = 1, then it is easy to see that (7.64) is identically one for all ψ and all n. Strangely, for large n, the random variable (7.63) is very close to 0 for almost all X and converges to 0 as n → ∞, and yet its expected value remains 1 for all n. This apparent contradiction stems from the fact that the likelihood ratios are not uniformly integrable. They do not behave in the limit as n → ∞ in the same way in probability as they do in expectation. For large n, (7.63) is close to 0 with high probability, but as n increases it takes larger and larger values with smaller and smaller probabilities. In other words, for large sample sizes, the likelihood ratio (7.63) is rather unstable, taking very large values over values of X which have small probability and remaining close to zero for the rest of the possible values of X. Consequently, (7.63) has very large variance. In fact, as the sample size → ∞, the variance of (7.63) approaches infinity very quickly. In such a situation, sample means or averages may fail to converge to the expected value, or may approach it extremely slowly. This argues against using an average which involves the likelihood ratio if there is a substantial difference between the parameter values ψ and θ, and especially if the sample size is large. Moment-type estimators based on these likelihood ratios will tend to be very unstable in mean and variance, particularly when ψ is far from θ. This problem may be partially alleviated if variance reduction or alternative techniques are employed.

7.8.1 Example
As an example, suppose we wish to estimate

    E_ψ(X̄_n)

where

    X̄_n = (1/n)S_n = (1/n)(X_1 + … + X_n)

is the sample mean of independent observations X_1, …, X_n from an exponential distribution with parameter ψ. Suppose we conduct our simulations at parameter value θ instead, so with

    f_ψ(x) = (1/ψ) exp(−x/ψ), x > 0,

we use the estimator

    T(X_1, …, X_n) = X̄_n Π_{i=1}^n f_ψ(X_i) / Π_{i=1}^n f_θ(X_i) = X̄_n (θ/ψ)^n exp{S_n(1/θ − 1/ψ)}    (7.65)
with X_i independent Exponential(θ) random variables. Then it is not hard to show that T(X_1, …, X_n) has the correct expected value ψ, provided that it has a well-defined expected value at all. The variance, on the other hand, is quite unstable when there is much separation between θ and ψ. The standard deviation of the estimator T(X_1, …, X_n) is plotted in Figure 7.18 for the case n = 10, θ = 1; it is clearly small for ψ fairly close to (actually a little smaller than) θ, but large as |θ − ψ| increases.
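This instability is easy to reproduce. The sketch below replicates the estimator (7.65) at θ = 1 for a nearby and a distant ψ (the function name, sample sizes and seed are our choices):

```python
import math
import random
import statistics

def what_if_estimates(theta, psi, n, reps, seed=1):
    """Replications of the likelihood-ratio estimator (7.65): the mean of n
    Exponential(psi) variables estimated from draws made at parameter theta."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        s = sum(rng.expovariate(1 / theta) for _ in range(n))  # S_n, each mean theta
        lr = (theta / psi) ** n * math.exp(s * (1 / theta - 1 / psi))
        out.append((s / n) * lr)                               # X-bar times LR
    return out

near = what_if_estimates(1.0, 1.1, 10, 20000)   # psi close to theta
far = what_if_estimates(1.0, 2.0, 10, 20000)    # psi far from theta
```

The average of `near` is close to the target ψ = 1.1, while the replications in `far` have a far larger spread, which is the instability described above.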

7.9 Estimating Derivatives
Figure 7.18: Standard deviation of the estimator T(X_1, …, X_n) for θ = 1 and various values of ψ.

Let us begin by examining the estimation of the derivative m′(θ) = (∂/∂θ) E_θ T(X) in general when we are only able to evaluate the function T(X) by simulation, so that there is error in its evaluation. We could conduct independent simulations at two different values of the parameter, say at θ + h and θ − h, average the values of T(X) under each, resulting in the estimators m̂(θ + h) and m̂(θ − h), and then take as our estimator the difference

    [m̂(θ + h) − m̂(θ − h)] / (2h)    (7.66)

but this crude estimator suffers from a number of disadvantages:
• It requires twice as many simulations as we conduct at a single point. Since we normally wish to simulate at θ as well, this is three times the number of simulations required at a single point.
• It is heavily biased if h is large, unless the function m(θ) is close to linear.
• It has very large variance if h is small.
To illustrate the second point, note that, provided the function m is three times differentiable, a Taylor series expansion gives

    E{ [m̂(θ + h) − m̂(θ − h)] / (2h) } = [m(θ + h) − m(θ − h)] / (2h)
      = (1/2h){ m(θ) + m′(θ)h + (1/2)m″(θ)h² − (m(θ) − m′(θ)h + (1/2)m″(θ)h²) + O(h³) }
      = m′(θ) + O(h²)

as h → 0. However, the error implicit in the term O(h²) can be large when h is large, and if h is small then the third point is problematic. Since

    var( [m̂(θ + h) − m̂(θ − h)] / (2h) ) = (1/4h²){ var[m̂(θ + h)] + var[m̂(θ − h)] } → ∞

quite quickly as h → 0, we cannot afford to let h be too small if we use independent simulations.
We have already seen some methods for ameliorating the last of these problems. Since the derivative requires that we estimate a difference, the use of common random numbers in the simulations at the two parameter values θ + h and θ − h should reduce the variability, but we are still faced with the problem of taking the limit of such an estimator as h → 0 to estimate the derivative.
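As a sketch of the central-difference estimator (7.66) with common random numbers, the following estimates the vega of a Black-Scholes call by reusing one set of normal draws at σ + h and σ − h (parameter values and names are illustrative):

```python
import math
import random

def mc_call_price(sigma, zs, S0=100.0, K=100.0, r=0.05, T=0.25):
    """Crude Monte Carlo Black-Scholes call price driven by fixed normals zs."""
    disc = math.exp(-r * T)
    payoff = 0.0
    for z in zs:
        ST = S0 * math.exp((r - sigma**2 / 2) * T + sigma * math.sqrt(T) * z)
        payoff += max(ST - K, 0.0)
    return disc * payoff / len(zs)

def central_diff_vega(sigma, h, n=100_000, seed=7):
    """Estimator (7.66) for d(price)/d(sigma), reusing the same normal
    draws (common random numbers) at sigma + h and sigma - h."""
    rng = random.Random(seed)
    zs = [rng.gauss(0, 1) for _ in range(n)]
    return (mc_call_price(sigma + h, zs) - mc_call_price(sigma - h, zs)) / (2 * h)
```

At these parameter values the analytic vega Sφ(d_1)√(T − t) is about 19.7, and with common random numbers the central difference lands close to it even for small h; with independent draws at σ ± h the same difference would be swamped by noise.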

7.9.1 The Score Function Estimator
There are two alternatives in popular use: perturbation analysis, which depends on pathwise differentiation, and the score function or likelihood ratio method. Both have the feature that a simulation at a single parameter value allows estimation of both the function and its derivative.
We begin by introducing the score function method. The idea behind the score function method is very simple: it involves interchanging derivative and integral. We wish to estimate m′(θ) = (∂/∂θ) ∫ T(x) f_θ(x) dx, where f_θ is the joint probability density function of all of our simulations. Then under regularity conditions called the Cramér conditions (see for example those required in the Cramér-Rao inequality, Appendix B), we may interchange the integral and derivative:

    m′(θ) = (∂/∂θ) ∫ T(x) f_θ(x) dx    (7.67)
          = ∫ T(x) [∂f_θ(x)/∂θ] dx = E_θ[T(X)S(θ)]

where S(θ) denotes the score function

    S(θ) = S(θ, x) = ∂ln[f_θ(x)]/∂θ.    (7.68)
Since the score function has expected value 0, i.e. E_θ{S(θ)} = 0, the quantity E_θ[T(X)S(θ)] is just the covariance between T(X) and S(θ), and this can be estimated using the sample covariance. In particular, if T is a function of n independent simulations X = (X_1, …, X_n), each of which has probability density function f_θ(x), then the score function for the vector of observations is S_n(θ, X) = Σ_{i=1}^n S_1(θ, X_i), with

    S_1(θ, x) = ∂ln[f_θ(x)]/∂θ

the score function for a single simulated value. We wish to estimate the covariance

    cov( T(X_1, …, X_n), Σ_{i=1}^n S_1(θ, X_i) ) = Σ_{i=1}^n cov( T(X_1, …, X_n), S_1(θ, X_i) ).    (7.69)

One possibility, motivated by (7.69), is to use the sample covariance between independent replications of T(X_1, …, X_n) and Σ_{i=1}^n S_1(θ, X_i) over many simulations. More specifically, denote m independent replications of T by T_j = T(X_{j1}, X_{j2}, …, X_{jn}), j = 1, 2, …, m, where {X_{ji}, i = 1, …, n, j = 1, …, m} are independent observations from the probability density function f_θ(x). This provides an estimator of the sensitivity (∂/∂θ) E_θ T(X). If we denote the score function for the vector of observations (X_{j1}, X_{j2}, …, X_{jn}) by

    S_j = Σ_{i=1}^n S_1(θ, X_{ji})

then the simplest version of the score function estimator is the sample covariance

    SF_1 = (1/(m − 1)) Σ_{j=1}^m S_j (T_j − T̄)    (7.70)

where T̄ = (1/m) Σ_{j=1}^m T_j.
However, there is a generally more efficient estimator, motivated by the relationship between regression coefficients and covariances. If we were to construct a regression of the variables T_j on S_j of the form

    T_j = α + β_{T|S} S_j + ε_j

then the coefficient which minimizes the variance of the errors ε_j is

    β_{T|S} = cov(T_j, S_j)/var(S_j) = cov(T_j, S_j)/(nJ_1(θ))

where J_1(θ) = var[S_1(θ, X_i)]. It follows that we can construct an estimator of the covariance from the estimated regression coefficient

    β̂_{T|S} = Σ_{j=1}^m S_j (T_j − T̄) / Σ_{j=1}^m (S_j − S̄)²

from which the covariance (7.69) can be estimated using

    SF_2 = β̂_{T|S} var(S_j).    (7.71)
Both SF_1 and SF_2 are estimators of m′(θ), but SF_2 is generally more efficient. In fact, using the variance of the regression slope estimator, we can obtain an approximate variance for this estimator:

    var(SF_2) = [var(S_j)]² var(β̂_{T|S}) ≈ (1/m) var(T_j) var(S_j) [1 − ρ²_{ST}]    (7.72)

with

    ρ_{ST} = cov(T_j, S_j) / √(var(T_j) var(S_j)).

Notice that (7.72) can be expressed as (n/m) var(T_j) J_1(θ) [1 − ρ²_{ST}], which makes it apparent that it may be a noisy estimator for n large unless either m is correspondingly larger or ρ²_{ST} is close to one.
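A sketch of both estimators for a toy problem: n independent Exponential(θ) observations with T = exp(−X̄_n), for which the true derivative −(1 + θ/n)^{−(n+1)} is available as a check (the setup and names are our choices; in (7.71) we use the exact variance var(S_j) = n J_1(θ) = n/θ²):

```python
import math
import random
import statistics

def score_function_estimators(theta=1.0, n=10, m=5000, seed=3):
    """SF1 (7.70) and SF2 (7.71) for d/dtheta E_theta[T(X)], where
    T = exp(-sample mean) of n Exponential(theta) variables."""
    rng = random.Random(seed)
    T, S = [], []
    for _ in range(m):
        xs = [rng.expovariate(1 / theta) for _ in range(n)]
        T.append(math.exp(-sum(xs) / n))
        # per-sample score: sum_i d/dtheta log f = sum_i (x_i - theta)/theta^2
        S.append(sum((x - theta) / theta**2 for x in xs))
    Tbar = statistics.mean(T)
    Sbar = statistics.mean(S)
    sf1 = sum(s * (t - Tbar) for s, t in zip(S, T)) / (m - 1)
    beta = sum(s * (t - Tbar) for s, t in zip(S, T)) / sum((s - Sbar) ** 2 for s in S)
    sf2 = beta * (n / theta**2)   # var(S_j) = n*J_1(theta), known exactly here
    return sf1, sf2
```

For θ = 1 and n = 10 the target is −(1.1)^{−11} ≈ −0.35, and both estimators land close to it; SF_2 replaces the noisy empirical variance of S_j in the denominator by its exact value, which is the source of its efficiency gain.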

Example. A Monte Carlo Estimator of Rho

Suppose we are interested in estimating rho for an option with payoff function at maturity given by V(S(T), T), with S(T) the value of the stock at time T. Assume the Black-Scholes model, so that the distribution of S(T) under the Q measure is lognormal with mean η = S_0 exp{rT} and volatility σ√T. For brevity we denote S(T) by S. Then since S is lognormal, S = e^Y where Y ∼ N(log(η) − σ²T/2, σ²T). Note that if g is the corresponding lognormal probability density function,

    ∂log(g)/∂η = [Y − log(η) + σ²T/2] / (ησ²T)

    ∂log(g)/∂r = [Y − log(η) + σ²T/2] / (ησ²T) · ∂η/∂r    (7.73)
               = [ln(S/S_0) − rT + σ²T/2] / σ².

Thus an estimator of ρ can be obtained from the sample covariance, over a large number of simulations, between V(S(T), T) and ∂log(g)/∂r, or equivalently

    σ^{−2} ĉov( V(S(T), T), ln(S(T)/S_0) ).
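A sketch of this estimator for a European call. Note that rho of the discounted price also picks up a −T × price term from differentiating the discount factor e^{−rT} itself, which the covariance alone does not capture; including that term is our addition here:

```python
import math
import random

def mc_rho_call(S0=100.0, K=100.0, r=0.05, sigma=0.3, T=0.25,
                n=200_000, seed=11):
    """Score-function Monte Carlo estimate of rho for a European call.
    The covariance uses the score (7.73); the -T*price term comes from
    differentiating the discount factor e^{-rT} with respect to r."""
    rng = random.Random(seed)
    pay, sc = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        ST = S0 * math.exp((r - sigma**2 / 2) * T + sigma * math.sqrt(T) * z)
        pay.append(max(ST - K, 0.0))
        # d log g / dr = (ln(ST/S0) - rT + sigma^2 T/2)/sigma^2   (7.73)
        sc.append((math.log(ST / S0) - r * T + sigma**2 * T / 2) / sigma**2)
    pbar = sum(pay) / n
    sbar = sum(sc) / n
    cov = sum((p - pbar) * (s - sbar) for p, s in zip(pay, sc)) / (n - 1)
    disc = math.exp(-r * T)
    return disc * cov - T * disc * pbar
```

At these parameter values the analytic rho from Table 8.1, K(T − t)e^{−r(T−t)}Φ(d_2), is about 12.4, and the simulation estimate should agree to within Monte Carlo error.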

The score function estimator can be expressed as a limit (as h → 0) of likelihood ratio estimators. However, the score function is more stable than the likelihood ratio for large sample sizes, because its moment behaviour is, unlike that of the likelihood ratio, similar to its behaviour in probability. Under the standard regularity conditions referred to above, the score function S_n(θ) = Σ_{i=1}^n S_1(θ, X_i) for an independent sample of size n satisfies a law of large numbers

    (1/n) S_n(θ) → E[S_1(θ, X_i)] = 0    (7.74)

and a central limit theorem

    (1/√n) S_n(θ) → N(0, J_1(θ))    (7.75)

in distribution, where the limiting variance is J_1(θ) = var[S_1(θ, X_i)]. When n is large, however, the score function estimator still suffers from too much variability.
The score function estimator does have a kind of optimality attached to it, provided that we restrict attention to estimators which can also be expressed as a covariance. Among all random functions G(X; θ) which satisfy (∂/∂θ) E_θ T(X) = E_θ[T(X)G(X; θ)] for all T, the score function cannot be improved on, in the sense that it has the smallest possible variance.

Conditioning the Score Function Estimator
One method for reducing the variability of the score function estimator is conditioning. This is particularly easy for the standard distributions in the exponential family. Note that

    m′(θ) = E_θ[T(X) S_n(θ, X)] = E_θ{ E_θ[T(X) | S_n(θ, X)] S_n(θ, X) }.    (7.76)

The conditional expectation E_θ[T(X) | S_n(θ, X)] in the above product can be estimated by Monte Carlo, provided that we are able to generate the variates X_i conditional on the value of the score function. The outside expectation E_θ{·} over the distribution of S_n(θ, X) may be computed either analytically or numerically, using our knowledge of the finite sample or asymptotic distribution of the score function.
For brevity, denote S_n(θ, X) by S and its marginal probability density function by f_S(s). Let X_s = (X_{s1}, …, X_{sn}), where X_{si}, i = 1, …, n are random variables all generated from the conditional distribution of X = (X_1, …, X_n) given S = s for the fixed parameter θ. Then, based on a sample of size n, the conditioned score function estimator is

    ∫ ( (1/n) Σ_{i=1}^n T(X_s) ) s f_S(s) ds.    (7.77)

There are some powerful advantages to (7.77), particularly when the data are generated from one of the distributions in an exponential family. The exponential family of distributions is a broad class which includes most well-known continuous and discrete distribution families, such as the normal, lognormal, exponential, gamma, binomial, negative binomial, geometric, and Poisson distributions.
X_1 is said to have an exponential family distribution if its probability density function with respect to some dominating measure (usually a counting measure or Lebesgue measure) takes the form

    f_θ(x) = e^{η(θ)Y(x)} h(x) c(θ)

for some functions η(θ), c(θ), Y(x) and h(x). For exponential family distributions, conditioning on the score is equivalent to conditioning on the sum Z = Σ_{i=1}^n Y(X_i), provided η′(θ) ≠ 0, and the score is a simple function of Z, viz.

    S_n(θ, X) = η′(θ){Z − nE_θ(Y(X_i))}.    (7.78)

So, revising (7.77) to condition on the sum Z, the estimator becomes

    m′(θ)^ = η′(θ) ∫ E[T(X_s) | Z = z] {z − nE_θ(Y(X_i))} f_Z(z) dz
           = η′(θ) cov( E[T(X_s) | Z], Z )    (7.79)

where f_Z(z) is the probability density function of Z.

When we are attempting to estimate the derivatives m′(θ) simultaneously at a number of different values of θ, perhaps in order to fit the function with splines or represent it graphically, the estimator underlying (7.79) has some very considerable advantages. Because the conditional expectation does not depend on the value of θ, we may conduct the simulation at a single convenient θ. The estimated conditional expectation can then be used in an integral of the form (7.79) for all underlying values of θ. Similarly, a single simulation can be used to estimate m(ψ) for many different values of ψ.
There are a number of simple special cases of the exponential family where the conditional distributions are easily established and simulated. In each case below we specify the probability density function f_Z(z), the conditional distribution of the vector X_s given Z = z, and the function η′(θ).

7.9.2 Examples.
1. (Exponential distribution). Suppose the X_i are exponentially distributed with probability density function f_θ(x) = (1/θ)e^{−x/θ}. Then, given Σ_{i=1}^n X_i = z, the partial sums X_1, X_1 + X_2, …, Σ_{i=1}^{n−1} X_i are generated as n − 1 Uniform[0, z] order statistics, and from these we obtain the values of X_i as the spacings between the order statistics. In this case η′(θ) = θ^{−2} and f_Z, the probability density function of Z, is Gamma(n, θ).

2. (Gamma distribution). Suppose the X_i are distributed as independent Gamma(α, θ) variates with probability density function

    f_θ(x) = x^{α−1} e^{−x/θ} / (Γ(α)θ^α).    (7.80)

Then, conditionally on Z = Σ_{i=1}^n X_i, we can generate X_s as follows: generate X_{s1} as Z × an independent Beta(α, (n − 1)α) random variable, X_{s2} as (Z − X_{s1}) × Beta(α, (n − 2)α), …, and X_{sn} = Z − Σ_{i=1}^{n−1} X_{si}. In this case η′(θ) = θ^{−2} and f_Z, the probability density function of Z, is Gamma(nα, θ).

3. (Normal distribution). Suppose X_i has a N(θ, σ²) distribution. Then X_s is generated as follows: the distribution of X_{s1} given Z = Σ_i X_i is N(Z/n, (1 − 1/n)σ²); the distribution of X_{s2} given X_{s1} and Z is N((Z − X_{s1})/(n − 1), (1 − 1/(n − 1))σ²); etc., and X_{sn} = Z − Σ_{i=1}^{n−1} X_{si}. In this case η′(θ) = σ^{−2} and f_Z(z) is the N(nθ, nσ²) probability density function.

4. (Binomial distribution). Suppose the X_i are distributed as Binomial(m, θ) variates. Then, given Z = Σ_{i=1}^n X_i = z, X_{s1} has a hypergeometric distribution with parameters (mn, m, z); given Z and X_{s1}, X_{s2} has a hypergeometric distribution with parameters (m(n − 1), m, Z − X_{s1}); etc., and X_{sn} = Z − Σ_{i=1}^{n−1} X_{si}. In this case Z has the Binomial(mn, θ) distribution and η′(θ) = [θ(1 − θ)]^{−1}.

5. (Poisson distribution). Suppose X_i, i = 1, …, n are independent Poisson(θ) random variables. Then, given Z = Σ_{i=1}^n X_i, the distribution of X_{s1} is Binomial(Z, 1/n), the conditional distribution of X_{s2} is Binomial(Z − X_{s1}, 1/(n − 1)), etc., and X_{sn} = Z − Σ_{i=1}^{n−1} X_{si}. In this case η′(θ) = θ^{−1} and the distribution of Z is Poisson(nθ).
6. (Geometric distribution). Suppose X_i, i = 1, …, n are independent with the Geometric(θ) distribution. Then, given Z = Σ_{i=1}^n X_i, the distribution of X_{s1} is negative hypergeometric with probability function

    f(x) = C(Z − x − 1, n − 2) / C(Z − 1, n − 1).

Similarly for X_{s2}, …, and X_{sn} = Z − Σ_{i=1}^{n−1} X_{si}. In this case Z has a negative binomial(n, θ) distribution and η′(θ) = −(1 − θ)^{−1}.
7. (Lognormal distribution). Suppose X_i, i = 1, …, n are independent with the lognormal distribution with mean θ and probability density function f_θ(x). Then

    ∂ln(f_θ(x))/∂θ = [ln(x) − ln(θ) + σ²/2] / (θσ²)    (7.81)

and therefore the conditional distribution of X_1 given the sufficient statistic Z = Π_{i=1}^n X_i is lognormal with conditional mean Z^{1/n} exp{(1 − 1/n)σ²/2} and volatility parameter (1 − 1/n)σ².
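Case 1 is particularly easy to implement: conditional on the sum, the observations are the spacings between uniform order statistics. A sketch (the function name and the test values are ours):

```python
import random

def exponentials_given_sum(z, n, rng):
    """(X_1,...,X_n) i.i.d. exponentials conditioned on X_1 + ... + X_n = z:
    the spacings between n-1 Uniform[0, z] order statistics (Example 1)."""
    cuts = sorted(rng.uniform(0, z) for _ in range(n - 1))
    pts = [0.0] + cuts + [z]
    return [pts[i + 1] - pts[i] for i in range(n)]

rng = random.Random(5)
draws = [exponentials_given_sum(10.0, 4, rng) for _ in range(20000)]
```

Every draw sums exactly to z, and by exchangeability each coordinate has conditional mean z/n, which can be verified empirically from `draws`.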

7.9.3 Example. Estimating vega.
Suppose we wish to estimate the vega of an option on some underlying S(t), where σ is the volatility parameter in the asset price equation. Consider, for example, a European option whose underlying has volatility parameter σ. Then the vega is the derivative of the value of the option with respect to the volatility:

    (∂/∂σ) E{e^{−rT} V(S_T)}

where S_T, the terminal value of the asset, is assumed to have a lognormal distribution with mean S_0 e^{rT} and variance parameter σ²T, and V(S) is the payoff on maturity of the option. Denote the corresponding lognormal probability density function of S(T) by

    g(s) = [1/(sσ√(2πT))] exp{−(log(s) − log(S_0) − rT + σ²T/2)² / (2σ²T)}.

It is easy to translate this lognormal distribution into exponential family form, with

    Y(s) = (log(s) − log(S_0) − rT)²

and

    η(σ) = −1/(2σ²T).

Therefore, from (7.78), the score function is of the form

    η′(σ)[Y(s) − E(Y(s))]

with η′(σ) = σ^{−3}T^{−1}, and an unbiased estimator of vega is the sample covariance, taken over all simulations,

    ĉov( e^{−rT} V(S_T), η′(σ) Y(S_T) ) = e^{−rT} (1/(σ³T)) ĉov( V(S_T), Y(S_T) ).    (7.82)
Since we can generate S_T from a standard normal random variable Z,

    S_T = S_0 exp{rT − σ²T/2 + σ√T Z},    (7.83)

we have ln(S_T/S_0) − rT = σ√T Z − σ²T/2, so

    Y(S_T) = σ²T (Z − σ√T/2)².    (7.84)

The covariance in (7.82) can then be written

    e^{−rT} (1/σ) cov( V(S_T), (Z − σ√T/2)² )
with S_T generated from (7.83). This reduces to a simple one-dimensional integral with respect to a normal probability density function, and we can either simulate this quantity or use numerical integration. Because of the high variability of the score function, it is desirable to use variance reduction in evaluating this estimator, or to conduct the integration analytically or numerically. One of the simplest numerical integration techniques, when the expectation can be written with respect to a normal probability density function, is Gaussian quadrature, discussed below. This is just one example of a whole class of algorithms for numerical integration with respect to certain weight functions.
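Reading η(σ) = −1/(2σ²T) off the density g, so that η′(σ) = σ^{−3}T^{−1}, the score-function vega can be sketched as follows for a European call, where the analytic vega Sφ(d_1)√(T − t) is available as a check (parameter values and names are our choices):

```python
import math
import random

def mc_vega_call(S0=100.0, K=100.0, r=0.05, sigma=0.3, T=0.25,
                 n=400_000, seed=13):
    """Score-function vega: e^{-rT} * sigma^{-3} T^{-1} * cov(V(S_T), Y(S_T))
    with Y(s) = (ln(s/S0) - rT)^2."""
    rng = random.Random(seed)
    V, Y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        a = sigma * math.sqrt(T) * z - sigma**2 * T / 2   # ln(ST/S0) - rT
        ST = S0 * math.exp(r * T + a)
        V.append(max(ST - K, 0.0))
        Y.append(a * a)
    vbar = sum(V) / n
    ybar = sum(Y) / n
    cov = sum((v - vbar) * (y - ybar) for v, y in zip(V, Y)) / (n - 1)
    return math.exp(-r * T) * cov / (sigma**3 * T)
```

At these parameter values the analytic vega is about 19.7. As the text warns, this estimator is noisy for its sample size, which is the motivation for the variance reduction and quadrature alternatives that follow.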

We often approximate integrals of the form

    ∫_{−∞}^{∞} h(x)φ(x) dx    (7.85)

where φ is the standard normal probability density function, with a weighted sum such as

    Σ_{i=1}^k w_i h(x_i)    (7.86)

for certain points x_i and corresponding weights w_i. Crude Monte Carlo is an example of such an integration technique, in which we choose the abscissae x_i at random according to a normal distribution and use equal weights w_i = 1/k, i = 1, 2, …, k. However, it is possible to choose the weights so that the approximation is exact, i.e.

    Σ_{i=1}^k w_i h(x_i) = ∫_{−∞}^{∞} h(x)φ(x) dx    (7.87)

whenever h is a polynomial of a certain degree. For example, if we wish equality in (7.87) for the function h(x) = x^r, we would require

    Σ_{i=1}^k w_i x_i^r = ∫_{−∞}^{∞} x^r φ(x) dx = { 0 for r odd;  r!/((r/2)! 2^{r/2}) for r even }.

If we impose this requirement for r = 0, 1, 2, …, k − 1, this leads to k equations in the k unknown values of w_i, the first few of which are

    Σ w_i = 1,  Σ w_i x_i² = 1,  Σ w_i x_i⁴ = 3,  Σ w_i x_i⁶ = 15,
    Σ w_i x_i^k = 0 for k odd.

Normally, for given values of xi , it is possible to п¬Ѓnd a solution to these k
equations in the k unknowns wi unless the system is singular. Having found the
appropriate value of wi we are then guaranteed (7.87) holds for any polynomial
h(x) of degree k в€’ 1 or less.
We can do better than this, however, if we are free to choose not only the
weights but also the points x_i. In fact we can find values of w_i and x_i such
that (7.87) holds for all polynomials of degree at most 2k − 1. Without verifying
this fact, we simply mention that this is the case if we choose the abscissae to
be the k roots of the Hermite polynomials,

$$p_k(x) = (-1)^k\,[\varphi(x)]^{-1}\,\frac{d^k\varphi(x)}{dx^k}. \qquad (7.88)$$

For reference, the Hermite polynomials with degree k ≤ 7 and their roots are:

k   p_k(x)                        Roots x_i (approximate)          Weights w_i
0   1
1   x                             0                                1
2   x^2 - 1                       ±1                               1/2, 1/2
3   x^3 - 3x                      0, ±√3                           2/3, 1/6
4   x^4 - 6x^2 + 3                ±2.3344, ±0.74196                0.0459, 0.4541
5   x^5 - 10x^3 + 15x             0, ±2.857, ±1.3556               0.5333, 0.0113, 0.2221
6   x^6 - 15x^4 + 45x^2 - 15      ±3.3243, ±1.8892, ±0.61671       0.0046, 0.0886, 0.4088
7   x^7 - 21x^5 + 105x^3 - 105x   0, ±3.7504, ±2.3668, ±1.1544     0.4571, 0.0005, 0.0308, 0.2401

Finding the roots of these polynomials, and solving for the appropriate weights,
the corresponding approximation to the integral $\int_{-\infty}^{\infty} h(t)\varphi(t)\,dt$ using k = 3 is,
for example,

$$\int_{-\infty}^{\infty} h(x)\varphi(x)\,dx \approx \tfrac{2}{3}h(0) + \tfrac{1}{6}h(\sqrt{3}) + \tfrac{1}{6}h(-\sqrt{3}).$$

Using these, if we wish to evaluate the expected value of a function of X ∼ N(μ, σ²), the approximation is

$$E[h(X)] \approx \sum_i w_i\,h(\mu + \sigma x_i).$$

This formula is exact for h a polynomial of degree 2k − 1 or less. Although the
roots x_i are not exactly equally spaced, they are fairly close.
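The quadrature rule above can be implemented directly with library routines. The following sketch (in Python; the function name is ours) uses NumPy's probabilists' Hermite module, whose nodes and weights correspond to the weight function exp(−x²/2); dividing the weights by √(2π) makes them integrate against the N(0,1) density, so the rule approximates E[h(μ + σZ)].

```python
import numpy as np

def normal_quadrature(h, mu=0.0, sigma=1.0, k=7):
    """Approximate E[h(X)] for X ~ N(mu, sigma^2) by Gauss-Hermite
    quadrature; exact for polynomials h of degree 2k-1 or less."""
    # nodes/weights for the weight exp(-x^2/2) (probabilists' convention);
    # dividing by sqrt(2*pi) rescales the weights to sum to 1
    x, w = np.polynomial.hermite_e.hermegauss(k)
    w = w / np.sqrt(2.0 * np.pi)
    return float(np.sum(w * h(mu + sigma * x)))
```

With k = 3 the nodes returned are 0 and ±√3 with rescaled weights 2/3 and 1/6, reproducing the three-point rule displayed above.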

7.10 Infinitesimal Perturbation Analysis: Pathwise differentiation
There is an alternative method for measuring the sensitivity of simulations to perturbations in the parameters which can perform better than the score function
method. This method requires assumptions on the derivative of the performance
measure. As a preliminary example, let us return to the problem of estimating
a Greek (e.g. rho, vega, delta or theta) for a European option. We wish to
estimate the derivative of the option price

$$E_Q\{e^{-r(T-t)}V(S_T)\}$$

with respect to some parameter (e.g. r, σ, S_0, t) where S_T has a lognormal
distribution with mean S_0 e^{rT} and variance parameter σ²T, and V(S_T) is the
value of the option on expiry when the stock price is S_T. As an example, suppose
we wish to estimate vega, the derivative of the option price with respect to the
unknown volatility parameter σ underlying the distribution of S_T, and assume
for simplicity of expressions that t = 0. Suppose we generate S_T = S_0 exp(rT − σ²T/2 + σZ√T) for a standard normal random variable Z. Notice that S_T
is explicitly a function of the parameter σ. Then differentiating directly with
respect to this parameter, provided such a derivative exists and can be moved
under the expectation sign, yields

$$\frac{\partial}{\partial\sigma}E\{e^{-rT}V(S_T)\} = E\Bigl\{e^{-rT}V'(S_T)\frac{\partial S_T}{\partial\sigma}\Bigr\} \qquad (7.89)$$
$$= E\{e^{-rT}V'(S_T)\,S_T(-\sigma T + Z\sqrt{T})\}. \qquad (7.90)$$

Thus, to estimate the derivative, we use an average of simulated values of the form

$$\frac{1}{n}\sum_{i=1}^{n} e^{-rT}V'(S_{T,i})\,S_{T,i}(-\sigma T + Z_i\sqrt{T}) \qquad (7.91)$$

where $S_{T,i} = S_0\exp(rT - \sigma^2 T/2 + \sigma\sqrt{T}Z_i)$ is the i'th simulated closing value.
If the function V(·) is close to being constant, then this estimator will have
variance close to zero and will be quite accurate, likely more accurate than the
score function estimator described in the last section. Consider the case of a
European call option with strike K. Then V(S_T) = (S_T − K)^+ and V'(S_T) = 1_{[S_T>K]}. Note that the derivative exists everywhere except at the point
S_T = K, where it fails to exist. Is this a fatal problem
for IPA? A quick check of the expression (7.91) in this case yields

$$\frac{1}{n}\sum_{i=1}^{n} e^{-rT}S_{T,i}\,1_{[S_{T,i}>K]}(-\sigma T + Z_i\sqrt{T}) \to e^{-rT}E\Bigl[S_T\,1_{[S_T>K]}\,\frac{\ln(S_T/S_0) - (r+\sigma^2/2)T}{\sigma}\Bigr]$$
$$= S_0\,\varphi(d_1)\sqrt{T}$$

in agreement with the values in Table 8.1. The estimator appears to have
the correct limit in spite of the fact that a very important ingredient in its
derivation, the existence of a derivative, fails. The failure of differentiability
at a single point, or even at a finite number of points, evidently does not necessarily
bias the IPA estimator. Essentially what is required for the estimator to work is
the ability to interchange expected value and derivative in expressions such as
(7.89). There are some general conditions which permit this interchange, given
in Appendix A. Verifying such conditions in any practical simulation is typically
very difficult, but a rule of thumb is that if the value function V(·) is continuous,
then IPA usually provides consistent estimators, whereas if it is discontinuous, it
does not. We can sometimes circumvent the problem of non-differentiability
by finding a sequence of everywhere differentiable functions V_n(x) such that
V_n(x) → V(x) and V'_n(x) → V'(x) for all x ≠ K. Then we can show that with
V_n replacing V in (7.91), we obtain a consistent estimator of $\frac{\partial}{\partial\sigma}E\{e^{-rT}V_n(S_T)\}$,
and using the Lebesgue dominated convergence theorem, we may carry this
consistency over to V(x). For the call option, we might choose

$$V_n(x) = \begin{cases} 0 & \text{for } x \le K - \frac{1}{4n} \\[2pt] n\bigl(x - K + \frac{1}{4n}\bigr)^2 & \text{for } K - \frac{1}{4n} < x < K + \frac{1}{4n} \\[2pt] x - K & \text{for } x \ge K + \frac{1}{4n}, \end{cases}$$

a continuously differentiable function which
agrees with V(x) both in its value and in its derivative everywhere except in the
diminishing interval (K − 1/(4n), K + 1/(4n)). More generally, when V(x) increases at
most linearly in x as x → ∞, it is possible to find a dominating function, but
if the payoff function V(x) increased at a faster rate, this may not be possible.
Generally, if there are a finite number of points where the derivative does not
exist, and the payoff function is bounded above by linear functions of the stock
price, an estimator of the form (7.91) can be used.
Infinitesimal Perturbation Analysis, or pathwise differentiation, employs the
following simple steps:

1. Write the expected value we wish to determine in terms of the parameters
(as explicit arguments) and random variables whose distribution does not
depend on these parameters (e.g. U[0,1] or N(0,1)). The simplest way
to do this may be to use the inverse transform.

2. Differentiate this expected value with respect to the parameter of interest,
passing the derivative under the expected value sign.

3. Simulate or numerically determine this expected value.
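The three steps can be sketched for the delta of a European call (in Python rather than the chapter's MATLAB; the function names are ours): step 1 writes S_T = S_0 exp(rT − σ²T/2 + σ√T Z) with Z standard normal, step 2 gives the pathwise derivative e^{−rT} 1[S_T > K] S_T/S_0, and step 3 averages it over simulations. The result can be checked against the Black-Scholes delta Φ(d_1) from Table 8.1.

```python
import math
import numpy as np

def ipa_delta_call(S0, K, r, sigma, T, n=200_000, seed=0):
    """IPA (pathwise) estimate of delta for a European call."""
    Z = np.random.default_rng(seed).standard_normal(n)            # step 1
    ST = S0 * np.exp(r*T - 0.5*sigma**2*T + sigma*np.sqrt(T)*Z)
    # step 2: d/dS0 of e^{-rT}(ST - K)^+ is e^{-rT} 1[ST>K] * ST/S0,
    # since dST/dS0 = ST/S0
    deriv = np.exp(-r*T) * (ST > K) * ST / S0
    return float(deriv.mean())                                    # step 3

def bs_delta(S0, K, r, sigma, T):
    """Black-Scholes delta, Phi(d1), for comparison."""
    d1 = (math.log(S0/K) + (r + 0.5*sigma**2)*T) / (sigma*math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))
```

For example, `ipa_delta_call(10, 9, 0.05, 0.2, 0.25)` should lie within Monte Carlo error of `bs_delta(10, 9, 0.05, 0.2, 0.25)`.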

Since the requirements for unbiasedness of the IPA estimator are rather
subtle, it is desirable to compare the estimator with one that is known to be
unbiased, such as the score function estimator. If both appear to have the
same mean, then it is likely that the IPA estimator is unbiased and has smaller
variance.

7.10.1 Example: IPA estimate of vega
Again consider an estimate of

$$\text{vega} = \frac{\partial}{\partial\sigma}E_Q\{e^{-rT}V(S_T)\},$$

at time t = 0 with S_T, the terminal value of the asset, distributed according
to a lognormal distribution with mean S_0 e^{rT} and volatility parameter σ²T. We
begin by writing S_T in terms of random variables with distributions that do not
depend on the parameters. Recall that

$$S_T = S_0\exp\{rT - \sigma^2 T/2 + \sigma\sqrt{T}\,Z\}$$

with Z a standard normal random variable. Then, provided that we can pass
the derivative through the expected value,

$$\frac{\partial V}{\partial\sigma} = E\Bigl\{e^{-rT}V'(S_T)\frac{\partial S_T}{\partial\sigma}\Bigr\} = E\{e^{-rT}V'(S_T)\,S_T(\sqrt{T}Z - \sigma T)\}.$$

This can be simulated by generating values of Z, then S_T = S_0 exp{rT − σ²T/2 + σ√T Z}, and averaging the values of e^{−rT}V'(S_T)S_T(√T Z − σT). Alternatively, since this is a one-dimensional integral, we can integrate the function
against the standard normal p.d.f. φ, i.e.

$$\int_{-\infty}^{\infty} e^{-rT}\,V'\bigl(S_0 e^{rT-\sigma^2T/2+\sigma\sqrt{T}z}\bigr)\,S_0 e^{rT-\sigma^2T/2+\sigma\sqrt{T}z}\,(\sqrt{T}z - \sigma T)\,\varphi(z)\,dz.$$

Note the similarity between this estimator and the score function estimator
in the same problem. The primary difference is that here V' is multiplied by √T z − σT, a linear function of z, whereas in the score function case V is multiplied by a
quadratic function of Z. The relationship will be clearer later when we see
that the score function estimator can be derived from the IPA estimator using
integration by parts. Because of the high variability of the score function, the
perturbation analysis estimator is substantially better, at least for a standard
call option. The following function was used to compare the estimators and
their standard errors.
function [price,vega,SE]=estvega(Z,S0,sigma,r,T,K)
% Two estimators of vega: vega(1) = score function estimator,
% vega(2) = IPA estimator; SE(1), SE(2) are their standard errors.
% v is the payoff function, vprime is its derivative.
% Z = randn(1,n) is a vector of standard normals.
ST=S0*exp(r*T+sigma*sqrt(T)*Z-.5*sigma^2*T);
v=max(0,ST-K);                                    % call payoff
v1=exp(-r*T)*(v.*((Z.^2-1)/sigma-sqrt(T)*Z));     % score function estimator
vprime=ones(1,length(Z)).*(ST>K);                 % derivative of the payoff
v2=exp(-r*T)*(vprime.*ST.*(sqrt(T)*Z-sigma*T));   % IPA estimator
vega=[mean(v1) mean(v2)];
SE=sqrt([var(v1) var(v2)]/length(Z));
price=exp(-r*T)*mean(v);
For example, the call [price,vega,SE]=estvega(randn(1,500000),10,.2,.05,.25,9) results in the price of a call option on a stock worth $10 with 3 months or
one quarter of a year to maturity, interest rate r = .05, and annual volatility 0.20.
The estimated price is $1.1653 and the two estimates of vega are 0.8835 and
0.9297 with standard errors 0.0238 and 0.0059 respectively. Since the ratio
of standard errors is approximately 4, the IPA estimator is evidently about 16
times as efficient as the score function estimator in this case, although even
the score function estimator provides reasonable accuracy. Not all derivatives
can be estimated as successfully using IPA, however. For example, if we are
interested in the gamma or second derivative of a European call option with
respect to S_t, then V(S_T) = (S_T − K)^+ and V''(x) = 0 for all x ≠ K. Thus, if we
are permitted to differentiate twice under the expected value in

$$E_Q\{e^{-rT}V(S_T)\}$$

we obtain

$$\Gamma = e^{-rT}E_Q\Bigl[V''(S_T)\Bigl(\frac{\partial S_T}{\partial S_0}\Bigr)^2\Bigr] = 0,$$

which is clearly incorrect. The problem in this case is that the regularity required for the second interchange of derivative and expectation fails. In general,
if a function is discontinuous, as V'(x) is in this case, interchanging derivatives
and expected values is problematic, so IPA permits unbiased estimation of the
delta for a European call or put option but not for a digital option having payoff
V(S_T) = 1(S_T ≥ K). IPA is based on the differentiation of the output process,
which makes IPA unsuitable as a "black-box" algorithm. Since expected values
are typically smoother with respect to the parameter than the function being
integrated, the score function estimator, though noisier, provides unbiased estimation under much more general conditions, because the score function method,
together with its variance-reduced variations, imposes regularity only on the
input variables and requires no knowledge of the process being simulated. However, the score function method requires that the parameter whose sensitivity is
investigated be a statistical parameter, i.e. index a family of densities, whereas
perturbation analysis allows more general types of parameters.
Example: IPA estimation of the delta of an Asian option.
Suppose S_t follows the Black-Scholes model and we wish to determine the delta
for an Asian option which has payoff

$$V(S_t,\ 0 < t < T) = (\bar{S} - K)^+$$

where

$$\bar{S} = \frac{1}{m}\sum_{i=1}^{m} S(t_i)$$

is the average stock price at selected time points. Suppose the process Z_t is
generated exactly the same way as S_t but with initial stock price Z_0 = 1. Then
we can generate the process in general using

$$S_t = S_0 Z_t$$

in which case

$$\text{delta} = \frac{\partial}{\partial S_0}E_Q\{e^{-rT}(\bar{S} - K)^+\} = e^{-rT}E_Q\Bigl[\frac{\partial\bar{S}}{\partial S_0}\,I(\bar{S} > K)\Bigr]$$
$$= e^{-rT}E_Q[\bar{Z}\,I(\bar{S} > K)] = e^{-rT}E_Q\Bigl[\frac{\bar{S}}{S_0}\,I(\bar{S} > K)\Bigr].$$

Therefore averaging simulated values of $e^{-rT}\frac{\bar{S}}{S_0}I(\bar{S} > K)$ over all simulations provides an unbiased estimator of delta.
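A minimal Monte Carlo sketch of this estimator follows (in Python; the equally spaced averaging dates, the function name, and the parameter values are our assumptions, not part of the text). It simulates risk-neutral paths at the m averaging dates and averages e^{−rT}(S̄/S_0) I(S̄ > K):

```python
import numpy as np

def ipa_asian_delta(S0, K, r, sigma, T, m=12, n=100_000, seed=1):
    """IPA estimate of delta for an arithmetic-average Asian call,
    with m equally spaced averaging dates (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    dt = T / m
    Z = rng.standard_normal((n, m))
    # risk-neutral log-price increments over each interval of length dt
    increments = (r - 0.5*sigma**2)*dt + sigma*np.sqrt(dt)*Z
    S = S0 * np.exp(np.cumsum(increments, axis=1))   # paths at t_1,...,t_m
    Sbar = S.mean(axis=1)                            # arithmetic average
    # pathwise derivative: d(Sbar)/dS0 = Sbar/S0, so
    # delta is estimated by the mean of e^{-rT} (Sbar/S0) 1[Sbar > K]
    return float(np.exp(-r*T) * np.mean((Sbar / S0) * (Sbar > K)))
```

A quick sanity check: with K = 0 the indicator is always 1, so the estimator should converge to $e^{-rT}\frac{1}{m}\sum_i e^{r t_i}$.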

IPA in the Multivariate Case.
We wish to generate X = (X_1, ..., X_n) with independent components, and let
the cumulative distribution function and the probability density function of X_i
be denoted F_{iθ}(x) and f_{iθ}(x) respectively. Once again we wish to estimate the
sensitivity or derivative of the expected value

$$m(\theta) = E_\theta\,V(X_1, \ldots, X_n, \theta)$$

with respect to the parameter θ for some function V. To allow for the most
general situation, we permit θ not only to affect the distribution of the variables X_i but also, in some cases, to be an argument of the function V. Suppose
we generate the random variables X_i by inverse transform from a vector of n
independent uniform variates U_i according to $X_i = F_{i\theta}^{-1}(U_i)$. Then note that

$$\frac{\partial X_i}{\partial\theta} = -\frac{1}{f_{i\theta}(X_i)}\,\frac{\partial F_{i\theta}(X_i)}{\partial\theta}.$$

Thus, with $V^{(i)} = \frac{\partial V(X,\theta)}{\partial X_i}$ and $V^{(\theta)} = \frac{\partial V(X,\theta)}{\partial\theta}$, we
have, under conditions permitting the interchange of derivative and integral,

$$m'(\theta) = E\Bigl\{\sum_i \frac{\partial V(X,\theta)}{\partial X_i}\frac{\partial X_i}{\partial\theta} + \frac{\partial V(X,\theta)}{\partial\theta}\Bigr\}$$
$$= E\Bigl[V^{(\theta)} - \sum_i V^{(i)}\,\frac{1}{f_{i\theta}(X_i)}\,\frac{\partial F_{i\theta}(X_i)}{\partial\theta}\Bigr] \qquad (7.92)$$
This suggests a Monte Carlo estimator, an average over all (independent)
simulations of terms of the form

$$V^{(\theta)}(X,\theta) - \sum_i \frac{V^{(i)}(X,\theta)}{f_{i\theta}(X_i)}\,\frac{\partial F_{i\theta}(X_i)}{\partial\theta}. \qquad (7.93)$$

Again, the infinitesimal perturbation analysis estimator is unbiased if the conditions permitting the required interchange of derivative and integral are met;
otherwise the estimator may be biased. See Cao (1987a) for some conditions.
When the conditions are met, note the relationship between terms in the perturbation analysis estimator and the score function estimator, obtained by integration by parts:

$$E_\theta\Bigl[V(X,\theta)\,\frac{\partial\ln(f_{i\theta}(X_i))}{\partial\theta}\Bigr] = \int E_\theta V(X_1,\ldots,X_{i-1},x_i,X_{i+1},\ldots,X_n,\theta)\,\frac{\partial}{\partial\theta}f_{i\theta}(x_i)\,dx_i$$
$$= E_\theta V(X_1,\ldots,X_{i-1},x_i,X_{i+1},\ldots,X_n,\theta)\,\frac{\partial}{\partial\theta}\{F_{i\theta}(\infty) - F_{i\theta}(-\infty)\}$$
$$\quad - \int E_\theta V^{(i)}(X_1,\ldots,x_i,\ldots,X_n,\theta)\,\frac{\partial}{\partial\theta}F_{i\theta}(x_i)\,dx_i$$
$$= -E_\theta\Bigl\{V^{(i)}(X,\theta)\,\frac{\partial F_{i\theta}(X_i)/\partial\theta}{f_{i\theta}(X_i)}\Bigr\}.$$

For nearly constant functions V, the gradient V^{(i)} is close to zero and the perturbation analysis estimator has small variance. In general, when it is unbiased, it
seems to provide greater efficiency than the crude score function estimator. On
the other hand, the comparison is usually carried out in specific cases. Without
smoothness in the function V, there seems to be no general reason why perturbation analysis should be preferred; if V is non-differentiable or discontinuous,
this can introduce potential bias into perturbation analysis estimators. The
infinitesimal perturbation analysis estimator is an infinitesimal or limiting version of the use of common random numbers, as the following argument shows.
Generating X_θ as above, it is reasonable to estimate

$$\frac{m(\theta+\delta) - m(\theta-\delta)}{2\delta} \approx \frac{V(X_{\theta+\delta},\,\theta+\delta) - V(X_{\theta-\delta},\,\theta-\delta)}{2\delta}.$$

Taking limits as δ → 0 and assuming the gradient exists in a neighborhood of θ,
we arrive at the perturbation analysis estimator.
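This limiting argument can be checked numerically. In the following sketch (Python; the function names and parameter values are ours) a central difference in σ driven by common random numbers is compared with the pathwise (IPA) vega estimator for a European call, using the same normal draws Z for both:

```python
import numpy as np

def vega_crn_fd(S0, K, r, sigma, T, Z, delta):
    """Central finite-difference vega with common random numbers:
    the same normals Z price the option at sigma - delta and sigma + delta."""
    def disc_payoff(sig):
        ST = S0 * np.exp(r*T - 0.5*sig**2*T + sig*np.sqrt(T)*Z)
        return np.exp(-r*T) * np.maximum(ST - K, 0.0)
    return float(np.mean((disc_payoff(sigma+delta) - disc_payoff(sigma-delta))
                         / (2.0*delta)))

def vega_ipa(S0, K, r, sigma, T, Z):
    """Pathwise (IPA) vega: e^{-rT} 1[ST>K] ST (sqrt(T) Z - sigma T)."""
    ST = S0 * np.exp(r*T - 0.5*sigma**2*T + sigma*np.sqrt(T)*Z)
    return float(np.mean(np.exp(-r*T) * (ST > K) * ST * (np.sqrt(T)*Z - sigma*T)))
```

With the same Z and a small δ (say 10⁻⁴), the two estimates agree to several decimal places, as the limiting argument predicts.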
In the more common circumstance that the function V does not directly
depend on the parameter, the crude Monte Carlo IPA estimator (7.93) is an
average over all (independent) simulations of

$$-\sum_i \frac{V^{(i)}(X)}{f_{i\theta}(X_i)}\,\frac{\partial F_{i\theta}(X_i)}{\partial\theta} \qquad (7.94)$$

where the derivatives $\frac{\partial}{\partial X_i}V(X)$ may be derived through analysis of the system
or through the implicit function theorem if the problem is tractable. In examples
where IPA has been found to be unbiased, it has also been found to be consistent.
When compared to the crude score function method in these examples, it has
generally been found to be the more efficient of the two, although exceptions to
this rule are easy to find.

7.10.2 Sensitivity of the value of a spread option to the correlation.
Consider two stocks or asset prices with closing values S_1(T) and S_2(T) jointly
lognormally distributed with volatility parameters σ_1, σ_2 and correlation ρ.
Of course all of the parameters governing this distribution change over time,
including the correlation ρ. We are interested in the price of a European call
option on the spread in price between the two stocks, and in particular, in the
sensitivity of this price to changes in the correlation. Let the payoff function be

$$V(S_1(T), S_2(T)) = \max(0,\ S_1(T) - S_2(T) - K) \qquad (7.95)$$
$$= \max\bigl(0,\ \exp\{rT - \sigma_1^2 T/2 + \sigma_1 Z_1\} - \exp\{rT - \sigma_2^2 T/2 + \sigma_2 Z_2\} - K\bigr)$$

for strike price K and correlated standard normal random variables Z_1, Z_2. The
easiest way to generate such random variables is to generate Z_1, Z_3 independent
standard normal and then set

$$Z_2 = \rho Z_1 + \sqrt{1-\rho^2}\,Z_3. \qquad (7.96)$$

Then the sensitivity of the option price with respect to ρ is the derivative of
the discounted expected return

$$\frac{\partial}{\partial\rho}E[e^{-rT}V(S_1(T), S_2(T))] = -\sigma_2\,E\Bigl[\exp\{-\sigma_2^2 T/2 + \sigma_2 Z_2\}\,\frac{\partial Z_2}{\partial\rho}\,1_A\Bigr]$$
$$= -\sigma_2\,E\Bigl[\exp\{-\sigma_2^2 T/2 + \sigma_2(\rho Z_1 + \sqrt{1-\rho^2}\,Z_3)\}\Bigl(Z_1 - \frac{\rho}{\sqrt{1-\rho^2}}\,Z_3\Bigr)1_A\Bigr] \qquad (7.98)$$

where 1_A is the indicator function of the set

$$A = A(Z_1, Z_2) = \bigl[\exp\{rT - \sigma_1^2 T/2 + \sigma_1 Z_1\} - \exp\{rT - \sigma_2^2 T/2 + \sigma_2 Z_2\} > K\bigr]$$

and where Z_1, Z_3 are independent standard normal random variables and Z_2
satisfies (7.96). Thus an IPA estimator of the sensitivity is given by an average
of terms of the form

$$-\sigma_2\exp\{-\sigma_2^2 T/2 + \sigma_2(\rho Z_1 + \sqrt{1-\rho^2}\,Z_3)\}\Bigl(Z_1 - \frac{\rho}{\sqrt{1-\rho^2}}\,Z_3\Bigr)1_{A(Z_1,Z_2)}. \qquad (7.99)$$

Of course variance reduction can easily be applied to this estimator, especially since there is a substantial set on which (7.99) is equal to 0.
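The estimator (7.99) can be sketched as follows (Python; the function name and parameter values are ours, and the initial asset prices are normalized to 1 as in the payoff formula above, which absorbs √T into the σ_i when T ≠ 1):

```python
import numpy as np

def spread_corr_sensitivity(sigma1, sigma2, rho, r, T, K, n=200_000, seed=3):
    """IPA estimate of d(price)/d(rho) for the spread call of (7.95),
    averaging the terms in (7.99)."""
    rng = np.random.default_rng(seed)
    Z1 = rng.standard_normal(n)
    Z3 = rng.standard_normal(n)
    Z2 = rho*Z1 + np.sqrt(1.0 - rho**2)*Z3            # eq. (7.96)
    S1 = np.exp(r*T - 0.5*sigma1**2*T + sigma1*Z1)
    S2 = np.exp(r*T - 0.5*sigma2**2*T + sigma2*Z2)
    A = S1 - S2 > K                                   # in-the-money indicator
    dZ2 = Z1 - rho/np.sqrt(1.0 - rho**2)*Z3           # dZ2/drho
    terms = -sigma2 * np.exp(-0.5*sigma2**2*T + sigma2*Z2) * dZ2 * A
    return float(terms.mean())
```

As a check, the estimate can be compared with a central finite difference in ρ computed from the same normal draws (common random numbers); the two agree closely.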

7.11 Calibrating a Model using simulations
In most of the models discussed in the finance literature there are a number of
unknown parameters that must either be estimated from historical data or, for
example in order to estimate the risk-neutral distribution, calibrated to the
market prices of derivatives. Because of the complexity of the model, it is also
common that whatever function we wish to maximize or minimize can only
be evaluated using rather noisy simulations, even after applying the variance
reduction techniques of Chapter 4. Determining a set of parameter values is
the problem addressed in this section. In Chapter 7 we discussed some general
numerical methods for finding roots and maxima of functions whose evaluations
are subject to noise. However, the problem of finding such maxima is often
linked with the problem of estimating gradients for noisy functions. Since this
is the subject of this chapter, we return to this problem here.
Suppose, for example, we have J traded options on a single underlying asset
with market prices P_j, j = 1, ..., J. These options may have different maturities. Valuation of these options by simulation requires a model for the underlying
stochastic variables, with a (p × 1) parameter vector θ ∈ Θ governing the behaviour of these variables (Θ is the parameter space). Let P_j(θ) be the model
price for option j = 1, ..., J at parameter value θ. We wish to choose parameter
values so as to minimize the differences P_j(θ) − P_j, the pricing error for
option j = 1, ..., J at parameter value θ. More precisely, we wish to find the
parameter θ̂ that minimizes the weighted sum of squared pricing errors (SSPE):

$$\hat{\theta} = \arg\min_{\theta\in\Theta}\ \sum_{j=1}^{J} w_j\,(P_j(\theta) - P_j)^2, \qquad (7.100)$$

where w_j, j = 1, ..., J are suitable weights. Information such as moneyness, the
option price, trading volume, open interest and time-to-maturity can be used
in the construction of the w_j, affecting the contribution of each option to the loss
function so that options used as benchmarks and those that are heavily traded
contribute more weight to the loss function. This is the familiar nonlinear
weighted least-squares problem studied extensively in statistics. This problem
is usually solved by equating the derivative or gradient of the objective function
to zero and solving for θ, where, in this case, the gradient of (7.100) is

$$\mathrm{grad}(\theta) = 2\sum_{j=1}^{J} w_j\,(P_j(\theta) - P_j)\,\frac{\partial}{\partial\theta}P_j(\theta). \qquad (7.101)$$

Finding a root of (7.101) would normally require the functional form of both
P_j(θ) and $\frac{\partial}{\partial\theta_i}P_j(\theta)$, which is typically not available for simulation-based models.
Notice that it takes a fairly simple form, however, since if we define the inner
product between two vectors using the weights w_j (so that the inner product
of vector x and vector y is $\sum_j x_j y_j w_j$), then (7.101) equals zero if the pricing
error vector is orthogonal to the vector of Greeks for each of the parameters.

The relevance of the estimation of sensitivities to the problem of solving
(7.101) is now clear. Methods such as the score function method and infinitesimal perturbation analysis can be used to estimate the gradient vector

$$\frac{\partial}{\partial\theta}P_j(\theta).$$

Estimation of this gradient is crucial to any algorithm which attempts to minimize (7.100). We may also estimate the gradient more simply by obtaining
noisy evaluations $\widehat{P}_j(\theta)$, say, of the function P_j(θ) at various values of θ in a
small neighbourhood and fitting a linear regression to these observations.
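The regression idea can be sketched as follows (Python; the function name, the uniform design in a box of the given radius, and the number of design points are our assumptions). Evaluating the noisy function at points scattered around θ and regressing the values on the displacements yields slope coefficients that estimate the gradient:

```python
import numpy as np

def regression_gradient(f, theta, radius=0.05, n_points=25, seed=7):
    """Estimate the gradient of a noisy function f at theta by fitting a
    linear regression to evaluations in a small neighbourhood of theta."""
    rng = np.random.default_rng(seed)
    p = len(theta)
    thetas = theta + radius * rng.uniform(-1.0, 1.0, size=(n_points, p))
    y = np.array([f(th) for th in thetas])
    # design matrix: intercept plus displacements from theta;
    # the slope coefficients estimate the gradient at theta
    X = np.column_stack([np.ones(n_points), thetas - theta])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]
```

The radius trades bias (curvature of f over the neighbourhood) against variance (noise amplified by small displacements), mirroring the usual finite-difference dilemma.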
Let us now describe the various steps in a fairly general algorithm designed to
solve (7.100). Possible algorithms will differ primarily in how we approximate
P_j(θ) and $\frac{\partial}{\partial\theta}P_j(\theta)$, but in general, even if we use common random numbers to
drive our simulations, estimators of the function P_j(θ) tend to be more precise
than estimators of its gradient $\frac{\partial}{\partial\theta}P_j(\theta)$.
We begin with simulations conducted at K values of θ, say θ_1, θ_2, ..., θ_K. By averaging the simulated values of the options at these points we obtain estimated
option prices for each of the J options, $\hat{\alpha}_{jk} = \widehat{P}_j(\theta_k)$, j = 1, ..., J, k = 1, ..., K.
We can also use these same simulations to estimate the gradient vector at each
of these points. If we plan to estimate the gradient using regression among
two or more neighbouring values of θ, then it is important to drive the simulations at different parameter values with common random numbers, since this
improves the precision of the estimated gradient. Suppose the gradients are
$\beta_{jk} = \frac{\partial}{\partial\theta}P_j(\theta_k)$ (this is a column vector of length p), and the estimator obtained either by using IPA or the score function method or finite differences is
denoted by $\hat{\beta}_{jk}$. If we approximate P_j(θ) by a linear Taylor series approximation
in a neighbourhood of θ_k, for a particular value of k, we obtain as estimator of
P_j(θ)

$$P_j(\theta) \simeq \widehat{P}_j(\theta) = \hat{\alpha}_{jk} + \hat{\beta}_{jk}'(\theta - \theta_k).$$