

Sensitivity Analysis, Estimating Derivatives and the Greeks

Estimating the sensitivity of a simulation with respect to changes in the parameter values is an important part of establishing the validity of its conclusions. If a simulation estimates an expected value at a certain value of the parameters as 0.32 ± 0.05, but the derivative with respect to one parameter, say the volatility parameter σ, is 5, this indicates that a change in the volatility of only 0.02, or 2 percent, would result in a change in the average of the order of 0.1. Since volatility typically changes rapidly by far more than one percent, the apparent precision of the estimate 0.32 ± 0.05 is very misleading.
Of particular importance in finance are certain derivatives of an option price or portfolio value with respect to the parameters underlying the Black-Scholes model. These are called the “Greeks”, because many of them (not to mention many parameters and constants used in statistics, physics, mathematics, and the rest of science) are denoted by Greek letters. Suppose V = V(S(t), t, σ, r) is the value of a portfolio or a derivative based on an asset S(t), where the volatility parameter is σ and r is the current spot interest rate. For example, for a single European call option, from the Black-Scholes formula

V(S, t, σ, r) = SΦ(d1) − Ke^{−r(T−t)}Φ(d2)   (7.61)

where

d1 = [ln(S/K) + (r + σ²/2)(T − t)] / (σ√(T − t)),

d2 = d1 − σ√(T − t) = [ln(S/K) + (r − σ²/2)(T − t)] / (σ√(T − t)).

In this case there are analytic formulae for these quantities, given in Table 8.1.

Name    Symbol    Definition    Value in the Black-Scholes model
delta   Δ         ∂V/∂S         Φ(d1)
gamma   Γ         ∂²V/∂S²       φ(d1)/(Sσ√(T − t))
rho     ρ         ∂V/∂r         K(T − t)e^{−r(T−t)}Φ(d2)
theta   Θ         ∂V/∂t         −Sσφ(d1)/(2√(T − t)) − rKe^{−r(T−t)}Φ(d2)
vega              ∂V/∂σ         Sφ(d1)√(T − t)

TABLE 8.1: The “Greeks” for a European call in the Black-Scholes model

One example of the above calculations is given below; the remaining ones are left as exercises. Since ∂d1/∂S = ∂d2/∂S = 1/(Sσ√(T − t)),

∂V/∂S = (∂/∂S){SΦ(d1) − Ke^{−r(T−t)}Φ(d2)}
       = Φ(d1) + Sφ(d1) ∂d1/∂S − Ke^{−r(T−t)}φ(d2) ∂d2/∂S
       = Φ(d1) + (Sφ(d1) − Ke^{−r(T−t)}φ(d2)) · 1/(Sσ√(T − t))
       = Φ(d1),   since Sφ(d1) = Ke^{−r(T−t)}φ(d2),

where φ and Φ are the standard normal probability density function and cumulative distribution function respectively. These derivatives are typically calculated not only because they are relevant to a hedging strategy (especially Δ and Γ) but also because they give an idea of how rapidly the value of our portfolio is affected when there is a (possibly adverse) change in one of the parameters.
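As a quick check on Table 8.1, the closed-form expressions can be evaluated directly. The following sketch (plain Python; the function name and the parameter values are our own choices) computes the Greeks of a European call:

```python
from math import log, sqrt, exp, pi
from statistics import NormalDist

Phi = NormalDist().cdf                          # standard normal cdf
phi = lambda x: exp(-x**2 / 2) / sqrt(2 * pi)   # standard normal density

def call_greeks(S, K, r, sigma, tau):
    """Closed-form Black-Scholes Greeks of a European call; tau = T - t."""
    d1 = (log(S / K) + (r + sigma**2 / 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    disc = exp(-r * tau)
    return {
        "delta": Phi(d1),
        "gamma": phi(d1) / (S * sigma * sqrt(tau)),
        "rho":   K * tau * disc * Phi(d2),
        "theta": -S * sigma * phi(d1) / (2 * sqrt(tau)) - r * K * disc * Phi(d2),
        "vega":  S * phi(d1) * sqrt(tau),
    }

g = call_greeks(S=100.0, K=100.0, r=0.05, sigma=0.3, tau=0.25)
```

Delta, for instance, can be confirmed against a centered finite difference of the Black-Scholes price itself.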
As an example of the use of these derivatives, it is common to hedge a portfolio against changes in one or more parameters. A simple hedge was introduced when we developed the Black-Scholes formula originally. Suppose we wish to hedge an investment in a call option whose value is given by V = V(S(t), t, σ, r) at time t with xS units of the underlying stock. Then the value function for this new portfolio is

V + xS S

and setting the delta of this portfolio equal to zero results in the equation

ΔO + xS = 0,   or   xS = −ΔO.

This means that we would need to short ΔO units of the stock in order to hedge the investment in the option. The delta of an option is therefore the amount of the stock that we would sell short in order to hedge one unit of the option. In a sense, owning ΔO units of the stock is locally equivalent to owning one unit of the option, at least from the perspective of small changes in the stock price. In Figure 7.15 below we plot the delta for a call option with strike price
100, interest rate r = 0.05 and T − t = 0.25 years.

Figure 7.15: Call option Delta in the Black-Scholes model, Exercise price = 100, r = 0.05, T − t = 0.25 years.

Figure 7.16: Call option Gamma in the Black-Scholes model, T − t = 0.25 years, σ = 0.3, r = 0.05, K = 100.

The second derivative, the gamma of the call option, is φ(d1)/(Sσ√(T − t)), which behaves roughly like a constant times the normal density with mode at S = e^{−r(T−t)}K (see Figure 7.16).
With only one instrument (in this case the stock) available to hedge our investment in the call option, there is a single coefficient xS that we are free to choose and therefore only one constraint (portfolio delta equals zero) that we can arrange to satisfy. However, if there were other instruments available, such as different options on the same underlying, we could construct a hedge setting other Greeks of the hedged portfolio (e.g. gamma, rho) equal to zero as well.
For example, suppose I own a portfolio whose value P = P(S(t), t, σ, r) depends on the price of a stock or index S. I wish to immunize this portfolio against changes in S by investing directly in the stock S and in an option on this stock whose value is given by V = V(S(t), t, σ, r) at time t. We will need to assume that the value functions V, S, P are distinct in the sense that none is a linear combination of the other two. Suppose I add to my portfolio xS units of the stock and xO units of the option, so that the new portfolio has value

P + xS S + xO V.

In order to ensure that this value changes as little as possible when S changes, set the values of its delta and gamma (first and second derivatives with respect to S) equal to zero. This gives two equations in the two unknown values xS, xO:
ΔP + xS + xO ΔO = 0
ΓP + xO ΓO = 0

where ΔP, ΔO are the deltas of the original portfolio and the option respectively and ΓP, ΓO are the gammas. The solution gives

xO = −ΓP/ΓO,
xS = (ΓP/ΓO)ΔO − ΔP

and the hedged portfolio has value

P + ((ΓP/ΓO)ΔO − ΔP)S − (ΓP/ΓO)V.
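The two equations above can be solved mechanically. A minimal sketch (our own function names; hedging one Black-Scholes call with another call of an arbitrarily chosen strike of 110):

```python
from math import log, sqrt, exp, pi
from statistics import NormalDist

Phi = NormalDist().cdf
phi = lambda x: exp(-x**2 / 2) / sqrt(2 * pi)

def call_delta_gamma(S, K, r, sigma, tau):
    """Black-Scholes delta and gamma of a European call."""
    d1 = (log(S / K) + (r + sigma**2 / 2) * tau) / (sigma * sqrt(tau))
    return Phi(d1), phi(d1) / (S * sigma * sqrt(tau))

def delta_gamma_hedge(delta_P, gamma_P, delta_O, gamma_O):
    """Units of stock (xS) and option (xO) making P + xS*S + xO*V
    both delta- and gamma-neutral."""
    xO = -gamma_P / gamma_O
    xS = (gamma_P / gamma_O) * delta_O - delta_P
    return xS, xO

dP, gP = call_delta_gamma(100.0, 100.0, 0.05, 0.3, 0.25)   # portfolio: one call
dO, gO = call_delta_gamma(100.0, 110.0, 0.05, 0.3, 0.25)   # hedging option
xS, xO = delta_gamma_hedge(dP, gP, dO, gO)
```

By construction the combined position has zero delta (dP + xS + xO·dO = 0) and zero gamma (gP + xO·gO = 0).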
As an example, suppose our portfolio consists of a digital option which pays $1 at time T = 1 provided that the stock price exceeds some threshold value KD. Then the value function for this option is

P = e^{−r} Q[S(1) > KD] = e^{−r}Φ(d2),   where d2 = [ln(S/KD) + r − σ²/2]/σ.

Its delta is

ΔP = (∂/∂S) e^{−r}Φ(d2) = e^{−r}φ(d2) ∂d2/∂S = e^{−r}φ(d2)/(Sσ) = [1/(KD Sσ)] Sφ(d1) = φ(d1)/(KD σ)

and its gamma is

ΓP = (∂²/∂S²) e^{−r}Φ(d2) = −e^{−r} [φ(d2)/(S²σ²)](d2 + σ) = −d1 φ(d1)/(KD Sσ²),

where we again use the fact that

KD e^{−r}φ(d2) = Sφ(d1).

Finally, if we define

d0 = [2 ln(S/KO) + 0.5(r + σ²/2)]/σ

for the value of d1 corresponding to the call option (strike KO, time to maturity 0.25, so that ΓO = φ(d0)/(Sσ√0.25) = 2φ(d0)/(Sσ)), the coefficients of the hedged portfolio are therefore

xO = −ΓP/ΓO = [d1 φ(d1)/(KD Sσ²)] · [Sσ/(2φ(d0))] = d1 φ(d1)/(2KD σφ(d0)),
xS = (ΓP/ΓO)ΔO − ΔP = −(xO ΔO + ΔP).
In Figure 7.17 we plot the values of these coefficients for the case of the digital option above. Evidently xS < 0 and xO > 0 when S < 100 and both options are deep out of the money, indicating that we should short the stock and purchase the call option. When both options are deep in the money and S is large, the reverse holds and we short the call option and buy the stock. In the simplest delta hedge, it is often easy to recognize in advance whether a given coefficient is positive or negative, i.e. whether we are to purchase or sell short a given instrument. With several instruments, when we set several derivatives equal to zero, the signs of the coefficients are by no means obvious in advance, which is one of the reasons why mathematics is very useful in such cases.
The availability of two instruments, the stock and a single option on the underlying S, allows us to adjust a portfolio so that the first two derivatives of its value function with respect to S are both zero. The portfolio is therefore protected against reasonably small changes in S. Similarly, with more options on the same stock, one could arrange that the portfolio is immunized or protected against adverse movements in the other parameters as well, including the interest rate and the volatility parameter. This hedged portfolio clearly requires derivatives of the value function, and for models more complicated than the Black-Scholes, we require simulation methods not only for valuing options and portfolios but also for determining these derivatives with respect to the underlying parameters.

Figure 7.17: The number of units xS and xO required of the underlying and a call option (K = 100, T = 0.25) to hedge a digital option with K = 100 and T = 1. Other parameters: σ = 0.3, r = 0.05.

We now return to the question of estimating the sensitivity of simulations to changes in underlying parameters, usually in a case in which no simple formula exists for the function or its derivatives. For example, in sensitivity analysis we wish to estimate an expected value at values of an underlying parameter in the neighborhood of a given value. In stress testing, we might wish to estimate an expected value under more extreme conditions, for example if the interest rate doubled or an exchange rate took a dramatic downturn. Surprisingly, at least in theory, information can be obtained about the behaviour of a function at values of the parameter quite different from those used to conduct the simulation. In fact, importance sampling appears to permit us to conduct simulations at one value of a parameter θ and use these to estimate expected values corresponding to all possible values of the parameter. The estimation of an expectation under one value of a parameter using simulations conducted at another is sometimes called the “what if” problem. Denote the probability density function of a random variable or vector X under θ by fθ(x), and assume these densities all have common support. An expectation calculated under this value of the parameter will be denoted Eθ(·). If we want to estimate the expected value of T(X) under different values ψ of the parameter, but our simulations are conducted at the parameter value θ, note that

m(ψ) = Eψ T(X) = Eθ[T(X)fψ(X)/fθ(X)].

There may be many reasons for our interest in the function m(ψ). A derivative is priced using current values for the asset price, interest rate, volatility parameter etc., and we may wish to graph the price over a range of (possible future) values of the parameters. The necessity of estimating derivatives in order to immunize or hedge a portfolio was discussed above. The likelihood ratio or importance sampling estimator T(X)fψ(X)/fθ(X), where X ∼ fθ, is an unbiased estimator of m(ψ), so a simulation at θ permits unbiased estimation of the whole function m(ψ) = Eψ T(X), and thereby also its derivatives.
However, this simple result should be moderated by a study of the precision of this estimator. Suppose θ is the true value of the parameter and the values X1, ..., Xn are simulated as independent random variables from the joint probability density function ∏_{i=1}^n fθ(xi). Consider the likelihood ratio

∏_{i=1}^n fψ(Xi)/fθ(Xi)

for ψ ≠ θ. Notice that upon taking the expectation under θ of the logarithm, we obtain

Eθ{Σ_{i=1}^n [ln(fψ(Xi)) − ln(fθ(Xi))]} = −nH(fθ, fψ)

where H(fθ, fψ) > 0 is the cross-entropy. Therefore as n → ∞,

Σ_{i=1}^n [ln(fψ(Xi)) − ln(fθ(Xi))] → −∞

with probability one, and upon exponentiating,

∏_{i=1}^n fψ(Xi)/fθ(Xi) → 0.

This means that the likelihood is very much smaller for values of the parameter ψ far from the true value than it is at the true value θ. You might think that since, for large n, the function

∏_{i=1}^n fψ(Xi)/fθ(Xi)   (7.63)

is very close to zero for values of X which have probability close to one, then

Eθ[T(X1, ..., Xn) ∏_{i=1}^n fψ(Xi)/fθ(Xi)]   (7.64)

should be close to zero for any function T(X1, ..., Xn). Somewhat paradoxically, however, if we substitute T(X1, ..., Xn) = 1, then it is easy to see that (7.64) is identically one for all ψ and all n. Strangely, for large n, the random variable (7.63) is very close to 0 for almost all X and converges to 0 as n → ∞, and yet its expected value remains 1 for all n. This apparent contradiction stems from the fact that the likelihood ratios are not uniformly integrable. They do not behave in the limit as n → ∞ in the same way in probability as they do in expectation. For large n, (7.63) is close to 0 with high probability, but as n increases it takes larger and larger values with smaller and smaller probabilities. In other words, for large sample sizes the likelihood ratio (7.63) is rather unstable, taking very large values over values of X which have small probability and remaining close to zero for the rest of the possible values of X. Consequently, (7.63) has very large variance. In fact, as the sample size → ∞, the variance of (7.63) approaches infinity very quickly. In such a situation, sample means or averages may fail to converge to the expected value, or may approach it extremely slowly. This argues against using an average which involves the likelihood ratio if there is a substantial difference between the parameter values ψ and θ, especially if the sample size is large. Moment-type estimators based on these likelihood ratios will tend to be very unstable in mean and variance, particularly when ψ is far from θ. This problem may be partially alleviated if variance reduction or alternative techniques are employed.

7.8.1 Example
As an example, suppose we wish to estimate

Eψ(X̄n),   where X̄n = Sn/n = (X1 + ... + Xn)/n

is the sample mean of independent observations X1, ..., Xn from an exponential distribution with parameter ψ. Suppose we conduct our simulations at parameter value θ instead, so that, with

fψ(x) = (1/ψ) exp(−x/ψ), x > 0,

we use the estimator

T(X1, ..., Xn) = X̄n ∏_{i=1}^n fψ(Xi)/fθ(Xi) = [θ^n/(nψ^n)] Sn exp{Sn(1/θ − 1/ψ)}   (7.65)

with the Xi independent Exp(θ) random variables. Then it is not hard to show that T(X1, ..., Xn) has the correct expected value, ψ, provided that it has a well-defined expected value at all. The variance, on the other hand, is quite unstable when there is much separation between θ and ψ. The standard deviation of the estimator T(X1, ..., Xn) is plotted in Figure 7.18 for the case n = 10, θ = 1; it is clearly small for ψ fairly close to (actually a little smaller than) θ, but large as |θ − ψ| increases.
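This behaviour is easy to reproduce. In the sketch below (our own code; sample sizes are arbitrary), the estimator (7.65) is accurate for ψ near θ = 1, while its sample standard deviation explodes for ψ well away from θ:

```python
import numpy as np

rng = np.random.default_rng(0)

def what_if(theta, psi, n, m):
    """Estimate E_psi[mean(X)] = psi from m replications of n Exp(theta)
    draws, reweighted by the likelihood ratio as in (7.65)."""
    X = rng.exponential(theta, size=(m, n))
    Sn = X.sum(axis=1)
    lr = (theta / psi) ** n * np.exp(Sn * (1 / theta - 1 / psi))
    T = (Sn / n) * lr
    return T.mean(), T.std()

mean_near, sd_near = what_if(theta=1.0, psi=1.1, n=10, m=100_000)
mean_far, sd_far = what_if(theta=1.0, psi=3.0, n=10, m=100_000)
```

For ψ = 3 the squared likelihood ratio is not even integrable under θ = 1 in this setting, so sd_far is not only large but remains unstable as m grows.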

7.9 Estimating Derivatives

Let us begin by examining the estimation of the derivative m′(θ) = (∂/∂θ)Eθ T(X) in general when we are only able to evaluate the function T(X) by simulation, so there is error in its valuation. We could conduct independent simulations at two different values of the parameter, say at θ + h and θ − h, average the values of

Figure 7.18: Standard Deviation of Estimator T (X1 , ..., Xn ) for θ = 1, various
values of ψ.

T(X) under each, resulting say in the estimators m̂(θ + h) and m̂(θ − h), and then take as our estimator of the derivative the difference quotient

[m̂(θ + h) − m̂(θ − h)]/(2h),

but this crude estimator suffers from a number of disadvantages:
• It requires twice as many simulations as we conduct at a single point. Since we normally wish to simulate at θ as well, this is three times the number of simulations required at a single point.
• It is heavily biased if h is large unless the function m(θ) is close to linear.
• It has very large variance if h is small.
To illustrate the second point, note that, provided the function m is three times differentiable, a Taylor series expansion gives

E{[m̂(θ + h) − m̂(θ − h)]/(2h)} = [m(θ + h) − m(θ − h)]/(2h)
= (1/(2h)){m(θ) + m′(θ)h + ½m″(θ)h² − (m(θ) − m′(θ)h + ½m″(θ)h²) + O(h³)}
= m′(θ) + O(h²)

as h → 0. However, the error implicit in the term O(h²) can be large when h is large, and if h is small then the third point is problematic, since

var{[m̂(θ + h) − m̂(θ − h)]/(2h)} = (1/(4h²)){var[m̂(θ + h)] + var[m̂(θ − h)]} → ∞

quite quickly as h → 0, so we cannot afford to permit h to be too small if we use independent simulations.
We have already seen some methods for ameliorating the last of these problems. Since the derivative requires that we estimate a difference, use of common random numbers in the simulations at the two parameter values θ + h and θ − h should reduce the variability, but we are still faced with the problem of taking the limit of such an estimator as h → 0 to estimate the derivative.
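Common random numbers are simple to implement when the simulated variable is a smooth function of the parameter. A small sketch (our own toy example: estimating m′(θ) = 1 for the mean of an Exponential(θ), generated by inversion):

```python
import numpy as np

rng = np.random.default_rng(1)

def sim_mean(theta, U):
    """Monte Carlo estimate of E_theta[X] = theta, X ~ Exp(theta),
    generated from the uniforms U by inversion."""
    return (-theta * np.log(U)).mean()

theta, h, n = 1.0, 0.01, 100_000
U = rng.uniform(size=n)
# common random numbers: the SAME uniforms at theta + h and theta - h
crn = (sim_mean(theta + h, U) - sim_mean(theta - h, U)) / (2 * h)
# independent streams, for comparison
ind = (sim_mean(theta + h, rng.uniform(size=n))
       - sim_mean(theta - h, rng.uniform(size=n))) / (2 * h)
```

With common uniforms the simulated difference is exactly linear in h here, so crn is the derivative up to the sampling error of a single mean; with independent streams the division by 2h greatly inflates the noise of the two independent means.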

7.9.1 The Score Function Estimator.
There are two alternatives in popular use: perturbation analysis, which depends on pathwise differentiation, and the score function or likelihood ratio method. Both have the feature that a simulation at a single parameter value allows estimation of both the function and its derivative.

We begin by introducing the score function method. The idea behind it is very simple: it involves interchanging derivative and integral. We wish to estimate m′(θ) = (∂/∂θ)∫T(x)fθ(x)dx, where fθ is the joint probability density function of all of our simulations. Then under regularity conditions called the Cramér conditions (see for example those required in the Cramér-Rao inequality, Appendix B), we may interchange the integral and the derivative:

m′(θ) = (∂/∂θ) ∫ T(x)fθ(x)dx = ∫ T(x) [∂fθ(x)/∂θ] dx = ∫ T(x) [∂ln(fθ(x))/∂θ] fθ(x) dx = Eθ[T(X)S(θ)]

where S(θ) denotes the score function,

S(θ) = S(θ, x) = ∂ln[fθ(x)]/∂θ.
Since the score function has expected value 0, i.e. Eθ{S(θ)} = 0, the quantity Eθ[T(X)S(θ)] is just the covariance between T(X) and S(θ), and this can be estimated using the sample covariance. In particular, if T is a function of n independent simulations X = (X1, ..., Xn), each of which has probability density function fθ(x), then the score function for the vector of observations is Sn(θ, X) = Σ_{i=1}^n S1(θ, Xi), with

S1(θ, x) = ∂ln[fθ(x)]/∂θ

the score function for a single simulated value. We wish to estimate the covariance

cov(T(X1, ..., Xn), Σ_{i=1}^n S1(θ, Xi)) = Σ_{i=1}^n cov(T(X1, ..., Xn), S1(θ, Xi)).   (7.69)

One possibility, motivated by (7.69), is to use the sample covariance between independent replications of T(X1, ..., Xn) and Σ_{i=1}^n S1(θ, Xi) over many simulations. More specifically, denote m independent replications of T by Tj = T(Xj1, Xj2, ..., Xjn), j = 1, 2, ..., m, where {Xji, i = 1, ..., n, j = 1, ..., m} are independent observations from the probability density function fθ(x). This provides an estimator of the sensitivity (∂/∂θ)Eθ T(X). If we denote the score function for the vector of observations (Xj1, Xj2, ..., Xjn) by

Sj = Σ_{i=1}^n S1(θ, Xji)

then the simplest version of the score function estimator is the sample covariance

SF1 = ĉov(Sj, Tj) = [1/(m − 1)] Σ_{j=1}^m Sj(Tj − T̄)   (7.70)

where T̄ = (1/m) Σ_{j=1}^m Tj.
However, there is a generally more efficient estimator motivated by the relationship between regression coefficients and covariances. If we were to construct a regression of the variables Tj on Sj of the form

Tj = α + βT|S Sj + εj

then the coefficient which minimizes the variance of the errors εj is

βT|S = cov(Tj, Sj)/var(Sj) = cov(Tj, Sj)/(nJ1(θ)),

where J1(θ) = var[S1(θ, Xi)]. It follows that we can construct an estimator of the covariance from the estimated regression coefficient

β̂T|S = Σ_{j=1}^m Sj(Tj − T̄) / Σ_{j=1}^m (Sj − S̄)²

from which the covariance (7.69) can be estimated using

SF2 = β̂T|S var(Sj).   (7.71)

Both SF1 and SF2 are estimators of m′(θ), but SF2 is generally more efficient. In fact, using the variance of the regression slope estimator, we can obtain an approximate variance for this estimator:

var(SF2) = [var(Sj)]² var(β̂T|S) ≈ (1/m) var(Tj)var(Sj)[1 − ρ²ST],   (7.72)

where

ρST = cov(Tj, Sj)/√(var(Tj)var(Sj)).

Notice that (7.72) can be expressed as (n/m) var(Tj)J1(θ)[1 − ρ²ST], which makes it apparent that it may be a noisy estimator for n large unless either m is correspondingly larger or ρ²ST is close to one.
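As a small illustration (our own toy setting), take Tj to be the mean of n Exponential(θ) observations, so that m(θ) = θ and the target derivative is m′(θ) = 1. The score of a single observation is S1(θ, x) = −1/θ + x/θ², with J1(θ) = 1/θ²:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, m = 2.0, 5, 200_000

X = rng.exponential(theta, size=(m, n))
T = X.mean(axis=1)                          # T_j: one replication's sample mean
S = (X / theta**2 - 1 / theta).sum(axis=1)  # S_j: score of the j-th replication

SF1 = np.cov(S, T)[0, 1]                    # sample-covariance estimator (7.70)
beta = np.sum(S * (T - T.mean())) / np.sum((S - S.mean())**2)
SF2 = beta * n * (1 / theta**2)             # (7.71), using the known var(S_j) = n*J1(theta)
```

Both estimates should be close to the true derivative m′(θ) = 1; SF2 exploits the analytically known variance of the score.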

Example. A Monte-Carlo Estimator of Rho.

Suppose we are interested in estimating rho for an option with payoff V(S(T), T) at maturity, S(T) being the value of the stock at time T. Assume the Black-Scholes model, so that the distribution of S(T) under the Q measure is lognormal with mean η = S0 exp{rT} and volatility σ√T. For brevity we denote S(T) by S. Then since S is lognormal, S = e^Y where Y ∼ N(log(η) − σ²T/2, σ²T). Note that if g is the corresponding lognormal probability density function, then

(∂/∂r) ln g = [(Y − log(η) + σ²T/2)/(ησ²T)] (∂η/∂r) = [ln(S/S0) − rT + σ²T/2]/σ².

Thus an estimator of ρ can be obtained from the sample covariance, over a large number of simulations, between V(S(T), T) and [ln(S(T)/S0) − rT + σ²T/2]/σ², or equivalently

σ⁻² ĉov(V(S(T), T), ln(S(T)/S0)).
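A sketch of this estimator for a European call payoff (the parameter values are our own choices); since E[(S(T) − K)⁺] has a closed form here, the Monte Carlo estimate can be checked against a numerical derivative of that formula with respect to r:

```python
import numpy as np
from math import log, sqrt, exp
from statistics import NormalDist

Phi = NormalDist().cdf
rng = np.random.default_rng(3)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.3, 1.0

def expected_payoff(rr):
    """Closed form for E[(S(T) - K)^+], S(T) lognormal with mean S0*exp(rr*T)."""
    d1 = (log(S0 / K) + (rr + sigma**2 / 2) * T) / (sigma * sqrt(T))
    return S0 * exp(rr * T) * Phi(d1) - K * Phi(d1 - sigma * sqrt(T))

Z = rng.standard_normal(1_000_000)
ST = S0 * np.exp(r * T - sigma**2 * T / 2 + sigma * sqrt(T) * Z)
V = np.maximum(ST - K, 0.0)
score = (np.log(ST / S0) - r * T + sigma**2 * T / 2) / sigma**2  # d/dr of the log density
rho_mc = np.cov(V, score)[0, 1]      # sample covariance estimate of d/dr E[V(S(T))]

h = 1e-4
rho_fd = (expected_payoff(r + h) - expected_payoff(r - h)) / (2 * h)
```

The two numbers should agree to within Monte Carlo error.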

The score function estimator can be expressed as a limit (as h → 0) of likelihood ratio estimators. However, the score function is more stable than the likelihood ratio for large sample sizes because its moment behaviour, unlike that of the likelihood ratio, is similar to its behaviour in probability. Under the standard regularity conditions referred to above, the score function Sn(θ) = Σ_{i=1}^n S1(θ, Xi) for an independent sample of size n satisfies a law of large numbers,

(1/n) Sn(θ) → E[S1(θ, Xi)] = 0,   (7.74)

and a central limit theorem,

n^{−1/2} Sn(θ) → N(0, J1(θ))   (7.75)

in distribution, where the limiting variance is J1(θ) = var[S1(θ, Xi)]. When n is large, however, the score function estimator still suffers from too much variability. The score function estimator does have a kind of optimality attached to it, provided that we restrict attention to estimators which can also be expressed as a covariance. Among all random functions G(X; θ) which satisfy (∂/∂θ)Eθ T(X) = Eθ[T(X)G(X; θ)] for all T, the score function cannot be improved on in the sense that it has the smallest possible variance.

Conditioning the Score Function Estimator.
One method for reducing the variability of the score function estimator is conditioning. This is particularly easy for the standard distributions in the exponential family. Note that

m′(θ) = Eθ[T(X)Sn(θ, X)] = Eθ{Eθ[T(X)|Sn(θ, X)] Sn(θ, X)}.   (7.76)

The conditional expectation Eθ[T(X)|Sn(θ, X)] in the above product can be estimated by Monte Carlo, provided that we are able to generate the variates Xi conditional on the value of the score function. The outside expectation Eθ{·} over the distribution of Sn(θ, X) may be computed either analytically or numerically, using our knowledge of the finite sample or asymptotic distribution of the score function. For brevity, denote Sn(θ, X) by S and its marginal probability density function by fS(s). Let Xs = (Xs1, ..., Xsn), where the Xsi, i = 1, ..., n are random variables generated from the conditional distribution of X = (X1, ..., Xn) given S = s for the fixed parameter θ. Then based on a sample of size n, the conditioned score function estimator is

∫ T(Xs) s fS(s) ds.   (7.77)

There are some powerful advantages to (7.77), particularly when the data are generated from one of the distributions in an exponential family. The exponential family of distributions is a broad class which includes most well-known continuous and discrete distribution families such as the normal, lognormal, exponential, gamma, binomial, negative binomial, geometric, and Poisson distributions. X1 is said to have an exponential family distribution if its probability density function with respect to some dominating measure (usually counting measure or Lebesgue measure) takes the form

fθ(x) = e^{η(θ)Y(x)} h(x) c(θ)

for some functions η(θ), c(θ), Y(x) and h(x). For exponential family distributions, conditioning on the score is equivalent to conditioning on the sum Z = Σ_{i=1}^n Y(Xi) provided η′(θ) ≠ 0, and the score is a simple function of Z, viz.

Sn(θ, X) = η′(θ){Z − nEθ(Y(Xi))}.   (7.78)
So revising (7.77) to condition on the sum Z, the estimator becomes

m̂′(θ) = η′(θ) ∫ E[T(Xs)|Z = z]{z − nEθ(Y(Xi))} fZ(z) dz
       = η′(θ) cov(E[T(Xs)|Z], Z)   (7.79)

where fZ(z) is the probability density function of Z.

When we are attempting to estimate the derivative m′(θ) simultaneously at a number of different values of θ, perhaps in order to fit the function with splines or represent it graphically, there are some very considerable advantages to the estimator underlying (7.79). Because the conditional expectation does not depend on the value of θ, we may conduct the simulation (usually at two or more values of z) at a single convenient θ. The estimated conditional expectation can then be used in an integral of the form (7.79) for all underlying values of θ. Similarly, a single simulation can be used to estimate m(ψ) for many different values of ψ.
There are a number of simple special cases of the exponential family where the conditional distributions are easily established and simulated. In each case below we specify the probability density function fZ(z), the conditional distribution of the vector Xs given Z = z, and the function η′(θ).

7.9.2 Examples.
1. (Exponential distribution). Suppose the Xi are exponentially distributed with probability density function fθ(x) = (1/θ)e^{−x/θ}. Then given Z = Σ_{i=1}^n Xi = z, the partial sums X1, X1 + X2, ..., Σ_{i=1}^{n−1} Xi are generated as n − 1 Uniform[0, z] order statistics, and from these we obtain the values of the Xi as the spacings between consecutive order statistics. In this case η′(θ) = θ⁻² and fZ, the probability density function of Z, is Gamma(n, θ).
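As a quick sketch of the exponential case (values arbitrary), conditional draws can be produced from uniform order statistics; every generated vector sums to z exactly, and each spacing has conditional mean z/n:

```python
import numpy as np

rng = np.random.default_rng(4)

def exp_given_sum(n, z, reps):
    """Draw (X1,...,Xn) iid exponential conditional on X1+...+Xn = z:
    the spacings of n-1 Uniform[0, z] order statistics."""
    U = np.sort(rng.uniform(0.0, z, size=(reps, n - 1)), axis=1)
    grid = np.concatenate(
        [np.zeros((reps, 1)), U, np.full((reps, 1), z)], axis=1)
    return np.diff(grid, axis=1)

X = exp_given_sum(n=5, z=10.0, reps=200_000)
```

Note that the exponential rate θ plays no role in the conditional draw, which is exactly why the conditioning device allows a single simulation to serve many values of θ.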

2. (Gamma distribution). Suppose the Xi are distributed as independent Gamma(α, θ) variates with probability density function

fθ(x) = x^{α−1}e^{−x/θ}/(Γ(α)θ^α).

Then conditionally on Z = Σ_{i=1}^n Xi, we can generate Xs as follows: generate Xs1 as Z times an independent Beta(α, (n − 1)α) random variable, Xs2 as (Z − Xs1) times an independent Beta(α, (n − 2)α) random variable, ..., and finally Xsn = Z − Σ_{i=1}^{n−1} Xsi. In this case η′(θ) = θ⁻² and fZ, the probability density function of Z, is Gamma(nα, θ).

3. (Normal distribution). Suppose Xi has a N(θ, σ²) distribution. Then Xs is generated as follows: the distribution of Xs1 given Z = Σ_i Xi is N(Z/n, (1 − 1/n)σ²); the distribution of Xs2 given Xs1 and Z is N((Z − Xs1)/(n − 1), (1 − 1/(n − 1))σ²), etc., and Xsn = Z − Σ_{i=1}^{n−1} Xsi. In this case η′(θ) = σ⁻² and fZ(z) is the Normal(nθ, nσ²) probability density function.

4. (Binomial distribution). Suppose the Xi are distributed as Binomial(m, θ) variates. Then given Z = Σ_{i=1}^n Xi, Xs1 has a hypergeometric distribution with parameters (mn, m, Z); given Z and Xs1, Xs2 has a hypergeometric distribution with parameters (m(n − 1), m, Z − Xs1), etc., and Xsn = Z − Σ_{i=1}^{n−1} Xsi. In this case Z has the Binomial(mn, θ) distribution and η′(θ) = [θ(1 − θ)]⁻¹.

5. (Poisson distribution). Suppose Xi, i = 1, ..., n are independent Poisson(θ) random variables. Then given Z = Σ_{i=1}^n Xi, the distribution of Xs1 is Binomial(Z, 1/n), the conditional distribution of Xs2 given Xs1 is Binomial(Z − Xs1, 1/(n − 1)), etc., and Xsn = Z − Σ_{i=1}^{n−1} Xsi. In this case η′(θ) = θ⁻¹ and the distribution of Z is Poisson(nθ).
6. (Geometric distribution). Suppose Xi, i = 1, ..., n are independent with the Geometric(θ) distribution. Then given Z = Σ_{i=1}^n Xi, the distribution of Xs1 is negative hypergeometric, with probability function

f(x) = (Z − x − 1 choose n − 2)/(Z − 1 choose n − 1).

Similarly for Xs2, ..., and Xsn = Z − Σ_{i=1}^{n−1} Xsi. In this case Z has a negative binomial(n, θ) distribution and η′(θ) = −(1 − θ)⁻¹.
7. (Lognormal distribution). Suppose Xi, i = 1, ..., n are independent with the lognormal distribution with mean θ and probability density function fθ(x). Then

(∂/∂θ) ln(fθ(x)) = [ln(x) − ln(θ) + σ²/2]/(θσ²)

and therefore the conditional distribution of X1 given the sufficient statistic Z = ∏_{i=1}^n Xi is lognormal with conditional mean Z^{1/n} exp{(1 − 1/n)σ²/2} and volatility parameter (1 − 1/n)σ².

7.9.3 Example. Estimating vega.
Suppose we wish to estimate the vega of an option on some underlying S(t), where σ is the volatility parameter in the asset price equation. Consider for example a European option whose underlying has volatility parameter σ. Then the vega is the derivative of the value of the option with respect to the volatility:

(∂/∂σ) E{e^{−rT} V(ST)}

where ST, the terminal value of the asset, is assumed to have a lognormal distribution with mean S0e^{rT} and variance parameter σ²T, and V(S) is the payoff on maturity of the option. Denote the corresponding lognormal probability density function of S(T) by

g(s) = exp{−(log(s) − log(S0) − rT + σ²T/2)²/(2σ²T)} / (sσ√(2πT)).

It is easy to translate this lognormal distribution into exponential family form, with

Y(s) = (log(s) − log(S0) − rT)²,   η(σ) = −1/(2σ²T),   so that η′(σ) = 1/(σ³T).

Therefore, from (7.78) the score function is of the form

η′(σ)[Y(s) − E(Y(s))]

and an unbiased estimator of vega is the sample covariance, taken over all simulations,

ĉov(e^{−rT}V(ST), η′(σ)Y(ST)) = [e^{−rT}/(σ³T)] ĉov(V(ST), Y(ST)).   (7.82)
Since we can generate ST from a standard normal random variable Z via

ST = S0 exp{rT − σ²T/2 + σ√T Z},   (7.83)

we have

Y(ST) = (σ√T Z − σ²T/2)² = σ²T (Z − σ√T/2)².   (7.84)

The covariance in (7.82) can then be written

(e^{−rT}/σ) ĉov(V(ST), (Z − σ√T/2)²)

with ST generated from (7.83). This reduces to a simple one-dimensional integral with respect to a normal probability density function, and we can either simulate this quantity or use numerical integration. Because of the high variability of the score function, it is desirable to use variance reduction in evaluating this estimator, or to conduct the integration analytically or numerically. One of the simplest numerical integration techniques, when the expectation can be written with respect to a normal probability density function, is Gaussian quadrature, discussed below. This is just one example of a whole class of algorithms for numerical integration with respect to certain weight functions.
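A sketch of this vega estimator for a European call (the parameter values are our own choices), compared with the closed-form Black-Scholes vega Sφ(d1)√(T − t) from Table 8.1:

```python
import numpy as np
from math import log, sqrt, exp, pi

rng = np.random.default_rng(5)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.3, 1.0

Z = rng.standard_normal(1_000_000)
ST = S0 * np.exp(r * T - sigma**2 * T / 2 + sigma * sqrt(T) * Z)   # (7.83)
V = np.maximum(ST - K, 0.0)                                        # call payoff
# score-function vega: (e^{-rT}/sigma) * cov(V, (Z - sigma*sqrt(T)/2)^2)
vega_sf = exp(-r * T) * np.cov(V, (Z - sigma * sqrt(T) / 2)**2)[0, 1] / sigma

d1 = (log(S0 / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
vega_bs = S0 * exp(-d1**2 / 2) / sqrt(2 * pi) * sqrt(T)
```

Even with a million simulations the raw score-function estimate carries noticeable noise, which is the motivation for the variance reduction and quadrature remarks above.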

7.9.4 Gaussian Quadrature.
We often approximate integrals of the form

∫ h(x)φ(x) dx,

where φ is the standard normal probability density function, with a weighted sum such as

Σ_{i=1}^k wi h(xi)

for certain points xi and corresponding weights wi. Crude Monte Carlo is an example of such an integration technique, where we choose the abscissae xi at random according to a normal distribution and use equal weights wi = 1/k, i = 1, 2, ..., k. However, it is possible to choose the weights so that the approximation is exact, i.e.

Σ_i wi h(xi) = ∫ h(x)φ(x) dx   (7.87)

whenever h is a polynomial of a certain degree. For example, if we wish equality in (7.87) for the function h(x) = x^r, we would require

Σ_i wi xi^r = ∫ x^r φ(x) dx = { 0 for r odd;  r!/(2^{r/2}(r/2)!) for r even }.

If we impose this requirement for r = 0, 1, 2, ..., k − 1, this leads to k equations in the k unknown values of wi, the first few of which are

Σ wi = 1,   Σ wi xi = 0,   Σ wi xi² = 1,   Σ wi xi⁴ = 3,   Σ wi xi⁶ = 15,   Σ wi xi^r = 0 for r odd.

Normally, for given values of xi, it is possible to find a solution to these k equations in the k unknowns wi unless the system is singular. Having found the appropriate values of wi, we are then guaranteed that (7.87) holds for any polynomial h(x) of degree k − 1 or less.
We can do better than this, however, if we are free to choose not only the weights but also the points xi. In fact we can find values of wi and xi such that (7.87) holds for all polynomials of degree at most 2k − 1. Without verifying this fact, we simply mention that this is the case if we choose the abscissae to be the k roots of the Hermite polynomials,

pk(x) = (−1)^k [φ(x)]⁻¹ d^k φ(x)/dx^k.   (7.88)

For reference, the Hermite polynomials with degree k ≤ 7 and their roots are:

k   pk(x)                         Roots xi (approximate)          Weights wi
0   1
1   x                             0                               1
2   x² − 1                        ±1                              1/2, 1/2
3   x³ − 3x                       0, ±√3                          2/3, 1/6
4   x⁴ − 6x² + 3                  ±2.3344, ±0.74196               0.0459, 0.4541
5   x⁵ − 10x³ + 15x               0, ±2.857, ±1.3556              0.5333, 0.0113, 0.2221
6   x⁶ − 15x⁴ + 45x² − 15         ±3.3243, ±1.8892, ±0.61671      0.0046, 0.0886, 0.4088
7   x⁷ − 21x⁵ + 105x³ − 105x      0, ±3.7504, ±2.3668, ±1.1544    0.4571, 0.0005, 0.0308, 0.2401

(Each weight is listed in the same order as its corresponding root or pair of roots.)

Finding the roots of these polynomials and solving for the appropriate weights, the corresponding approximation to the integral ∫_{−∞}^{∞} h(x)φ(x)dx using k = 3, for example, is

∫ h(x)φ(x) dx ≈ (2/3)h(0) + (1/6)h(√3) + (1/6)h(−√3).

Using these, if we wish to evaluate the expected value of a function of X ∼ N(µ, σ²), the approximations are

E[h(X)] ≈ Σ_i wi h(µ + σxi).

This formula is exact for h a polynomial of degree 2k − 1 or less. Although the roots xi are not exactly equally spaced, they are fairly close to it.
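In practice the nodes and weights need not be hand-solved: NumPy supplies the probabilists' Gauss-Hermite rule (weight function e^{−x²/2}), which becomes an expectation rule under N(0, 1) after dividing the weights by √(2π). A short sketch:

```python
import numpy as np

def normal_quadrature(k):
    """Nodes/weights that integrate h against the standard normal density
    exactly for polynomials of degree <= 2k - 1."""
    x, w = np.polynomial.hermite_e.hermegauss(k)
    return x, w / np.sqrt(2 * np.pi)

x, w = normal_quadrature(3)           # nodes -sqrt(3), 0, sqrt(3); weights 1/6, 2/3, 1/6
m4 = np.sum(w * x**4)                 # E[Z^4] = 3, exact since 4 <= 2k - 1 = 5
mu, sigma = 1.0, 2.0
m2 = np.sum(w * (mu + sigma * x)**2)  # E[X^2] = mu^2 + sigma^2 = 5 for X ~ N(1, 4)
```

This reproduces the k = 3 rule (2/3)h(0) + (1/6)h(±√3) quoted above.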

7.10 In¬nitesimal Perturbation Analysis: Path-
wise di¬erentiation
There is an alternative method for measuring the sensitivity of simulations to per-
turbations in the parameters which can perform better than the score function
method. This method requires assumptions on the derivative of the performance
measure. As a preliminary example, let us return to the problem of estimating
a Greek (e.g. rho, vega, delta or theta) for a European option. We wish to
estimate the derivative of the option price

EQ{e^{−r(T−t)} V(ST)}

with respect to some parameter (e.g. r, σ, S0, t) where ST has a lognormal
distribution with mean S0 e^{rT} and variance parameter σ²T, and V(ST) is the
value of the option on expiry when the stock price is ST. As an example suppose
we wish to estimate vega, the derivative of the option price with respect to the
unknown volatility parameter σ underlying the distribution of ST, and assume
for simplicity of expression that t = 0. Suppose we generate ST = S0 exp(rT −
σ²T/2 + σZ√T) for a standard normal random variable Z. Notice that ST
is explicitly a function of the parameter σ. Then differentiating directly with
respect to this parameter, provided such a derivative exists and can be moved
under the expectation sign, yields

(∂/∂σ) E{e^{−rT} V(ST)} = E{e^{−rT} V′(ST) ∂ST/∂σ}                         (7.89)
                        = E{e^{−rT} V′(ST) ST(−σT + Z√T)}                  (7.90)

Thus, to estimate the derivative, we average simulated values of the form

(1/n) Σ_{i=1}^{n} e^{−rT} V′(STi) STi(−σT + Zi√T)                          (7.91)

where STi = S0 exp(rT − σ²T/2 + σ√T Zi) is the i'th simulated closing value.
If the function V(·) is close to being constant, then this estimator will have
variance close to zero and will be quite accurate, likely more accurate than the
score function estimator described in the last section. Consider the case of a
European call option with strike K. Then V(ST) = (ST − K)^+ and V′(ST) =
1[ST>K]. Note that the derivative exists everywhere except at the point ST = K.
Is this a fatal problem for IPA? A quick check of the expression (7.91) in this
case yields

(1/n) Σ_{i=1}^{n} e^{−rT} STi 1[STi>K](−σT + Zi√T) → e^{−rT} E[ST 1[ST>K] {ln(ST/S0) − (r + σ²/2)T}/σ]

= S0 φ(d1)√T

in agreement with the values in Table 8.1. The estimator appears to have
the correct limit in spite of the fact that a very important ingredient in its
derivation, the existence of a derivative, fails. The failure of differentiability
at a single point, or even at a finite number of points, evidently does not necessarily
bias the IPA estimator. Essentially what is required for the estimator to work is
the ability to interchange expected value and derivative in expressions such as
(7.89). There are some general conditions which permit this interchange given
in Appendix A. Verifying such conditions in any practical simulation is typically
very difficult, but a rule of thumb is that if the value function V(·) is continuous,
then IPA usually provides consistent estimators, but if it is discontinuous, it
does not. We can sometimes circumvent the problem of non-differentiability
by finding a sequence of everywhere differentiable functions Vn(x) such that
Vn(x) → V(x) and Vn′(x) → V′(x) for all x ≠ K. Then we can show that with
Vn replacing V in (7.91), we obtain a consistent estimator of (∂/∂σ)E{e^{−rT} Vn(ST)},
and using the Lebesgue dominated convergence theorem, we may carry this
consistency over to V(x). For the call option, we might choose
Vn(x) = n(x − K + 1/(4n))²,   for K − 1/(4n) < x < K + 1/(4n),
Vn(x) = x − K,                for x ≥ K + 1/(4n),

and Vn(x) = 0 for x ≤ K − 1/(4n), a continuously differentiable function which
agrees with V(x) both in its value and its derivative everywhere except in the
diminishing interval (K − 1/(4n), K + 1/(4n)). More generally, when V(x) increases at
most linearly in x as x → ∞, it is possible to find a dominating function, but
if the payoff function V(x) increased at a faster rate, this may not be possible.
Generally, if there are a finite number of points where the derivative does not
exist, and the payoff function is bounded above by linear functions of the stock
price, an estimator of the form (7.91) can be used.
Infinitesimal Perturbation Analysis or Pathwise Differentiation employs the
following simple steps:

1. Write the expected value we wish to determine in terms of the parameters
(as explicit arguments) and random variables whose distribution does not
depend on these parameters (e.g. U [0, 1] or N (0, 1).) The simplest way
to do this may be to use the inverse transform.

2. Differentiate this expected value with respect to the parameter of interest,
passing the derivative under the expected value sign.
3. Simulate or numerically determine this expected value.
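As a sketch of these three steps (a Python illustration with hypothetical parameter values, rather than the Matlab used elsewhere in this chapter), consider the delta of a European call: write ST = S0 exp{(r − σ²/2)T + σ√T Z} with Z standard normal (step 1); differentiate under the expectation, using ∂ST/∂S0 = ST/S0, to obtain the pathwise estimator e^{−rT} 1[ST>K] ST/S0 (step 2); and average over simulations (step 3):

```python
import numpy as np
from math import log, sqrt
from statistics import NormalDist

S0, K, r, sigma, T = 10.0, 9.0, 0.1, 0.2, 0.25      # illustrative values
rng = np.random.default_rng(0)
Z = rng.standard_normal(500000)                      # step 1: parameter-free randomness
ST = S0*np.exp((r - sigma**2/2)*T + sigma*np.sqrt(T)*Z)
ipa = np.exp(-r*T)*(ST > K)*(ST/S0)                  # step 2: pathwise derivative
delta_hat = ipa.mean()                               # step 3: average
# closed-form Black-Scholes delta Phi(d1) for comparison
d1 = (log(S0/K) + (r + sigma**2/2)*T)/(sigma*sqrt(T))
delta_bs = NormalDist().cdf(d1)
print(delta_hat, delta_bs)                           # the two agree closely
```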

Since the requirements for unbiasedness of the IPA estimator are rather
subtle, it is desirable to compare the estimator with one that is known to be
unbiased, such as the score function estimator. If both appear to have the
same mean, then it is likely that the IPA estimator is unbiased and has smaller
variance.

7.10.1 Example: IPA estimate of vega
Again consider an estimate of

vega = (∂/∂σ) EQ{e^{−rT} V(ST)}

at time t = 0, with ST, the terminal value of the asset, distributed according
to a lognormal distribution with mean S0 e^{rT} and volatility parameter σ²T. We
begin by writing ST in terms of random variables with distributions that do not
depend on the parameters. Recall that

ST = S0 exp{rT − σ²T/2 + σ√T Z}

with Z a standard normal random variable. Then provided that we can pass
the derivative through the expected value,

vega = E{e^{−rT} V′(ST) ∂ST/∂σ}
     = E{e^{−rT} V′(ST) ST(√T Z − σT)}.

This can be simulated by generating values of Z and then ST = S0 exp{rT −
σ²T/2 + σ√T Z} and averaging the values of e^{−rT} V′(ST) ST(√T Z − σT). Alter-
natively, since this is a one-dimensional integral, we can integrate the function
against the standard normal p.d.f. φ, i.e.

∫_{−∞}^{∞} e^{−rT} V′(S0 e^{rT−σ²T/2+σ√T z}) S0 e^{rT−σ²T/2+σ√T z} (√T z − σT) φ(z) dz.

Note the similarity between this estimator and the score function estimator
in the same problem. The primary difference is that V′ is multiplied by √T z −
σT, a linear function of z in this case, whereas V is multiplied by a quadratic
function of Z in the case of the score function. The relationship will be clearer
later when we see that the score function estimator can be derived from the
IPA estimator using integration by parts. Because of the high variability of the
score function, the perturbation analysis estimator is substantially better, at
least for a standard call option. The following function was used to compare
the estimators and their standard errors.

function [price,vega,SE]=estvega(Z,S0,sigma,r,T,K)
% two estimators of vega: vega(1)=score function estimator, vega(2)=IPA estimator
% SE(1),SE(2) their standard errors.
% Z=randn(1,n) is a vector of standard normals
ST=S0*exp((r-sigma^2/2)*T+sigma*sqrt(T)*Z);     % simulated terminal stock prices
v=max(ST-K,0);                                  % v=payoff function, vprime is its derivative
vprime=(ST>K);
price=exp(-r*T)*mean(v);
v1=exp(-r*T)*v.*((Z.^2-1)/sigma-sqrt(T)*Z);     % score function estimator
v2=exp(-r*T)*vprime.*ST.*(sqrt(T)*Z-sigma*T);   % IPA estimator
vega=[mean(v1) mean(v2)];
SE=sqrt([var(v1) var(v2)]/length(Z));
For example the call [price,vega,SE]=estvega(randn(1,500000),10,.2,.1,.25,9) re-
sults in the price of a call option on a stock worth $10 with 3 months or
one quarter of a year to maturity, interest rate r = 0.1, and annual volatility 0.20.
The estimated price is $1.1653 and the two estimates of vega are 0.8835 and
0.9297 with standard errors 0.0238 and 0.0059 respectively. Since the ratio
of standard errors is approximately 4, the IPA estimator is evidently about 16
times as efficient as the score function estimator in this case, although even
the score function estimator provides reasonable accuracy. Not all derivatives
can be estimated as successfully using IPA, however. For example, if we are
interested in the gamma or second derivative of a European call option with
respect to St, then V(ST) = (ST − K)^+ and V″(x) = 0 for all x ≠ K. Thus, if we
are permitted to differentiate twice under the expected value in

EQ{e^{−rT} V(ST)}

we obtain

Γ = e^{−rT} EQ[V″(ST)(∂ST/∂St)²] = 0
which is clearly incorrect. The problem in this case is that the regularity re-
quired for the second interchange of derivative and expectation fails. In general,
if a function is discontinuous, as V′(x) is in this case, interchanging derivatives
and expected values is problematic, so IPA permits unbiased estimation of the
delta for a European call or put option but not for a digital option having payoff
V(ST) = 1(ST ≥ K). IPA is based on differentiation of the output process.
This makes IPA unsuitable as a "black-box" algorithm. Since expected values
are typically smoother with respect to the parameter than the function being
integrated, the score function estimator, though noisier, provides unbiased esti-
mation under much more general conditions, because the score function method,
together with its variance-reduced variations, imposes regularity only on the
input variables and requires no knowledge of the process being simulated. How-
ever, the score function method requires that the parameter whose sensitivity is
investigated be a statistical parameter, i.e. index a family of densities, whereas
perturbation analysis allows more general types of parameters.
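The digital case can be checked directly: the payoff 1(ST ≥ K) has zero derivative wherever it is differentiable, so naive pathwise differentiation produces an estimator that is identically zero, while the true delta, obtained by differentiating the closed-form digital price e^{−rT}Φ(d2), is e^{−rT}φ(d2)/(S0σ√T) > 0. A sketch with hypothetical parameters:

```python
import numpy as np
from math import log, sqrt, exp, pi

S0, K, r, sigma, T = 10.0, 10.0, 0.05, 0.2, 0.5   # illustrative values
rng = np.random.default_rng(0)
Z = rng.standard_normal(100000)
ST = S0*np.exp((r - sigma**2/2)*T + sigma*np.sqrt(T)*Z)
# pathwise: V'(ST) = 0 at every simulated point, so the IPA estimate is 0
ipa_delta = np.mean(np.exp(-r*T)*np.zeros_like(ST)*(ST/S0))
# true delta from the closed-form digital price e^{-rT} Phi(d2)
d2 = (log(S0/K) + (r - sigma**2/2)*T)/(sigma*sqrt(T))
true_delta = exp(-r*T)*exp(-d2**2/2)/sqrt(2*pi)/(S0*sigma*sqrt(T))
print(ipa_delta, true_delta)   # 0.0 versus a strictly positive number
```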

Example: IPA estimation of the delta of an Asian option.
Suppose St follows the Black-Scholes model and we wish to determine the delta
for an Asian option which has payoff

V(St, 0 < t < T) = (S̄ − K)^+,   where S̄ = (1/m) Σ_{i=1}^{m} S(ti)

is the average stock price at selected time points. Suppose the process Zt is
generated exactly the same way as St but with initial stock price Z0 = 1. Then
we can generate the process in general using

St = S0 Zt

in which case

delta = (∂/∂S0) EQ{e^{−rT}(S̄ − K)^+}
      = e^{−rT} EQ[(∂S̄/∂S0) I(S̄ > K)]
      = e^{−rT} EQ[Z̄ I(S̄ > K)] = e^{−rT} EQ[(S̄/S0) I(S̄ > K)].

Therefore averaging simulated values of e^{−rT}(S̄/S0) I(S̄ > K) over all simulations pro-
vides an unbiased estimator of delta.
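A sketch of this estimator in Python (hypothetical parameter values; m equally spaced monitoring dates under Black-Scholes dynamics):

```python
import numpy as np

S0, K, r, sigma, T, m, n = 10.0, 10.0, 0.05, 0.2, 1.0, 12, 100000
rng = np.random.default_rng(0)
dt = T/m
# simulate log-price increments for all n paths and m dates at once
incr = (r - sigma**2/2)*dt + sigma*np.sqrt(dt)*rng.standard_normal((n, m))
S = S0*np.exp(np.cumsum(incr, axis=1))   # S(t_1),...,S(t_m) on each path
Sbar = S.mean(axis=1)                    # average stock price on each path
ipa = np.exp(-r*T)*(Sbar/S0)*(Sbar > K)  # pathwise delta estimator
delta_hat = ipa.mean()
print(delta_hat)
```

Note that no closed form is needed: the pathwise factor S̄/S0 is available directly from the simulated path.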

IPA in the Multivariate Case.
We wish to generate X = (X1, ..., Xn) with independent components and let
the cumulative distribution function and the probability density function of Xi
be denoted Fiθ(x) and fiθ(x) respectively. Once again we wish to estimate the
sensitivity or derivative of the expected value

m(θ) = Eθ V(X1, ..., Xn, θ)

with respect to the parameter θ for some function V. To allow for the most
general situation, we permit θ not only to affect the distribution of the vari-
ables Xi but also, in some cases, to be an argument of the function V. Suppose
we generate the random variables Xi by inverse transform from a vector of n
independent uniform variates Ui according to Xi = Fiθ^{−1}(Ui). Then note that

∂Xi/∂θ = −(1/fiθ(Xi)) ∂Fiθ(Xi)/∂θ.

Thus, with V^{(i)} = ∂V(X,θ)/∂Xi and V^{(θ)} = ∂V(X,θ)/∂θ, we
have, under conditions permitting the interchange of derivative and integral,

m′(θ) = E{ Σi (∂V(X,θ)/∂Xi)(∂Xi/∂θ) + ∂V(X,θ)/∂θ }
      = E[ V^{(θ)} − Σi V^{(i)} (1/fiθ(Xi)) ∂Fiθ(Xi)/∂θ ]                  (7.92)

This suggests a Monte Carlo estimator, an average over all (independent)
simulations of terms of the form

V^{(θ)}(X, θ) − Σi (V^{(i)}(X, θ)/fiθ(Xi)) ∂Fiθ(Xi)/∂θ                     (7.93)

Again, the infinitesimal perturbation analysis estimator is unbiased if the con-
ditions permitting the required interchange of derivative and integral are met;
otherwise the estimator may be biased. See Cao (1987a) for some conditions.
When the conditions are met, note the relationship between terms in the per-
turbation analysis estimator and the score function estimator, obtained by in-
tegration by parts (the boundary term vanishes because ∂Fiθ(xi)/∂θ → 0 as
xi → ±∞):

Eθ[V(X, θ) ∂ln(fiθ(Xi))/∂θ] = ∫ Eθ[V(X1, ..., Xi−1, xi, Xi+1, ..., Xn, θ)] (∂/∂θ)fiθ(xi) dxi

= [Eθ{V(X1, ..., xi, ..., Xn, θ)} ∂Fiθ(xi)/∂θ]_{xi=−∞}^{∞} − ∫ Eθ[V^{(i)}(X1, ..., xi, ..., Xn, θ)] ∂Fiθ(xi)/∂θ dxi

= −Eθ{V^{(i)}(X, θ) (∂Fiθ(Xi)/∂θ)/fiθ(Xi)}.

For nearly constant functions V, the gradient V^{(i)} is close to zero and the pertur-
bation analysis estimator has small variance. In general, when it is unbiased, it
seems to provide greater efficiency than the crude score function estimator. On
the other hand, the comparison is usually carried out in specific cases. Without
smoothness in the function V, there seems to be no general reason why pertur-
bation analysis should be preferred. If V is nondifferentiable or discontinuous,
this can introduce potential bias into perturbation analysis estimators. The
infinitesimal perturbation analysis estimator is an infinitesimal or limiting ver-
sion of the use of common random numbers, as the following argument shows.
Generating Xiθ as above, it is reasonable to estimate

[m(θ + δ) − m(θ − δ)]/(2δ) ≈ [V(X_{θ+δ}, θ + δ) − V(X_{θ−δ}, θ − δ)]/(2δ).

Taking limits as δ → 0 and assuming the gradient exists in a neighborhood of θ,
we arrive at the perturbation analysis estimator.
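The variance comparison can be seen in a simple case where both estimators are available explicitly (this example is an illustration, not from the text). Suppose X is exponential with mean θ, generated by inverse transform X = −θ ln(1 − U), and V(X) = X², so that m(θ) = 2θ² and m′(θ) = 4θ. The IPA estimator is V′(X)∂X/∂θ = 2X²/θ, while the score function estimator is V(X)∂ln fθ(X)/∂θ = X²(X/θ² − 1/θ):

```python
import numpy as np

theta, n = 2.0, 200000
rng = np.random.default_rng(0)
U = rng.uniform(size=n)
X = -theta*np.log(1 - U)               # inverse transform: X ~ Exponential, mean theta

ipa = 2*X**2/theta                     # V'(X) * dX/dtheta, with dX/dtheta = X/theta
score = X**2*(X/theta**2 - 1/theta)    # V(X) * d log f_theta(X)/d theta

print(ipa.mean(), score.mean())        # both estimate m'(theta) = 4*theta = 8
print(ipa.var(), score.var())          # the score function estimator is noisier
```

Both sample means are close to 8, but the sample variance of the score function estimator is several times that of the IPA estimator, consistent with the discussion above.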
In the more common circumstance that the function V does not directly
depend on the parameter, the crude Monte Carlo IPA estimator (7.93) is an
average over all (independent) simulations of

− Σi (V^{(i)}(X)/fiθ(Xi)) ∂Fiθ(Xi)/∂θ                                      (7.94)

where the derivatives ∂V(X)/∂Xi may be derived through analysis of the system
or through the implicit function theorem if the problem is tractable. In examples
where IPA has been found to be unbiased, it has also been found to be consistent.
When compared to the crude score function method for these examples, it has
generally been found to be the more efficient of the two, although exceptions to
this rule are easy to find.

7.10.2 Sensitivity of the value of a spread option to the correlation
Consider two stocks or asset prices with closing values S1(T) and S2(T) jointly
lognormally distributed with volatility parameters σ1, σ2, and correlation ρ.
Of course all of the parameters governing this distribution change over time,
including the correlation ρ. We are interested in the price of a European call
option on the spread in price between the two stocks, and in particular, the
sensitivity of this price to changes in the correlation. Taking the initial prices
S1(0) = S2(0) = 1 for simplicity, let the payoff function be

V(S1(T), S2(T)) = max(0, S1(T) − S2(T) − K)                                (7.95)
 = max(0, exp{rT − σ1²T/2 + σ1√T Z1} − exp{rT − σ2²T/2 + σ2√T Z2} − K)

for strike price K and correlated standard normal random variables Z1, Z2. The
easiest way to generate such random variables is to generate Z1, Z3 independent
standard normal and then set

Z2 = ρZ1 + √(1 − ρ²) Z3.                                                   (7.96)

Then the sensitivity of the option price with respect to ρ is the derivative of
the discounted expected return

(∂/∂ρ) E[e^{−rT} V(S1(T), S2(T))] = −σ2√T E[exp{−σ2²T/2 + σ2√T Z2} (∂Z2/∂ρ) 1A]   (7.97)

 = −σ2√T E[exp{−σ2²T/2 + σ2√T(ρZ1 + √(1 − ρ²) Z3)}(Z1 − ρZ3/√(1 − ρ²)) 1A]        (7.98)

where 1A is the indicator function of the set

A = A(Z1, Z2) = [exp{rT − σ1²T/2 + σ1√T Z1} − exp{rT − σ2²T/2 + σ2√T Z2} > K]

and where Z1, Z3 are independent standard normal random variables and Z2
satisfies (7.96). Thus an IPA estimator of the sensitivity is given by an average
of terms of the form

−σ2√T exp{−σ2²T/2 + σ2√T(ρZ1 + √(1 − ρ²) Z3)}(Z1 − ρZ3/√(1 − ρ²)) 1A(Z1,Z2).      (7.99)

Of course variance reduction can easily be applied to this estimator, espe-
cially since there is a substantial set on which (7.99) is equal to 0.
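A sketch of estimator (7.99) in Python (hypothetical parameter values), checked against a central finite difference in ρ computed with common random numbers:

```python
import numpy as np

sigma1, sigma2, rho, r, T, K, n = 0.3, 0.2, 0.5, 0.05, 1.0, 0.1, 400000
rng = np.random.default_rng(0)
Z1, Z3 = rng.standard_normal((2, n))   # independent standard normals, reused below

def price(rho_):
    # discounted expected spread-option payoff (initial prices taken as 1)
    Z2 = rho_*Z1 + np.sqrt(1 - rho_**2)*Z3
    S1 = np.exp(r*T - sigma1**2*T/2 + sigma1*np.sqrt(T)*Z1)
    S2 = np.exp(r*T - sigma2**2*T/2 + sigma2*np.sqrt(T)*Z2)
    return np.exp(-r*T)*np.maximum(S1 - S2 - K, 0).mean()

# IPA estimator (7.99), averaged over the same simulations
Z2 = rho*Z1 + np.sqrt(1 - rho**2)*Z3
S1 = np.exp(r*T - sigma1**2*T/2 + sigma1*np.sqrt(T)*Z1)
S2 = np.exp(r*T - sigma2**2*T/2 + sigma2*np.sqrt(T)*Z2)
A = S1 - S2 > K
ipa = -sigma2*np.sqrt(T)*np.exp(-sigma2**2*T/2 + sigma2*np.sqrt(T)*Z2) \
      *(Z1 - rho*Z3/np.sqrt(1 - rho**2))*A
est = ipa.mean()

delta = 0.01                           # finite difference with common random numbers
fd = (price(rho + delta) - price(rho - delta))/(2*delta)
print(est, fd)                         # the two estimates agree closely
```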

7.11 Calibrating a Model using simulations
In most of the models discussed in the finance literature there are a number of
unknown parameters that either must be estimated from historical data or, for
example in order to estimate the risk-neutral distribution, calibrated to the
market prices of derivatives. Because of the complexity of the model, it is also
common that whatever function we wish to maximize or minimize can only
be evaluated using rather noisy simulations, even after applying the variance
reduction techniques of Chapter 4. Determining a set of parameter values is
the problem addressed in this section. In Chapter 7 we discussed some general
numerical methods for finding roots and maxima of functions whose evaluations
were subject to noise. However, the problem of finding such maxima is often
linked with the problem of estimating gradients for noisy functions. Since this
is the subject of this chapter, we return to this problem here.
Suppose, for example, we have J traded options on a single underlying asset
with market prices Pj, j = 1, ..., J. These options may have different maturi-
ties. Valuation of these options by simulation requires a model for the underlying
stochastic variables, with a (p × 1) parameter vector θ ∈ Θ governing the be-
haviour of these variables (Θ is the parameter space). Let Pj(θ) be the model
price for option j = 1, ..., J at parameter value θ. We wish to choose parameter
values so as to minimize the differences Pj(θ) − Pj, the pricing error for
option j = 1, ..., J at parameter value θ. More precisely, we wish to find the
parameter θ̂ that minimizes the weighted sum of squared pricing errors (SSPE):

θ̂ = arg min_θ Σ_{j=1}^{J} wj (Pj(θ) − Pj)².                                (7.100)

where wj, j = 1, ..., J are suitable weights. Information such as moneyness, the
option price, trading volume, open interest and time-to-maturity can be used
in the construction of wj, affecting the contribution of each option to the loss
function so that options used as benchmarks and those that are heavily traded
can contribute more weight to the loss function. This is the familiar nonlinear
weighted least-squares problem studied extensively in statistics. This problem
is usually solved by equating the derivative or gradient of the objective function
to zero and solving for θ, where, in this case, the gradient of (7.100) is

grad(θ) = 2 Σ_{j=1}^{J} wj (Pj(θ) − Pj) (∂/∂θ) Pj(θ).                       (7.101)

Finding the root of (7.101) normally would require the functional form of both
Pj(θ) and (∂/∂θ)Pj(θ), which is typically not available for simulation-based
models. Notice that it takes a fairly simple form, however: if we define the inner
product between two vectors using the weights wj (so that the inner product
of vectors x and y is Σj xj yj wj), then (7.101) equals zero if the pricing
error vector is orthogonal to the vector of greeks for each of the parameters.

The relevance of the estimation of sensitivities to the problem of solving
(7.101) is now clear. Methods such as the score function method and infinites-
imal perturbation analysis can be used to estimate the gradient vector
(∂/∂θ)Pj(θ). Estimation of this gradient is crucial to any algorithm which
attempts to minimize (7.100). We may also estimate the gradient more simply
by obtaining noisy evaluations P̂j(θ), say, of the function Pj(θ) at various values
of θ in a small neighbourhood and fitting a linear regression to these observations.
Let us now describe the various steps in a fairly general algorithm designed to
solve (7.100). Possible algorithms will differ primarily in how we approximate
Pj(θ) and (∂/∂θ)Pj(θ), but in general, even if we use common random numbers
to drive our simulations, estimators of the function Pj(θ) tend to be more precise
than estimators of its gradient (∂/∂θ)Pj(θ).
We begin with simulations conducted at K values of θ: θ1, θ2, ..., θK. By av-
eraging the simulated values of the options at these points we obtain estimated
option prices for each of the J options, αjk = P̂j(θk), j = 1, ..., J, k = 1, ..., K.
We can also use these same simulations to estimate the gradient vector at each
of these points. If we plan to estimate the gradient using regression among
two or more neighbouring values of θ, then it is important to drive the simu-
lations at different parameter values with common random numbers, since this
improves the precision of the estimated gradient. Suppose the gradients are
βjk = (∂/∂θ)Pj(θk) (this is a column vector of length p), and the estimator ob-
tained either by using IPA or the score function method or finite differences is
denoted by β̂jk. If we approximate Pj(θ) by a linear Taylor series approximation
in a neighbourhood of θk, for a particular value of k, we obtain as estimator of
Pj(θ)

Pj(θ) ≈ P̂j(θ) = αjk + β̂jk^T(θ − θk).
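These steps can be sketched in a toy calibration (a Python illustration with hypothetical values: J = 1 option, a Black-Scholes model price evaluated by simulation with common random numbers, the gradient ∂P/∂σ estimated by a central finite difference, and a Gauss-Newton step on the SSPE):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal(200000)      # common random numbers reused in every evaluation
S0, r, T, K = 10.0, 0.05, 0.5, 10.0  # illustrative values

def model_price(sigma):
    # simulated Black-Scholes call price P(sigma), driven by the fixed normals Z
    ST = S0*np.exp((r - sigma**2/2)*T + sigma*np.sqrt(T)*Z)
    return np.exp(-r*T)*np.maximum(ST - K, 0.0).mean()

P_market = model_price(0.25)         # "market" price, generated here at sigma = 0.25

sigma, h = 0.4, 1e-3
for _ in range(20):
    err = model_price(sigma) - P_market                              # pricing error
    grad = (model_price(sigma + h) - model_price(sigma - h))/(2*h)   # estimated vega
    sigma -= err/grad                # Gauss-Newton step on w*(P(sigma)-P)^2 with w = 1
print(sigma)                         # recovers the volatility used for P_market
```

Because every evaluation of model_price reuses the same normals Z, the simulated objective is a smooth function of σ and the iteration converges rapidly; with fresh random numbers at each evaluation the finite-difference gradient would be far noisier.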
