Estimating Derivatives and the Greeks

Estimating the sensitivity of a simulation with respect to changes in the parameter values is an important part of establishing the validity of its conclusions. If a simulation estimates an expected value at a certain value of the parameters as 0.32 ± 0.05, but the derivative with respect to one parameter, say the volatility parameter σ, is 5, this indicates that a change in the volatility of only 0.02, or 2 percent, would change the average by about 0.1. Since volatility typically changes by far more than one percent, the apparent precision of the estimate 0.32 ± 0.05 is very misleading.

Of particular importance in finance are certain derivatives of an option price or portfolio value with respect to the parameters underlying the Black-Scholes model. These are called the “Greeks”, because many of them (not to mention many parameters and constants used in Statistics, Physics, Mathematics, and the rest of Science) are denoted by Greek letters. Suppose V = V(S(t), t, σ, r) is the value of a portfolio or a derivative based on an asset S(t), where the volatility parameter is σ and r is the current spot interest rate. For example, for a single European call option, from the Black-Scholes formula

V(S, t, σ, r) = SΦ(d_1) − Ke^{−r(T−t)} Φ(d_2)    (7.61)

where

d_1 = [ln(S/K) + (r + σ²/2)(T − t)] / (σ√(T − t)),

d_2 = d_1 − σ√(T − t) = [ln(S/K) + (r − σ²/2)(T − t)] / (σ√(T − t)).

In this case there are analytic formulae for these quantities, given in Table 8.1:


Name     Symbol   Definition   Value in the Black-Scholes model
delta    Δ        ∂V/∂S        Φ(d_1)
gamma    Γ        ∂²V/∂S²      φ(d_1)/(Sσ√(T − t))
rho      ρ        ∂V/∂r        K(T − t)e^{−r(T−t)} Φ(d_2)
theta    Θ        ∂V/∂t        −Sσφ(d_1)/(2√(T − t)) − rKe^{−r(T−t)} Φ(d_2)
vega              ∂V/∂σ        Sφ(d_1)√(T − t)

Table 8.1: The “Greeks” for a European call in the Black-Scholes model.
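As a sanity check on the table, the closed forms can be coded directly and the delta compared against a centred finite difference of the call price. This is a minimal sketch; the parameter values below are illustrative, not taken from the text.

```python
# Closed-form Greeks for a European call under Black-Scholes (Table 8.1),
# with delta checked against a centred finite difference of the price.
from math import erf, exp, log, pi, sqrt

def norm_pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def call_greeks(S, K, r, sigma, tau):
    """Return (delta, gamma, rho, theta, vega); tau = T - t."""
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    delta = norm_cdf(d1)
    gamma = norm_pdf(d1) / (S * sigma * sqrt(tau))
    rho = K * tau * exp(-r * tau) * norm_cdf(d2)
    theta = (-S * sigma * norm_pdf(d1) / (2 * sqrt(tau))
             - r * K * exp(-r * tau) * norm_cdf(d2))
    vega = S * norm_pdf(d1) * sqrt(tau)
    return delta, gamma, rho, theta, vega

def call_price(S, K, r, sigma, tau):
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

# Check delta against a centred finite difference of the price
S, K, r, sigma, tau = 100.0, 100.0, 0.05, 0.3, 0.25
delta, gamma, rho, theta, vega = call_greeks(S, K, r, sigma, tau)
h = 1e-4
fd_delta = (call_price(S + h, K, r, sigma, tau)
            - call_price(S - h, K, r, sigma, tau)) / (2 * h)
```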

One example of the above calculations is given below; the remaining ones are left as exercises.

∂V/∂S = ∂/∂S {SΦ(d_1) − Ke^{−r(T−t)} Φ(d_2)}
      = Φ(d_1) + Sφ(d_1) ∂d_1/∂S − Ke^{−r(T−t)} φ(d_2) ∂d_2/∂S
      = Φ(d_1) + (Sφ(d_1) − Ke^{−r(T−t)} φ(d_2)) · 1/(Sσ√(T − t))
      = Φ(d_1),  since Sφ(d_1) = Ke^{−r(T−t)} φ(d_2),

where φ and Φ are the standard normal probability density function and cumulative distribution function respectively. These derivatives are typically calculated not only because they are relevant to a hedging strategy (especially Δ and Γ) but also because they give an idea of how rapidly the value of our portfolio is affected when there is a (possibly adverse) change in one of the parameters.

As an example of the use of these derivatives, it is common to hedge a portfolio against changes in one or more parameters. A simple hedge was introduced when we originally developed the Black-Scholes formula. Suppose we wish to hedge an investment in a call option whose value is given by V = V(S(t), t, σ, r) at time t with x_S units of the underlying stock. Then the value function for this new portfolio is

V + x_S S

and setting the delta of this portfolio equal to zero results in the equation

Δ_O + x_S = 0,  or  x_S = −Δ_O.

This means that we would need to short Δ_O units of the stock in order to hedge the investment in the option. The delta of an option is therefore the amount of the stock that we would sell short in order to hedge one unit of the option.

In a sense, owning Δ_O units of the stock is locally equivalent to owning one unit of the option, at least from the perspective of small changes in the stock price. In Figure 7.15 below we plot the delta for a call option with strike price 100, interest rate r = 0.05 and T − t = 0.25 years. The second derivative,

the gamma of the call option, is φ(d_1)/(Sσ√(T − t)), which behaves roughly like a constant times the normal density with mode at S = e^{−r(T−t)}K (see Figure 7.16).

Figure 7.15: Call option delta in the Black-Scholes model; exercise price = 100, r = 0.05, T − t = 0.25 years.

Figure 7.16: Call option gamma in the Black-Scholes model; T − t = 0.25 years, σ = 0.3, r = 0.05, K = 100.

With only one instrument (in this case the stock) available to hedge our investment in the call option, there is a single coefficient x_S that we are free to choose, and therefore only one constraint (portfolio delta equals zero) that we can arrange to satisfy. However, if there were other instruments available, such as different options on the same underlying, we could construct a hedge setting other Greeks of the hedged portfolio (e.g. gamma, rho) equal to zero as well.

For example, suppose I own a portfolio whose value P = P(S(t), t, σ, r) depends on the price of a stock or index S. I wish to immunize this portfolio against changes in S by investing directly in the stock S and in an option on this stock whose value is given by V = V(S(t), t, σ, r) at time t. We will need to assume that the value functions V, S, P are distinct in the sense that one is not a linear combination of the other two. Suppose I add to my portfolio x_S units of the stock and x_O units of the option, so that the new portfolio has value

P + x_S S + x_O V.

In order to ensure that this value changes as little as possible when S changes, set its delta and gamma (first and second derivatives with respect to S) equal to zero. This gives two equations in the two unknown values x_S, x_O:

Δ_P + x_S + x_O Δ_O = 0
Γ_P + x_O Γ_O = 0

where Δ_P, Δ_O are the deltas of the original portfolio and the option respectively, and Γ_P, Γ_O are the gammas. The solution gives

x_O = −Γ_P/Γ_O
x_S = Δ_O Γ_P/Γ_O − Δ_P

and the hedged portfolio has value

P + (Δ_O Γ_P/Γ_O − Δ_P) S − (Γ_P/Γ_O) V.
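The two hedging equations above can be solved numerically even when no closed forms are available. The sketch below takes, as a hypothetical example, P to be a call struck at 110 and V a call struck at 100, computes all deltas and gammas by central finite differences, and checks that the hedged portfolio has (numerically) zero delta and gamma; the parameter values are illustrative.

```python
# Solve the delta-gamma hedge x_S, x_O of the text using finite-difference
# deltas and gammas, then verify the hedged portfolio is delta- and
# gamma-neutral.
from math import erf, exp, log, sqrt

def ncdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def call(S, K, r=0.05, sigma=0.3, tau=0.25):
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * ncdf(d1) - K * exp(-r * tau) * ncdf(d2)

def delta_gamma(f, S, h=0.01):
    d = (f(S + h) - f(S - h)) / (2 * h)
    g = (f(S + h) - 2 * f(S) + f(S - h)) / h ** 2
    return d, g

S0 = 100.0
P = lambda S: call(S, 110.0)   # hypothetical portfolio: a call struck at 110
V = lambda S: call(S, 100.0)   # hedging option: a call struck at 100
dP, gP = delta_gamma(P, S0)
dV, gV = delta_gamma(V, S0)

x_O = -gP / gV                 # gamma condition: Gamma_P + x_O Gamma_O = 0
x_S = -(dP + x_O * dV)         # delta condition: Delta_P + x_S + x_O Delta_O = 0

hedged = lambda S: P(S) + x_S * S + x_O * V(S)
dH, gH = delta_gamma(hedged, S0)
```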

As an example, suppose our portfolio consists of a digital option which pays $1 at time T = 1 provided that the stock price exceeds some threshold value K_D. Then the value function for this option is

P = e^{−r} Q[S(1) > K_D] = e^{−r} Φ(d_2)

with

d_2 = [ln(S/K_D) + r − σ²/2]/σ.


Therefore

Δ_P = ∂/∂S e^{−r}Φ(d_2) = e^{−r} φ(d_2) ∂d_2/∂S
    = e^{−r} φ(d_2) · 1/(Sσ) = Sφ(d_1) · 1/(KSσ) = φ(d_1)/(Kσ)

and

Γ_P = ∂²/∂S² e^{−r}Φ(d_2) = e^{−r} (φ(d_2)/(S²σ²))(d_2 + σ) = φ(d_1)d_1/(KSσ²)

where we again use the fact that Ke^{−r} φ(d_2) = Sφ(d_1).

Finally, if we define

d_1^0 = [2 ln(S/K_O) + 0.5(r + σ²/2)]/σ

and d_2^0 to be the corresponding quantities for the call option, the coefficients of the hedged portfolio are therefore

x_O = −Γ_P/Γ_O = −(φ(d_1)d_1/(KSσ²)) · (Sσ/2)/φ(d_1^0) = −d_1 φ(d_1)/(2Kσ φ(d_1^0))

x_S = Δ_O Γ_P/Γ_O − Δ_P = −(x_O Δ_O + Δ_P).

In Figure 7.17 we plot the value of these coefficients for the case of the digital option above.

Evidently x_S < 0 and x_O > 0 when S < 100 and both options are deep out of the money, indicating that we should short the stock and purchase the call option. When both options are deep in the money and S is large, the reverse holds and we short the call option and buy the stock. In the simplest delta hedge, it is often easy to recognize in advance whether a given coefficient is positive or negative, i.e. whether we are to purchase or sell short a given instrument. In the case of several instruments, when we set several derivatives equal to zero, the signs of the coefficients are by no means obvious in advance, one of the reasons why mathematics is very useful in such cases.

The availability of two instruments, the stock and a single option on the underlying S, allows us to adjust a portfolio so that the first two derivatives of its value function with respect to S are both zero. The portfolio is therefore protected against reasonably small changes in S. Similarly, with more options on the same stock, one could arrange that the portfolio is immunized or protected against adverse movements in the other parameters as well, including the interest rate and the volatility parameter. This hedged portfolio clearly requires derivatives of the value function, and for models more complicated than the Black-Scholes, we require simulation methods not only for valuing options and portfolios but also for determining these derivatives with respect to the underlying parameters.


Figure 7.17: The number of units x_S and x_O required of the underlying and a call option (K = 100, T = 0.25) to hedge a digital option with K_D = 100 and T = 1. Other parameters: σ = 0.3, r = 0.05.

We now return to the question of estimating the sensitivity of simulations to changes in underlying parameters, usually in a case in which no simple formula exists for the function or its derivatives. For example, in sensitivity analysis we wish to estimate an expected value at values of an underlying parameter in the neighborhood of a given value. In stress testing, we might wish to estimate an expected value under more extreme conditions, for example if the interest rate doubled or an exchange rate took a dramatic downturn. Surprisingly, at least in theory, information can be obtained about the behaviour of a function at values of the parameter quite different from those used to conduct the simulation. In fact, importance sampling appears to permit us to conduct simulations at one value of a parameter θ and use these to estimate expected values corresponding to all possible values of the parameter. The estimation of an expectation under one value of a parameter using simulations conducted at another is sometimes called the “what if” problem. Denote the probability density function of a random variable or vector X under θ by f_θ(x), and assume these densities all have common support. An expectation calculated under this value of the parameter will be denoted E_θ(·). If we want to estimate the expected value of T(X) under different values ψ of the parameter, but our simulations are conducted at the parameter value θ, note that

m(ψ) = E_ψ T(X) = E_θ[T(X) f_ψ(X)/f_θ(X)].    (7.62)

There may be many reasons for our interest in the function m(ψ). A derivative is priced using current values of the asset price, interest rate, volatility parameter, etc., and we may wish to graph the price over a range of (possible future) values of the parameters. The necessity of estimating derivatives in order to immunize or hedge a portfolio is discussed above. The likelihood ratio or importance sampling estimator T(X) f_ψ(X)/f_θ(X), where X ∼ f_θ, is an unbiased estimator of m(ψ), so a simulation at θ permits unbiased estimation of the whole function m(ψ) = E_ψ T(X), and thereby also its derivatives.
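The identity (7.62) can be illustrated with a case where m(ψ) is known: for an Exp(ψ) distribution, m(ψ) = E_ψ[X] = ψ. The sketch below simulates only at θ = 1 and reweights by the likelihood ratio; the sample size and the values of ψ are illustrative.

```python
# "What if" estimation (7.62): simulate X ~ Exp(theta) once, then estimate
# m(psi) = E_psi[X] = psi for other psi by likelihood-ratio reweighting.
import random
from math import exp

random.seed(1)
theta = 1.0
n = 200_000
x = [random.expovariate(1 / theta) for _ in range(n)]  # mean theta

def what_if(psi):
    # Likelihood ratio f_psi(x)/f_theta(x) for densities (1/psi)e^{-x/psi}
    return sum((theta / psi) * exp(-xi / psi + xi / theta) * xi
               for xi in x) / n

est_08 = what_if(0.8)   # should be near 0.8
est_12 = what_if(1.2)   # should be near 1.2
```

Note that each estimate uses the same simulated sample; only the weights change.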

However, this simple result should be moderated by a study of the precision of this estimator. Suppose θ is the true value of the parameter and the values X_1, ..., X_n are simulated as independent random variables from the joint probability density function ∏_{i=1}^n f_θ(x_i). Consider the likelihood ratio

∏_{i=1}^n f_ψ(X_i) / ∏_{i=1}^n f_θ(X_i)

for ψ ≠ θ. Notice that upon taking the expectation under θ of the logarithm, we obtain

E_θ{ ∑_{i=1}^n [ln(f_ψ(X_i)) − ln(f_θ(X_i))] } = −nH(f_θ, f_ψ)

where H(f_θ, f_ψ) > 0 is the cross-entropy. Therefore as n → ∞,

∑_{i=1}^n [ln(f_ψ(X_i)) − ln(f_θ(X_i))] → −∞

with probability one, and upon exponentiating,

∏_{i=1}^n f_ψ(X_i) / ∏_{i=1}^n f_θ(X_i) → 0.

This means that the likelihood is very much smaller for values of the parameter ψ far from the true value than it is at the true value θ. You might think that since, for large n, the function

∏_{i=1}^n f_ψ(X_i) / ∏_{i=1}^n f_θ(X_i)    (7.63)

is very close to zero for values of X which have probability close to one, the expectation

E_θ[ T(X_1, ..., X_n) ∏_{i=1}^n f_ψ(X_i) / ∏_{i=1}^n f_θ(X_i) ]    (7.64)

should be close to zero for any function T(X_1, ..., X_n). Somewhat paradoxically,

however, if we substitute T(X_1, ..., X_n) = 1, then it is easy to see that (7.64) is identically one for all ψ and all n. Strangely, for large n the random variable (7.63) is very close to 0 for almost all X and converges to 0 as n → ∞, and yet its expected value remains 1 for all n. This apparent contradiction stems from the fact that the likelihood ratios are not uniformly integrable. They do not behave in the limit as n → ∞ in the same way in probability as they do in expectation. For large n, (7.63) is close to 0 with high probability, but as n increases it takes larger and larger values with smaller and smaller probabilities. In other words, for large sample sizes the likelihood ratio (7.63) is rather unstable, taking very large values over values of X which have small probability and lying close to zero for the rest of the possible values of X. Consequently, (7.63) has very large variance. In fact, as the sample size → ∞, the variance of (7.63) approaches infinity very quickly. In such a situation, sample means or averages may fail to converge to the expected value, or may approach it extremely slowly. This argues against using an average which involves the likelihood ratio if there is a substantial difference between the parameter values ψ and θ, especially if the sample size is large. Moment-type estimators based on these likelihood ratios will tend to be very unstable in mean and variance, particularly when ψ is far from θ. This problem may be partially alleviated if variance reduction or alternative techniques are employed.

7.8.1 Example

As an example, suppose we wish to estimate E_ψ(X̄_n), where

X̄_n = S_n/n = (X_1 + ... + X_n)/n

is the sample mean of independent observations X_1, ..., X_n from an exponential distribution with parameter ψ. Suppose we conduct our simulations at parameter value θ instead, so with

f_ψ(x) = (1/ψ) exp(−x/ψ),  x > 0,

we use the estimator

T(X_1, ..., X_n) = X̄_n ∏_{i=1}^n f_ψ(X_i)/f_θ(X_i) = (θ^n/(nψ^n)) S_n exp{S_n(1/θ − 1/ψ)}    (7.65)

with the X_i independent Exp(θ) random variables. Then it is not hard to show that T(X_1, ..., X_n) has the correct expected value, ψ, provided that it has a well-defined expected value at all. The variance, on the other hand, is quite unstable when there is much separation between θ and ψ. The standard deviation of the estimator T(X_1, ..., X_n) is plotted in Figure 7.18 in the case n = 10, θ = 1; it is clearly small for ψ fairly close to (actually a little smaller than) θ, but grows large as |θ − ψ| increases.
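The instability can be reproduced by direct simulation of the estimator (7.65). The sketch below uses n = 10 and θ = 1 as in the text and compares the sample standard deviation for ψ near θ against a ψ far from θ; the number of replications is illustrative.

```python
# Monte Carlo study of the estimator (7.65) with n = 10, theta = 1,
# showing how its standard deviation blows up as psi moves away from theta.
import random
from math import exp

random.seed(2)
n, theta, reps = 10, 1.0, 20_000

def estimator(psi):
    vals = []
    for _ in range(reps):
        s = sum(random.expovariate(1 / theta) for _ in range(n))
        t = (theta ** n / (n * psi ** n)) * s * exp(s * (1 / theta - 1 / psi))
        vals.append(t)
    mean = sum(vals) / reps
    var = sum((v - mean) ** 2 for v in vals) / (reps - 1)
    return mean, var ** 0.5

mean_near, sd_near = estimator(1.1)   # psi close to theta: stable
mean_far, sd_far = estimator(2.0)     # psi far from theta: unstable
```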

7.9 Estimating Derivatives

Let us begin by examining the estimation of the derivative m′(θ) = (∂/∂θ) E_θ T(X) in general, when we are only able to evaluate the function T(X) by simulation, so there is error in its valuation. We could conduct independent simulations at two different values of the parameter, say at θ + h and θ − h, average the values of T(X) under each, resulting say in the estimators m̂(θ + h) and m̂(θ − h), and then take as our estimator the difference

[m̂(θ + h) − m̂(θ − h)]/(2h)    (7.66)

Figure 7.18: Standard deviation of the estimator T(X_1, ..., X_n) for θ = 1 and various values of ψ.

but this crude estimator suffers from a number of disadvantages:

• It requires twice as many simulations as we conduct at a single point. Since we normally wish to simulate at θ as well, this is three times the number of simulations required at a single point.

• It is heavily biased if h is large, unless the function m(θ) is close to linear.

• It has very large variance if h is small.

To illustrate the second point, note that, provided the function m is three times differentiable, a Taylor series expansion gives

E{[m̂(θ + h) − m̂(θ − h)]/(2h)} = [m(θ + h) − m(θ − h)]/(2h)
= (1/(2h)){m(θ) + m′(θ)h + (1/2)m″(θ)h² − (m(θ) − m′(θ)h + (1/2)m″(θ)h²) + O(h³)}
= m′(θ) + O(h²)

as h → 0. However, the error implicit in the term O(h²) can be large when h is large, and if h is small then the third point is problematic. Since

var([m̂(θ + h) − m̂(θ − h)]/(2h)) = (1/(4h²)){var[m̂(θ + h)] + var[m̂(θ − h)]} → ∞

quite quickly as h → 0, we cannot afford to permit h to be too small if we use independent simulations.

We have already seen some methods for ameliorating the last of these problems. Since the derivative requires that we estimate a difference, the use of common random numbers in the simulations at the two parameter values θ + h and θ − h should reduce the variability, but we are still faced with the problem of taking the limit of such an estimator as h → 0 to estimate the derivative.
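The benefit of common random numbers for a difference of this kind can be checked directly. The sketch below estimates, by the central difference (7.66), the derivative with respect to σ of a discounted call payoff expectation, once with shared normal draws and once with independent draws; the payoff and parameter values are illustrative.

```python
# Central-difference sensitivity (7.66) with and without common random
# numbers: CRN drastically reduces the per-sample variance of the difference.
import random
from math import exp, sqrt

S0, K, r, T, sigma, h = 100.0, 100.0, 0.05, 1.0, 0.2, 0.01
n = 5000

def payoff(sig, z):
    sT = S0 * exp(r * T - sig ** 2 * T / 2 + sig * sqrt(T) * z)
    return exp(-r * T) * max(sT - K, 0.0)

def fd_samples(common):
    rng = random.Random(3)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = z1 if common else rng.gauss(0, 1)  # shared draw iff CRN
        out.append((payoff(sigma + h, z1) - payoff(sigma - h, z2)) / (2 * h))
    return out

def sd(v):
    m = sum(v) / len(v)
    return (sum((x - m) ** 2 for x in v) / (len(v) - 1)) ** 0.5

sd_crn = sd(fd_samples(True))
sd_indep = sd(fd_samples(False))
```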

7.9.1 The Score Function Estimator.

There are two alternatives in popular use: perturbation analysis, which depends on pathwise differentiation, and the score function or likelihood ratio method. Both have the feature that a simulation at a single parameter value allows estimation of both the function and its derivative.

We begin by introducing the score function method. The idea behind the score function method is very simple: it involves interchanging derivative and integral. We wish to estimate m′(θ) = (∂/∂θ) ∫ T(x) f_θ(x) dx, where f_θ is the joint probability density function of all of our simulations. Then under regularity conditions called the Cramér conditions (see for example those required in the Cramér-Rao inequality, Appendix B), we may interchange the integral and derivative:

m′(θ) = (∂/∂θ) ∫ T(x) f_θ(x) dx    (7.67)
      = ∫ T(x) (∂f_θ(x)/∂θ) dx = E_θ[T(X) S(θ)]

where S(θ) denotes the score function,

S(θ) = S(θ, x) = ∂ ln[f_θ(x)]/∂θ.    (7.68)

Since the score function has expected value 0, i.e. E_θ{S(θ)} = 0, the quantity E_θ[T(X)S(θ)] is just the covariance between T(X) and S(θ), and this can be estimated using the sample covariance. In particular, if T is a function of n independent simulations X = (X_1, ..., X_n), each of which has probability density function f_θ(x), then the score function for the vector of observations is S_n(θ, X) = ∑_{i=1}^n S_1(θ, X_i), with

S_1(θ, x) = ∂ ln[f_θ(x)]/∂θ

the score function for a single simulated value. We wish to estimate the covariance

cov(T(X_1, ..., X_n), ∑_{i=1}^n S_1(θ, X_i)) = ∑_{i=1}^n cov(T(X_1, ..., X_n), S_1(θ, X_i)).    (7.69)


One possibility, motivated by (7.69), is to use the sample covariance between independent replications of T(X_1, ..., X_n) and ∑_{i=1}^n S_1(θ, X_i) over many simulations. More specifically, denote m independent replications of T by T_j = T(X_{j1}, X_{j2}, ..., X_{jn}), j = 1, 2, ..., m, where {X_{ji}, i = 1, ..., n, j = 1, ..., m} are independent observations from the probability density function f_θ(x). This provides an estimator of the sensitivity (∂/∂θ) E_θ T(X). If we denote the score function for the vector of observations (X_{j1}, X_{j2}, ..., X_{jn}) by

S_j = ∑_{i=1}^n S_1(θ, X_{ji})

then the simplest version of the score function estimator is the sample covariance

SF_1 = ĉov(S_j, T_j) = (1/(m − 1)) ∑_{j=1}^m S_j(T_j − T̄)    (7.70)

where T̄ = (1/m) ∑_{j=1}^m T_j.

However, there is a generally more efficient estimator, motivated by the relationship between regression coefficients and covariances. If we were to construct a regression of the variables T_j on S_j of the form

T_j = α + β_{T|S} S_j + ε_j

then the coefficient which minimizes the variance of the errors ε_j is

β_{T|S} = cov(T_j, S_j)/var(S_j) = cov(T_j, S_j)/(nJ_1(θ)),

where J_1(θ) = var[S_1(θ, X_i)]. It follows that we can construct an estimator of the covariance from the estimated regression coefficient

β̂_{T|S} = ∑_{j=1}^m S_j(T_j − T̄) / ∑_{j=1}^m (S_j − S̄)²

from which the covariance (7.69) can be estimated using

SF_2 = β̂_{T|S} var(S_j).    (7.71)

Both SF_1 and SF_2 are estimators of m′(θ), but SF_2 is generally more efficient. In fact, using the variance of the regression slope estimator, we can obtain an approximate variance for this estimator:

var(SF_2) = [var(S_j)]² var(β̂_{T|S}) ≃ (1/m) var(T_j) var(S_j)[1 − ρ²_{ST}]    (7.72)

with

ρ_{ST} = cov(T_j, S_j)/√(var(T_j) var(S_j)).

Notice that (7.72) can be expressed as (n/m) var(T_j) J_1(θ)[1 − ρ²_{ST}], which makes it apparent that the estimator may be noisy for large n unless either m is correspondingly larger or ρ²_{ST} is close to one.
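Both estimators are easy to try on a case with a known answer. The sketch below takes T to be the sample mean of n Exp(θ) observations, so that m(θ) = θ and m′(θ) = 1; SF_2 is computed using the known var(S_j) = nJ_1(θ). (In this particular example T_j is a linear function of S_j, so ρ_{ST} = 1 and SF_2 is essentially exact; the sample sizes are illustrative.)

```python
# Score-function estimators SF1 (7.70) and SF2 (7.71) for an Exp(theta)
# sample with T = sample mean, so m(theta) = theta and m'(theta) = 1.
import random

random.seed(4)
theta, n, m = 1.0, 5, 50_000

def score1(x):
    # d/dtheta log[(1/theta)e^{-x/theta}] = (x - theta)/theta**2
    return (x - theta) / theta ** 2

T, S = [], []
for _ in range(m):
    xs = [random.expovariate(1 / theta) for _ in range(n)]
    T.append(sum(xs) / n)
    S.append(sum(score1(x) for x in xs))

Tbar = sum(T) / m
Sbar = sum(S) / m
cov_num = sum(s * (t - Tbar) for s, t in zip(S, T))
SF1 = cov_num / (m - 1)                      # sample covariance (7.70)
beta = cov_num / sum((s - Sbar) ** 2 for s in S)
J1 = 1 / theta ** 2                          # var of the one-sample score
SF2 = beta * n * J1                          # regression form (7.71)
```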


Example. A Monte Carlo Estimator of Rho.

Suppose we are interested in estimating rho for an option with payoff function at maturity given by V(S(T), T), with S(T) the value of the stock at time T. Assume the Black-Scholes model, so that the distribution of S(T) under the Q measure is lognormal with mean η = S_0 exp{rT} and volatility σ√T. For brevity we denote S(T) by S. Then since S is lognormal, S = e^Y where Y ∼ N(log(η) − σ²T/2, σ²T). Note that if g is the corresponding lognormal probability density function,

∂ log(g)/∂η = (Y − log(η) + σ²T/2)/(ησ²T)

∂ log(g)/∂r = [(Y − log(η) + σ²T/2)/(ησ²T)] ∂η/∂r    (7.73)
            = (ln(S/S_0) − rT + σ²T/2)/σ².

Thus an estimator of ρ can be obtained from the sample covariance, over a large number of simulations, between V(S(T), T) and ∂ log(g)/∂r, or equivalently

σ^{−2} ĉov(V(S(T), T), ln(S(T)/S_0)).
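For a European call this covariance estimator can be checked against the closed form in Table 8.1. The sketch below also adds the derivative of the discount factor e^{−rT} (a term separate from the covariance identity above, since the discount factor depends on r); the parameter values are illustrative.

```python
# Monte Carlo rho for a European call via the score for r, plus the
# derivative of the discount factor; compared to K*T*exp(-r*T)*Phi(d2).
import random
from math import erf, exp, log, sqrt

random.seed(5)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n = 200_000

payoffs, scores = [], []
for _ in range(n):
    z = random.gauss(0, 1)
    sT = S0 * exp(r * T - sigma ** 2 * T / 2 + sigma * sqrt(T) * z)
    payoffs.append(max(sT - K, 0.0))
    # (ln(S/S0) - rT + sigma^2 T/2)/sigma^2, the score (7.73)
    scores.append((log(sT / S0) - r * T + sigma ** 2 * T / 2) / sigma ** 2)

mean_v = sum(payoffs) / n
cov_vs = sum(v * s for v, s in zip(payoffs, scores)) / n  # E[score] = 0
rho_mc = -T * exp(-r * T) * mean_v + exp(-r * T) * cov_vs

d2 = (log(S0 / K) + (r - sigma ** 2 / 2) * T) / (sigma * sqrt(T))
rho_exact = K * T * exp(-r * T) * 0.5 * (1 + erf(d2 / sqrt(2)))
```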

The score function estimator can be expressed as a limit (as h → 0) of likelihood ratio estimators. However, the score function is more stable than the likelihood ratio for large sample sizes, because its moment behaviour, unlike that of the likelihood ratio, is similar to its behaviour in probability. Under the standard regularity conditions referred to above, the score function S_n(θ) = ∑_{i=1}^n S_1(θ, X_i) for an independent sample of size n satisfies a law of large numbers,

(1/n) S_n(θ) → E[S_1(θ, X_i)] = 0,    (7.74)

and a central limit theorem,

(1/√n) S_n(θ) → N(0, J_1(θ))    (7.75)

in distribution, where the limiting variance is J_1(θ) = var[S_1(θ, X_i)]. When n is large, however, the score function estimator still suffers from too much variability.

The score function estimator does have a kind of optimality attached to it, provided that we restrict attention to estimators which can also be expressed as a covariance. Among all random functions G(X; θ) which satisfy (∂/∂θ) E_θ T(X) = E_θ[T(X) G(X; θ)] for all T, the score function cannot be improved upon, in the sense that it has the smallest possible variance.


Conditioning the Score Function Estimator.

One method for reducing the variability of the score function estimator is conditioning. This is particularly easy for the standard distributions in the exponential family. Note that

m′(θ) = E_θ[T(X) S_n(θ, X)] = E_θ{ E_θ[T(X) | S_n(θ, X)] S_n(θ, X) }.    (7.76)

The conditional expectation E_θ[T(X) | S_n(θ, X)] in the above product can be estimated by Monte Carlo, provided that we are able to generate the variates X_i conditional on the value of the score function. The outside integral E_θ{·} over the distribution of S_n(θ, X) may be conducted either analytically or numerically, using our knowledge of the finite sample or asymptotic distribution of the score function.

For brevity, denote S_n(θ, X) by S and its marginal probability density function by f_S(s). Let X_s = (X_{s1}, ..., X_{sn}), where X_{si}, i = 1, ..., n are random variables all generated from the conditional distribution of X = (X_1, ..., X_n) given S = s for the fixed parameter θ. Then based on a sample of size n, the conditioned score function estimator is

∫ ((1/n) ∑_{i=1}^n T(X_s)) s f_S(s) ds.    (7.77)

There are some powerful advantages to (7.77), particularly when the data are generated from one of the distributions in an exponential family. The exponential family of distributions is a broad class which includes most well-known continuous and discrete distribution families, such as the normal, lognormal, exponential, gamma, binomial, negative binomial, geometric, and Poisson distributions.

X_1 is said to have an exponential family distribution if its probability density function with respect to some dominating measure (usually a counting measure or Lebesgue measure) takes the form

f_θ(x) = e^{η(θ)Y(x)} h(x) c(θ)

for some functions η(θ), c(θ), Y(x) and h(x). For exponential family distributions, conditioning on the score is equivalent to conditioning on the sum Z = ∑_{i=1}^n Y(X_i), provided η′(θ) ≠ 0, and the score is a simple function of Z, viz.

S_n(θ, X) = η′(θ){Z − nE_θ(Y(X_i))}.    (7.78)

So revising (7.77) to condition on the sum Z, the estimator becomes

m̂′(θ) = η′(θ) ∫ E[T(X_s) | Z = z] {z − nE_θ(Y(X_i))} f_Z(z) dz
       = η′(θ) cov(E[T(X_s) | Z], Z)    (7.79)

where f_Z(z) is the probability density function of Z.


When we are attempting to estimate the derivatives m′(θ) simultaneously at a number of different values of θ, perhaps in order to fit the function with splines or represent it graphically, there are some very considerable advantages to the estimator underlying (7.79). Because the conditional expectation does not depend on the value of θ, we may conduct the simulation (usually at two or more values of t) at a single convenient θ. The estimated conditional expectation can then be used in an integral of the form (7.79) for all underlying values of θ. Similarly, a single simulation can be used to estimate m(ψ) for many different values of ψ.

There are a number of simple special cases of the exponential family where the conditional distributions are easily established and simulated. In each case below we specify the probability density function f_Z(z), the conditional distribution of the vector X_s given Z = z, and the function η′(θ).

7.9.2 Examples.

1. (Exponential distribution). Suppose the X_i are exponentially distributed with probability density function f_θ(x) = (1/θ)e^{−x/θ}. Then given ∑_{i=1}^n X_i = z, the partial sums X_1, X_1 + X_2, ..., ∑_{i=1}^{n−1} X_i are generated as n − 1 Uniform[0, z] order statistics, and from these we obtain the values of the X_i as the spacings between the order statistics. In this case η′(θ) = θ^{−2} and f_Z, the probability density function of Z, is Gamma(n, θ).

2. (Gamma distribution). Suppose the X_i are distributed as independent Gamma(α, θ) variates with probability density function

f_θ(x) = x^{α−1} e^{−x/θ} / (Γ(α) θ^α).    (7.80)

Then conditionally on Z = ∑_{i=1}^n X_i we can generate X_s as follows: generate X_{s1} as Z times an independent Beta(α, (n − 1)α) random variable, X_{s2} as (Z − X_{s1}) times a Beta(α, (n − 2)α), ..., and X_{sn} = Z − ∑_{i=1}^{n−1} X_{si}. In this case η′(θ) = θ^{−2} and f_Z, the probability density function of Z, is Gamma(nα, θ).

3. (Normal distribution). Suppose X_i has a N(θ, σ²) distribution. Then X_s is generated as follows: the distribution of X_{s1} given Z = ∑_i X_i is N(Z/n, (1 − 1/n)σ²); the distribution of X_{s2} given X_{s1} and Z is N((Z − X_{s1})/(n − 1), (1 − 1/(n − 1))σ²), etc., and X_{sn} = Z − ∑_{i=1}^{n−1} X_{si}. In this case η′(θ) = σ^{−2} and f_Z(z) is the N(nθ, nσ²) probability density function.

4. (Binomial distribution). Suppose the X_i are distributed as Binomial(m, θ) variates. Then given Z = ∑_{i=1}^n X_i, X_{s1} has a hypergeometric distribution with parameters (mn, m, z); given Z and X_{s1}, X_{s2} has a hypergeometric distribution with parameters (m(n − 1), m, Z − X_{s1}), etc., and X_{sn} = Z − ∑_{i=1}^{n−1} X_{si}. In this case Z has the Binomial(mn, θ) distribution and η′(θ) = [θ(1 − θ)]^{−1}.


5. (Poisson distribution). Suppose X_i, i = 1, ..., n are independent Poisson(θ) random variables. Then given Z = ∑_{i=1}^n X_i, the distribution of X_{s1} is Binomial(Z, 1/n), the conditional distribution of X_{s2} is Binomial(Z − X_{s1}, 1/(n − 1)), etc., and X_{sn} = Z − ∑_{i=1}^{n−1} X_{si}. In this case η′(θ) = θ^{−1} and the distribution of Z is Poisson(nθ).

6. (Geometric distribution). Suppose X_i, i = 1, ..., n are independent with the Geometric(θ) distribution. Then given Z = ∑_{i=1}^n X_i, the distribution of X_{s1} is a negative hypergeometric with probability function

f(x) = C(Z − x − 1, n − 2) / C(Z − 1, n − 1),

where C(a, b) denotes the binomial coefficient. Similarly for X_{s2}, ..., and X_{sn} = Z − ∑_{i=1}^{n−1} X_{si}. In this case Z has a Negative Binomial(n, θ) distribution and η′(θ) = −(1 − θ)^{−1}.

7. (Lognormal distribution). Suppose X_i, i = 1, ..., n are independent with the lognormal distribution with mean θ and probability density function f_θ(x). Then

∂ ln(f_θ(x))/∂θ = (ln(x) − ln(θ) + σ²/2)/(θσ²)    (7.81)

and therefore the conditional distribution of X_1 given the sufficient statistic Z = ∏_{i=1}^n X_i is lognormal with conditional mean Z^{1/n} exp{(1 − 1/n)σ²/2} and volatility parameter (1 − 1/n)σ².
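The conditional generation in Example 1 is particularly simple to code: given the sum z, lay down n − 1 uniform order statistics on [0, z] and take the spacings. This minimal sketch uses illustrative values of n and z.

```python
# Example 1: generate Exp(theta) variables conditional on their sum z by
# taking spacings of n-1 Uniform[0, z] order statistics.
import random

random.seed(6)
n, z = 8, 5.0
u = sorted(random.uniform(0, z) for _ in range(n - 1))  # order statistics
pts = [0.0] + u + [z]
xs = [b - a for a, b in zip(pts, pts[1:])]   # conditional X_1, ..., X_n
```

By construction the spacings are nonnegative and sum to z.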

7.9.3 Example. Estimating vega.

Suppose we wish to estimate the vega of an option on some underlying S(t), where σ is the volatility parameter in the asset price equation. Consider for example a European option whose underlying has volatility parameter σ. Then the vega is the derivative of the value of the option with respect to the volatility:

(∂/∂σ) E{e^{−rT} V(S_T)}

where S_T, the terminal value of the asset, is assumed to have a lognormal distribution with mean S_0 e^{rT} and variance parameter σ²T, and V(S) is the payoff on maturity of the option. Denote the corresponding lognormal probability density function of S(T) by

g(s) = (1/(sσ√(2πT))) exp{−(log(s) − log(S_0) − rT + σ²T/2)²/(2σ²T)}.

It is easy to translate this lognormal distribution into exponential family form, with

Y(s) = (log(s) − log(S_0) − rT)²

and

η(σ) = −1/(2σ²T).

Therefore, from (7.78), the score function is of the form

η′(σ)[Y(s) − E(Y(s))]

and an unbiased estimator of vega is the sample covariance, taken over all simulations,

ĉov(e^{−rT} V(S_T), η′(σ) Y(S_T)) = e^{−rT} (1/(σ³T)) ĉov(V(S_T), Y(S_T)).    (7.82)

Since we can generate S_T from a standard normal random variable Z,

S_T = S_0 exp{rT − σ²T/2 + σ√T Z},    (7.83)

so that

Y(S_T) = σ²T (Z − σ√T/2)².    (7.84)

The covariance in (7.82) can then be written

e^{−rT} (1/σ) ĉov(V(S_T), (Z − σ√T/2)²)

with S_T generated from (7.83). This reduces to a simple one-dimensional integral with respect to a normal probability density function, and we can either simulate this quantity or use numerical integration. Because of the high variability of the score function, it is desirable to use variance reduction in evaluating this estimator, or to conduct the integration analytically or numerically. One of the simplest numerical integration techniques, when the expectation can be written with respect to a normal probability density function, is Gaussian quadrature, discussed below. This is just one example of a whole class of algorithms for numerical integration with respect to certain weight functions.
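For a European call, a score-function vega estimate of this form can be checked against the closed form S_0 φ(d_1)√T. The sketch below assumes the sign convention η(σ) = −1/(2σ²T), so the estimator is e^{−rT} ĉov(V(S_T), (Z − σ√T/2)²)/σ; the parameter values are illustrative.

```python
# Score-function vega for a European call, compared to S0*phi(d1)*sqrt(T).
import random
from math import exp, log, pi, sqrt

random.seed(7)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n = 200_000

v, y = [], []
for _ in range(n):
    z = random.gauss(0, 1)
    sT = S0 * exp(r * T - sigma ** 2 * T / 2 + sigma * sqrt(T) * z)
    v.append(max(sT - K, 0.0))
    y.append((z - sigma * sqrt(T) / 2) ** 2)

mv, my = sum(v) / n, sum(y) / n
cov_vy = sum((a - mv) * (b - my) for a, b in zip(v, y)) / (n - 1)
vega_mc = exp(-r * T) * cov_vy / sigma

d1 = (log(S0 / K) + (r + sigma ** 2 / 2) * T) / (sigma * sqrt(T))
vega_exact = S0 * sqrt(T) * exp(-d1 ** 2 / 2) / sqrt(2 * pi)
```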

7.9.4 Gaussian Quadrature.

We often approximate integrals of the form

∫_{−∞}^{∞} h(x) φ(x) dx,    (7.85)

where φ is the standard normal probability density function, with a weighted sum such as

∑_{i=1}^k w_i h(x_i)    (7.86)

for certain points x_i and corresponding weights w_i. Crude Monte Carlo is an example of such an integration technique, in which we choose the abscissae x_i at random according to a normal distribution and use equal weights w_i = 1/k,


i = 1, 2, ..., k. However, it is possible to choose the weights so that the approximation is exact, i.e.

∑_{i=1}^k w_i h(x_i) = ∫_{−∞}^{∞} h(x) φ(x) dx    (7.87)

whenever h is a polynomial of a certain degree. For example, if we wish equality in (7.87) for the function h(x) = x^r, we would require

∑_{i=1}^k w_i x_i^r = ∫_{−∞}^{∞} x^r φ(x) dx = { 0 for r odd;  r!/((r/2)! 2^{r/2}) for r even. }

If we impose this requirement for r = 0, 1, 2, ..., k − 1, this leads to k equations in the k unknown values of w_i, the first few of which are

∑ w_i = 1,  ∑ w_i x_i² = 1,  ∑ w_i x_i⁴ = 3,  ∑ w_i x_i⁶ = 15,
∑ w_i x_i^r = 0 for r odd.

Normally, for given values of the x_i, it is possible to find a solution to these k equations in the k unknowns w_i unless the system is singular. Having found the appropriate values of w_i, we are then guaranteed that (7.87) holds for any polynomial h(x) of degree k − 1 or less.

We can do better than this, however, if we are free to choose not only the weights but also the points x_i. In fact we can find values of w_i and x_i such that (7.87) holds for all polynomials of degree at most 2k − 1. Without verifying this fact, we simply mention that this is the case if we choose the abscissae to be the k roots of the Hermite polynomials,

p_k(x) = (−1)^k [φ(x)]^{−1} d^k φ(x)/dx^k.    (7.88)

For reference, the Hermite polynomials with degree $k \le 7$ and their roots are:

k   p_k(x)                        Roots x_i (approximate)         Weights w_i
0   1
1   x                             0                               1
2   x^2 - 1                       ±1                              1/2, 1/2
3   x^3 - 3x                      0, ±√3                          2/3, 1/6
4   x^4 - 6x^2 + 3                ±0.74196, ±2.3344               0.4541, 0.0459
5   x^5 - 10x^3 + 15x             0, ±1.3556, ±2.857              0.5333, 0.2221, 0.0113
6   x^6 - 15x^4 + 45x^2 - 15      ±0.61671, ±1.8892, ±3.3243      0.4088, 0.0886, 0.0046
7   x^7 - 21x^5 + 105x^3 - 105x   0, ±1.1544, ±2.3668, ±3.7504    0.4571, 0.2401, 0.0308, 0.0005

Finding the roots of these polynomials and solving for the appropriate weights, the corresponding approximation to the integral $\int_{-\infty}^{\infty} h(t)\phi(t)\,dt$ using $k = 3$, for example, is

$$\int_{-\infty}^{\infty} h(x)\,\phi(x)\,dx \approx \tfrac{2}{3}h(0) + \tfrac{1}{6}h(\sqrt{3}) + \tfrac{1}{6}h(-\sqrt{3}).$$

Using these, if we wish to evaluate the expected value of a function of $X \sim N(\mu, \sigma^2)$, the approximation is

$$E[h(X)] \approx \sum_i w_i\, h(\mu + \sigma x_i).$$

This formula is exact for $h$ a polynomial of degree $2k-1$ or less. Although the roots $x_i$ are not exactly equally spaced, they are fairly close.
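As an illustration (in Python rather than the Matlab used for the code in this chapter), the rule can be sketched using NumPy's built-in nodes and weights for the probabilists' Hermite polynomials; the function name gauss_hermite_expectation is ours.

```python
import numpy as np

def gauss_hermite_expectation(h, mu, sigma, k):
    """Approximate E[h(X)] for X ~ N(mu, sigma^2) by Gaussian quadrature."""
    # Roots of the k-th probabilists' Hermite polynomial and weights for
    # the weight function exp(-x^2/2); dividing by sqrt(2*pi) converts
    # them to weights w_i for the standard normal density phi.
    x, w = np.polynomial.hermite_e.hermegauss(k)
    w = w / np.sqrt(2.0 * np.pi)
    return np.sum(w * h(mu + sigma * x))
```

Since the rule is exact for polynomials of degree at most $2k-1$, with $k = 3$ it reproduces $E[X^4] = 3$ for $X \sim N(0,1)$ exactly, using only the three nodes $0, \pm\sqrt{3}$.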

7.10 Infinitesimal Perturbation Analysis: Pathwise Differentiation

There is an alternative method for measuring the sensitivity of simulations to perturbations in the parameters which can perform better than the score function method. This method requires assumptions on the derivative of the performance measure. As a preliminary example, let us return to the problem of estimating a Greek (e.g. rho, vega, delta or theta) for a European option. We wish to estimate the derivative of the option price

$$E_Q\{e^{-r(T-t)} V(S_T)\}$$

with respect to some parameter (e.g. $r$, $\sigma$, $S_0$, $t$), where $S_T$ has a lognormal distribution with mean $S_0 e^{rT}$ and variance parameter $\sigma^2 T$, and $V(S_T)$ is the value of the option on expiry when the stock price is $S_T$. As an example, suppose we wish to estimate vega, the derivative of the option price with respect to the unknown volatility parameter $\sigma$ underlying the distribution of $S_T$, and assume for simplicity of expressions that $t = 0$. Suppose we generate $S_T = S_0\exp(rT - \sigma^2 T/2 + \sigma Z\sqrt{T})$ for a standard normal random variable $Z$. Notice that $S_T$ is explicitly a function of the parameter $\sigma$. Then differentiating directly with respect to this parameter, provided such a derivative exists and can be moved under the expectation sign, yields

$$\frac{\partial}{\partial\sigma} E\{e^{-rT} V(S_T)\} = E\left\{e^{-rT} V'(S_T)\,\frac{\partial S_T}{\partial\sigma}\right\} \qquad (7.89)$$
$$= E\left\{e^{-rT} V'(S_T)\,S_T\left(-\sigma T + Z\sqrt{T}\right)\right\} \qquad (7.90)$$

Thus, to estimate the derivative, we use an average of simulated values of the form

$$\frac{1}{n}\sum_{i=1}^{n} e^{-rT}\, V'(S_{Ti})\left(-\sigma T + Z_i\sqrt{T}\right) \qquad (7.91)$$

where $S_{Ti} = S_0\exp(rT - \sigma^2 T/2 + \sigma Z_i\sqrt{T})$ is the $i$th simulated closing value.

If the function $V(\cdot)$ is close to being constant, then this estimator will have variance close to zero and will be quite accurate, likely more accurate than the score function estimator described in the last section. Consider the case of a European call option with strike $K$. Then $V(S_T) = (S_T - K)^+$ and $V'(S_T) = 1_{[S_T > K]}$. Note that the derivative exists everywhere except at the point $S_T = K$. Is this a fatal problem for IPA? A quick check of the expression (7.91) in this case yields

$$\frac{1}{n}\sum_{i=1}^{n} e^{-rT}\, S_{Ti}\, 1_{[S_{Ti} > K]}\left(-\sigma T + Z_i\sqrt{T}\right) \;\to\; e^{-rT} E\!\left[S_T\, 1_{[S_T > K]}\,\frac{\ln(S_T/S_0) - (r + \sigma^2/2)T}{\sigma}\right] = S_0\,\phi(d_1)\sqrt{T},$$

in agreement with the values in Table 8.1. The estimator appears to have the correct limit in spite of the fact that a very important ingredient in its derivation, the existence of a derivative, fails. The failure of differentiability at a single point, or even at a finite number of points, evidently does not necessarily bias the IPA estimator. Essentially what is required for the estimator to work is the ability to interchange expected value and derivative in expressions such as (7.89). Some general conditions which permit this interchange are given in Appendix A. Verifying such conditions in any practical simulation is typically very difficult, but a rule of thumb is that if the value function $V(\cdot)$ is continuous, then IPA usually provides consistent estimators, while if it is discontinuous, it does not. We can sometimes circumvent the problem of non-differentiability by finding a sequence of everywhere differentiable functions $V_n(x)$ such that $V_n(x) \to V(x)$ and $V_n'(x) \to V'(x)$ for all $x \ne K$. Then we can show that with $V_n$ replacing $V$ in (7.91), we obtain a consistent estimator of $\frac{\partial}{\partial\sigma} E\{e^{-rT} V_n(S_T)\}$, and using the Lebesgue dominated convergence theorem, we may carry this consistency over to $V(x)$. For the call option, we might choose

$$V_n(x) = \begin{cases} 0, & x < K - \frac{1}{4n} \\[4pt] n\left(x - K + \frac{1}{4n}\right)^2, & K - \frac{1}{4n} < x < K + \frac{1}{4n} \\[4pt] x - K, & x > K + \frac{1}{4n} \end{cases}$$

a continuously differentiable function which agrees with $V(x)$ both in its value and its derivative everywhere except in the diminishing interval $(K - \frac{1}{4n}, K + \frac{1}{4n})$. More generally, when $V(x)$ increases at most linearly in $x$ as $x \to \infty$, it is possible to find a dominating function, but if the payoff function $V(x)$ increased at a faster rate, this may not be possible. Generally, if there are a finite number of points where the derivative does not exist, and the payoff function is bounded above by linear functions of the stock price, an estimator of the form (7.91) can be used.

Infinitesimal Perturbation Analysis, or Pathwise Differentiation, employs the following simple steps:

1. Write the expected value we wish to determine in terms of the parameters (as explicit arguments) and random variables whose distribution does not depend on these parameters (e.g. $U[0,1]$ or $N(0,1)$). The simplest way to do this may be to use the inverse transform.


2. Differentiate this expected value with respect to the parameter of interest, passing the derivative under the expected value sign.

3. Simulate or numerically determine this expected value.
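A minimal sketch of these three steps in Python (the parameter choices and the function name ipa_delta_call are ours, not from the text), estimating the delta of a European call:

```python
import numpy as np

def ipa_delta_call(S0, K, r, sigma, T, n=100_000, seed=0):
    """IPA (pathwise) estimate of delta for a European call, with standard error."""
    rng = np.random.default_rng(seed)
    # Step 1: write S_T in terms of N(0,1) draws, whose law is parameter-free.
    Z = rng.standard_normal(n)
    ST = S0 * np.exp(r*T - 0.5*sigma**2*T + sigma*np.sqrt(T)*Z)
    # Step 2: differentiate e^{-rT}(S_T - K)^+ under the expectation:
    # dS_T/dS0 = S_T/S0, so each sample is e^{-rT} 1[S_T > K] S_T/S0.
    samples = np.exp(-r*T) * (ST > K) * ST / S0
    # Step 3: average the simulated values.
    return samples.mean(), samples.std(ddof=1) / np.sqrt(n)
```

The estimate can be compared with the Black-Scholes delta $\Phi(d_1)$, which it should reproduce to within a couple of standard errors.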

Since the requirements for unbiasedness of the IPA estimator are rather subtle, it is desirable to compare the estimator with one that is known to be unbiased, such as the score function estimator. If both appear to have the same mean, then it is likely that the IPA estimator is unbiased and has smaller variance.

7.10.1 Example: IPA estimate of vega

Again consider an estimate of

$$\text{vega} = \frac{\partial}{\partial\sigma} E_Q\{e^{-rT} V(S_T)\},$$

at time $t = 0$, with $S_T$, the terminal value of the asset, distributed according to a lognormal distribution with mean $S_0 e^{rT}$ and volatility parameter $\sigma^2 T$. We begin by writing $S_T$ in terms of random variables with distributions that do not depend on the parameters. Recall that

$$S_T = S_0 \exp\{rT - \sigma^2 T/2 + \sigma Z\sqrt{T}\}$$

with $Z$ a standard normal random variable. Then, provided that we can pass the derivative through the expected value,

$$\frac{\partial V}{\partial\sigma} = E\left\{e^{-rT} V'(S_T)\,\frac{\partial S_T}{\partial\sigma}\right\} = E\left\{e^{-rT} V'(S_T)\, S_T\left(Z\sqrt{T} - \sigma T\right)\right\}.$$

This can be simulated by generating values of $Z$ and then $S_T = S_0\exp\{rT - \sigma^2 T/2 + \sigma Z\sqrt{T}\}$, and averaging the values of $e^{-rT} V'(S_T) S_T(Z\sqrt{T} - \sigma T)$. Alternatively, since this is a one-dimensional integral, we can integrate the function against the standard normal p.d.f. $\phi$, i.e.

$$\int_{-\infty}^{\infty} e^{-rT}\, V'\!\left(S_0 e^{rT - \sigma^2 T/2 + \sigma z\sqrt{T}}\right) S_0 e^{rT - \sigma^2 T/2 + \sigma z\sqrt{T}}\left(z\sqrt{T} - \sigma T\right)\phi(z)\,dz.$$

Note the similarity between this estimator and the score function estimator in the same problem. The primary difference is that here $V'$ is multiplied by $z\sqrt{T} - \sigma T$, a linear function of $z$, whereas in the case of the score function $V$ is multiplied by a quadratic function of $Z$. The relationship will be clearer later, when we see that the score function estimator can be derived from the IPA estimator using integration by parts. Because of the high variability of the score function, the perturbation analysis estimator is substantially better, at least for a standard call option. The following function was used to compare the estimators and their standard errors.


function [price,vega,SE]=estvega(Z,S0,sigma,r,T,K)
% Two estimators of vega: vega(1)=score function estimator, vega(2)=IPA estimator;
% SE(1), SE(2) are their standard errors.
% v=payoff function, vprime is its derivative.
% Z=randn(1,n) is a vector of standard normal variates.
ST=S0*exp(r*T+sigma*sqrt(T)*Z-.5*sigma^2*T);
v=max(0,ST-K);
v1=exp(-r*T)*(v.*((Z.^2-1)/sigma-sqrt(T)*Z));
vprime=ones(1,length(Z)).*(ST>K);
v2=exp(-r*T)*(vprime.*ST.*(sqrt(T)*Z-sigma*T));
vega=[mean(v1) mean(v2)];
SE=sqrt([var(v1) var(v2)]/length(Z));
price=exp(-r*T)*mean(v);
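For readers working in Python, the same comparison can be sketched as follows (a NumPy translation of the Matlab function above, with the same variable names; the translation is ours):

```python
import numpy as np

def estvega(Z, S0, sigma, r, T, K):
    """Score-function and IPA estimators of vega for a European call."""
    ST = S0 * np.exp(r*T + sigma*np.sqrt(T)*Z - 0.5*sigma**2*T)
    v = np.maximum(0.0, ST - K)                        # call payoff
    # score function estimator: payoff times the score for sigma
    v1 = np.exp(-r*T) * v * ((Z**2 - 1.0)/sigma - np.sqrt(T)*Z)
    # IPA estimator: pathwise derivative of the discounted payoff
    v2 = np.exp(-r*T) * (ST > K) * ST * (np.sqrt(T)*Z - sigma*T)
    n = len(Z)
    vega = np.array([v1.mean(), v2.mean()])
    SE = np.sqrt(np.array([v1.var(ddof=1), v2.var(ddof=1)]) / n)
    price = np.exp(-r*T) * v.mean()
    return price, vega, SE
```

Both estimators are unbiased for the Black-Scholes vega $S_0\phi(d_1)\sqrt{T}$, and the IPA standard error is noticeably smaller, as the text reports.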

For example, the call [price,vega,SE]=estvega(randn(1,500000),10,.2,.1,.25,9) results in the price of a call option on a stock worth \$10, with strike price $K = 9$, three months or one quarter of a year to maturity, interest rate $r = 0.1$ and annual volatility 0.20. The estimated price is \$1.1653 and the two estimates of vega are 0.8835 and 0.9297, with standard errors 0.0238 and 0.0059 respectively. Since the ratio of standard errors is approximately 4, the IPA estimator is evidently about 16 times as efficient as the score function estimator in this case, although even the score function estimator provides reasonable accuracy. Not all derivatives can be estimated as successfully using IPA, however. For example, if we are interested in the gamma, or second derivative with respect to $S_0$, of a European call option, then $V(S_T) = (S_T - K)^+$ and $V''(x) = 0$ for all $x \ne K$. Thus, if we are permitted to differentiate twice under the expected value in

$$E_Q\{e^{-rT} V(S_T)\},$$

we obtain

$$\Gamma = e^{-rT} E_Q\!\left[V''(S_T)\left(\frac{\partial S_T}{\partial S_0}\right)^2\right] = 0,$$

which is clearly incorrect. The problem in this case is that the regularity required for the second interchange of derivative and expectation fails. In general, if a function is discontinuous, as $V'(x)$ is in this case, interchanging derivatives and expected values is problematic, so IPA permits unbiased estimation of the delta for a European call or put option but not for a digital option having payoff $V(S_T) = 1(S_T \ge K)$. IPA is based on the differentiation of the output process. This makes IPA unsuitable as a "black-box" algorithm. Since expected values are typically smoother with respect to the parameter than the function being integrated, the score function estimator, though noisier, provides unbiased estimation under much more general conditions, because the score function method, together with its variance-reduced variations, only imposes regularity on the input variables and requires no knowledge of the process being simulated. However, the score function method requires that the parameter whose sensitivity is investigated be a statistical parameter, i.e. index a family of densities, whereas perturbation analysis allows more general types of parameters.


Example: IPA estimation of the delta of an Asian option.

Suppose $S_t$ follows the Black-Scholes model and we wish to determine the delta of an Asian option which has payoff

$$V(S_t, 0 < t < T) = (\bar{S} - K)^+$$

where

$$\bar{S} = \frac{1}{m}\sum_{i=1}^{m} S(t_i)$$

is the average stock price at selected time points. Suppose the process $Z_t$ is generated in exactly the same way as $S_t$ but with initial stock price $Z_0 = 1$. Then we can generate the process in general using

$$S_t = S_0 Z_t,$$

in which case

$$\text{delta} = \frac{\partial}{\partial S_0} E_Q\{e^{-rT}(\bar{S} - K)^+\} = e^{-rT} E_Q\!\left[\frac{\partial\bar{S}}{\partial S_0}\, I(\bar{S} > K)\right] = e^{-rT} E_Q[\bar{Z}\, I(\bar{S} > K)] = e^{-rT} E_Q\!\left[\frac{\bar{S}}{S_0}\, I(\bar{S} > K)\right].$$

Therefore averaging the simulated values of $e^{-rT}\frac{\bar{S}}{S_0} I(\bar{S} > K)$ over all simulations provides an unbiased estimator of delta.
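This estimator can be sketched in Python as follows (the monitoring-date layout and parameter values are our illustrative choices):

```python
import numpy as np

def ipa_delta_asian(S0, K, r, sigma, T, m, n=100_000, seed=0):
    """IPA estimate of delta for an arithmetic-average Asian call:
    average the simulated values of e^{-rT} (Sbar/S0) 1[Sbar > K]."""
    rng = np.random.default_rng(seed)
    dt = T / m
    Z = rng.standard_normal((n, m))
    # log-prices at the m equally spaced monitoring dates t_i = i*dt
    logS = np.log(S0) + np.cumsum((r - 0.5*sigma**2)*dt
                                  + sigma*np.sqrt(dt)*Z, axis=1)
    Sbar = np.exp(logS).mean(axis=1)          # arithmetic average price
    return (np.exp(-r*T) * (Sbar / S0) * (Sbar > K)).mean()
```

As a sanity check, the estimated delta lies between 0 and 1 and increases with the initial stock price.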

IPA in the Multivariate Case.

We wish to generate $X = (X_1, \ldots, X_n)$ with independent components, and let the cumulative distribution function and the probability density function of $X_i$ be denoted $F_{i\theta}(x)$ and $f_{i\theta}$ respectively. Once again we wish to estimate the sensitivity or derivative of the expected value

$$m(\theta) = E_\theta V(X_1, \ldots, X_n, \theta)$$

with respect to the parameter $\theta$ for some function $V$. To allow for the most general situation, we permit $\theta$ not only to affect the distribution of the variables $X_i$ but also, in some cases, to be an argument of the function $V$. Suppose we generate the random variables $X_i$ by inverse transform from a vector of $n$ independent uniform variates $U_i$ according to $X_i = F_{i\theta}^{-1}(U_i)$. Then note that

$$\frac{\partial X_i}{\partial\theta} = -\frac{1}{f_{i\theta}(X_i)}\,\frac{\partial F_{i\theta}(X_i)}{\partial\theta}.$$

Thus, with $V^{(i)} = \frac{\partial V(X,\theta)}{\partial X_i}$ and $V^{(\theta)} = \frac{\partial V(X,\theta)}{\partial\theta}$, we have, under conditions permitting the interchange of derivative and integral,

$$m'(\theta) = E\left\{\sum_i \frac{\partial V(X,\theta)}{\partial X_i}\,\frac{\partial X_i}{\partial\theta} + \frac{\partial V(X,\theta)}{\partial\theta}\right\} = E\left[V^{(\theta)} - \sum_i V^{(i)}\,\frac{1}{f_{i\theta}(X_i)}\,\frac{\partial F_{i\theta}(X_i)}{\partial\theta}\right] \qquad (7.92)$$


This suggests a Monte Carlo estimator, an average over all (independent) simulations of terms of the form

$$V^{(\theta)}(X,\theta) - \sum_i \frac{V^{(i)}(X,\theta)}{f_{i\theta}(X_i)}\,\frac{\partial F_{i\theta}(X_i)}{\partial\theta}. \qquad (7.93)$$

Again, the infinitesimal perturbation analysis estimator is unbiased if the conditions permitting the required interchange of derivative and integral are met; otherwise the estimator may be biased. See Cao (1987a) for some conditions. When the conditions are met, note the relationship between terms in the perturbation analysis estimator and the score function estimator, obtained by integration by parts:

$$E_\theta\!\left[V(X,\theta)\,\frac{\partial \ln f_{i\theta}(X_i)}{\partial\theta}\right] = E_\theta \int V(X_1, \ldots, X_{i-1}, x_i, X_{i+1}, \ldots, X_n, \theta)\,\frac{\partial}{\partial\theta} f_{i\theta}(x_i)\,dx_i$$
$$= E_\theta\, V(X_1, \ldots, X_{i-1}, x_i, X_{i+1}, \ldots, X_n, \theta)\,\frac{\partial}{\partial\theta}\{F_{i\theta}(\infty) - F_{i\theta}(-\infty)\} - E_\theta \int V^{(i)}(X_1, \ldots, x_i, \ldots, X_n, \theta)\,\frac{\partial}{\partial\theta} F_{i\theta}(x_i)\,dx_i$$
$$= -E_\theta\!\left\{V^{(i)}(X,\theta)\,\frac{\partial F_{i\theta}(X_i)/\partial\theta}{f_{i\theta}(X_i)}\right\},$$

where the boundary term vanishes because $F_{i\theta}(\infty) = 1$ and $F_{i\theta}(-\infty) = 0$ do not depend on $\theta$.

For nearly constant functions $V$, the gradient $V^{(i)}$ is close to zero and the perturbation analysis estimator has small variance. In general, when it is unbiased, it seems to provide greater efficiency than the crude score function estimator. On the other hand, the comparison is usually carried out in specific cases. Without smoothness in the function $V$, there seems to be no general reason why perturbation analysis should be preferred. If $V$ is nondifferentiable or discontinuous, this can introduce potential bias into perturbation analysis estimators. The infinitesimal perturbation analysis estimator is an infinitesimal or limiting version of the use of common random numbers, as the following argument shows. Generating $X_\theta$ as above, it is reasonable to estimate

$$\frac{m(\theta+\delta) - m(\theta-\delta)}{2\delta} \approx \frac{V(X_{\theta+\delta}, \theta+\delta) - V(X_{\theta-\delta}, \theta-\delta)}{2\delta}.$$

Taking limits as $\delta \to 0$ and assuming the gradient exists in a neighborhood of $\theta$, we arrive at the perturbation analysis estimator.

In the more common circumstance that the function $V$ does not directly depend on the parameter, the crude Monte Carlo IPA estimator (7.93) is an average over all (independent) simulations of

$$-\sum_i \frac{V^{(i)}(X)}{f_{i\theta}(X_i)}\,\frac{\partial F_{i\theta}(X_i)}{\partial\theta} \qquad (7.94)$$

where the derivatives $\frac{\partial}{\partial X_i} V(X)$ may be derived through analysis of the system or through the implicit function theorem if the problem is tractable. In examples where IPA has been found to be unbiased, it has also been found to be consistent. When compared to the crude score function method for these examples, it has generally been found to be the more efficient of the two, although exceptions to this rule are easy to find.
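To make (7.94) concrete, here is a minimal check in Python (the exponential model is our illustrative choice, not the text's): take the $X_i$ i.i.d. exponential with mean $\theta$, so $F_\theta(x) = 1 - e^{-x/\theta}$, and $V(X) = \sum_i X_i$, whose expected value $n\theta$ has derivative $n$.

```python
import numpy as np

def ipa_sum_exponential(theta, n_vars=3, n_sims=200_000, seed=0):
    """IPA estimator (7.94) of d/dtheta E[X_1 + ... + X_n] for X_i
    i.i.d. exponential with mean theta.  Here V^{(i)} = 1,
    dF/dtheta = -(x/theta^2) exp(-x/theta) and f(x) = (1/theta) exp(-x/theta),
    so each simulated term in (7.94) reduces to sum_i X_i / theta."""
    rng = np.random.default_rng(seed)
    X = rng.exponential(scale=theta, size=(n_sims, n_vars))
    return (X / theta).sum(axis=1).mean()
```

With three exponential variables the estimator should average close to the true derivative 3, for any value of $\theta$.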

7.10.2 Sensitivity of the value of a spread option to the correlation.

Consider two stocks or asset prices with closing values $S_1(T)$ and $S_2(T)$, jointly lognormally distributed with volatility parameters $\sigma_1, \sigma_2$ and correlation $\rho$. Of course all of the parameters governing this distribution change over time, including the correlation $\rho$. We are interested in the price of a European call option on the spread in price between the two stocks, and in particular the sensitivity of this price to changes in the correlation. Let the payoff function be

$$V(S_1(T), S_2(T)) = \max(0,\, S_1(T) - S_2(T) - K) \qquad (7.95)$$
$$= \max\left(0,\, \exp\{rT - \sigma_1^2 T/2 + \sigma_1 Z_1\} - \exp\{rT - \sigma_2^2 T/2 + \sigma_2 Z_2\} - K\right)$$

for strike price $K$ and correlated standard normal random variables $Z_1, Z_2$. The easiest way to generate such random variables is to generate $Z_1, Z_3$ independent standard normal and then set

$$Z_2 = \rho Z_1 + \sqrt{1 - \rho^2}\, Z_3. \qquad (7.96)$$

Then the sensitivity of the option price with respect to $\rho$ is the derivative of the discounted expected return

$$\frac{\partial}{\partial\rho} E[e^{-rT} V(S_1(T), S_2(T))] = -\sigma_2\, E\!\left[\exp\{-\sigma_2^2 T/2 + \sigma_2 Z_2\}\,\frac{\partial Z_2}{\partial\rho}\, 1_A\right]$$
$$= -\sigma_2\, E\!\left[\exp\{-\sigma_2^2 T/2 + \sigma_2(\rho Z_1 + \sqrt{1-\rho^2}\, Z_3)\}\left(Z_1 - \frac{\rho}{\sqrt{1-\rho^2}}\, Z_3\right) 1_A\right] \qquad (7.98)$$

where $1_A$ is the indicator function of the set

$$A = A(Z_1, Z_2) = \left[\exp\{rT - \sigma_1^2 T/2 + \sigma_1 Z_1\} - \exp\{rT - \sigma_2^2 T/2 + \sigma_2 Z_2\} > K\right]$$

and where $Z_1, Z_3$ are independent standard normal random variables and $Z_2$ satisfies (7.96). Thus an IPA estimator of the sensitivity is given by an average of terms of the form

$$-\sigma_2\, \exp\{-\sigma_2^2 T/2 + \sigma_2(\rho Z_1 + \sqrt{1-\rho^2}\, Z_3)\}\left(Z_1 - \frac{\rho}{\sqrt{1-\rho^2}}\, Z_3\right) 1_{A(Z_1, Z_2)}. \qquad (7.99)$$

Of course variance reduction can easily be applied to this estimator, especially since there is a substantial set on which (7.99) is equal to 0.
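A sketch of this estimator in Python (the parameter values below are our illustrative choices), which can be checked against a common-random-numbers finite difference of the price:

```python
import numpy as np

def spread_rho_sensitivity(rho, sigma1, sigma2, r, T, K, Z1, Z3):
    """Price of the spread call and the IPA estimate (7.99) of its
    derivative with respect to the correlation rho.  Passing the same
    normals Z1, Z3 to every call gives common random numbers."""
    Z2 = rho * Z1 + np.sqrt(1 - rho**2) * Z3          # equation (7.96)
    S1 = np.exp(r*T - 0.5*sigma1**2*T + sigma1*Z1)
    S2 = np.exp(r*T - 0.5*sigma2**2*T + sigma2*Z2)
    price = np.exp(-r*T) * np.maximum(0.0, S1 - S2 - K).mean()
    A = (S1 - S2) > K                                  # indicator 1_A
    ipa = (-sigma2 * np.exp(-0.5*sigma2**2*T + sigma2*Z2)
           * (Z1 - rho/np.sqrt(1 - rho**2) * Z3) * A).mean()
    return price, ipa
```

With common random numbers, a central finite difference of the price in $\rho$ should land close to the IPA estimate.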


7.11 Calibrating a Model using simulations

In most of the models discussed in the finance literature there are a number of unknown parameters that must either be estimated from historical data or, for example in order to estimate the risk-neutral distribution, calibrated to the market prices of derivatives. Because of the complexity of the model, it is also common that whatever function we wish to maximize or minimize can only be evaluated using rather noisy simulations, even after applying the variance reduction techniques of Chapter 4. Determining a set of parameter values is the problem addressed in this section. In Chapter 7 we discussed some general numerical methods for finding roots and maxima of functions whose evaluations were subject to noise. However, the problem of finding such maxima is often linked with the problem of estimating gradients for noisy functions. Since this is the subject of this chapter, we return to this problem here.

Suppose, for example, we have $J$ traded options on a single underlying asset with market prices $P_j$, $j = 1, \ldots, J$. These options may have different maturities. Valuation of these options by simulation requires a model for the underlying stochastic variables, with a $(p \times 1)$ parameter vector $\theta \in \Theta$ governing the behaviour of these variables ($\Theta$ is the parameter space). Let $P_j(\theta)$ be the model price for option $j = 1, \ldots, J$ at parameter value $\theta$. We wish to choose parameter values so as to minimize the differences $P_j(\theta) - P_j$, the pricing error for option $j = 1, \ldots, J$ at parameter value $\theta$. More precisely, we wish to find the parameter $\hat\theta$ that minimizes the weighted sum of squared pricing errors (SSPE):

$$\hat\theta = \arg\min_{\theta\in\Theta} \sum_{j=1}^{J} w_j\,(P_j(\theta) - P_j)^2, \qquad (7.100)$$

where $w_j$, $j = 1, \ldots, J$ are suitable weights. Information such as moneyness, the option price, trading volume, open interest and time-to-maturity can be used in the construction of $w_j$, affecting the contribution of each option to the loss function, so that options used as benchmarks and those that are heavily traded can contribute more weight to the loss function. This is the familiar nonlinear weighted least-squares problem studied extensively in statistics. This problem is usually solved by equating the derivative or gradient of the objective function to zero and solving for $\theta$, where, in this case, the gradient of (7.100) is

$$\text{grad}(\theta) = 2\sum_{j=1}^{J} w_j\,(P_j(\theta) - P_j)\,\frac{\partial}{\partial\theta} P_j(\theta). \qquad (7.101)$$

Finding the root of (7.101) normally would require the functional form of both $P_j(\theta)$ and $\frac{\partial}{\partial\theta_i} P_j(\theta)$, which is typically not available for simulation-based models. Notice that it takes a fairly simple form, however, since if we define the inner product between two vectors using the weights $w_j$ (so that the inner product of vectors $x$ and $y$ is $\sum_j x_j y_j w_j$), then (7.101) equals zero if the pricing error vector is orthogonal to the vector of greeks for each of the parameters.


The relevance of the estimation of sensitivities to the problem of solving (7.101) is now clear. Methods such as the score function method and infinitesimal perturbation analysis can be used to estimate the gradient vector

$$\frac{\partial}{\partial\theta} P_j(\theta).$$

Estimation of this gradient is crucial to any algorithm which attempts to minimize (7.100). We may also estimate the gradient more simply by obtaining noisy evaluations $\hat{P}_j(\theta)$, say, of the function $P_j(\theta)$ at various values of $\theta$ in a small neighbourhood and fitting a linear regression to these observations.
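A minimal sketch of this regression idea in Python (the function name and the least-squares design are ours): evaluate a possibly noisy price function at a few points near $\theta$, driving every evaluation with the same seed so that common random numbers are used, and read the gradient off the fitted slopes.

```python
import numpy as np

def gradient_by_regression(P_hat, theta0, h=0.01, n_points=8, seed=0):
    """Estimate grad P(theta0) by linear regression on noisy evaluations
    of P at points in a small neighbourhood of theta0.  P_hat(theta, seed)
    should reuse the same seed: common random numbers."""
    rng = np.random.default_rng(seed)
    p = len(theta0)
    thetas = theta0 + h * rng.standard_normal((n_points, p))
    # evaluate with the SAME seed each time (common random numbers)
    y = np.array([P_hat(t, seed) for t in thetas])
    # regress y on (theta - theta0) with an intercept
    X = np.column_stack([np.ones(n_points), thetas - theta0])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]          # fitted slopes = gradient estimate
```

For an exactly linear function the fitted slopes recover the gradient exactly, which gives a quick way to check the implementation.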

Let us now describe the various steps in a fairly general algorithm designed to solve (7.100). Possible algorithms will differ primarily in how we approximate $P_j(\theta)$ and $\frac{\partial}{\partial\theta} P_j(\theta)$, but in general, even if we use common random numbers to drive our simulations, estimators of the function $P_j(\theta)$ tend to be more precise than estimators of its gradient $\frac{\partial}{\partial\theta} P_j(\theta)$.

We begin with simulations conducted at $K$ values of $\theta$: $\theta_1, \theta_2, \ldots, \theta_K$. By averaging the simulated values of the options at these points we obtain estimated option prices for each of the $J$ options, $\hat\alpha_{jk} = \hat{P}_j(\theta_k)$, $j = 1, \ldots, J$, $k = 1, \ldots, K$. We can also use these same simulations to estimate the gradient vector at each of these points. If we plan to estimate the gradient using regression among two or more neighbouring values of $\theta$, then it is important to drive the simulations at different parameter values with common random numbers, since this improves the precision of the estimated gradient. Suppose the gradients are $\beta_{jk} = \frac{\partial}{\partial\theta} P_j(\theta_k)$ (a column vector of length $p$), and the estimator obtained either by IPA, the score function method, or finite differences is denoted $\hat\beta_{jk}$. If we approximate $P_j(\theta)$ by a linear Taylor series approximation in a neighbourhood of $\theta_k$, for a particular value of $k$, we obtain as estimator of $P_j(\theta)$

$$P_j(\theta) \simeq \hat{P}_j(\theta) = \hat\alpha_{jk} + \hat\beta_{jk}'(\theta - \theta_k).$$