This same linear regression gives, as an estimator of the gradient at $\theta_k$,
$$\widehat{\frac{\partial}{\partial\theta}P_j}(\theta_k) = \hat{\beta}_{jk}. \qquad (7.103)$$

Unfortunately, simulations at every value of $\theta_k$ provide estimators of the same quantity $P_j(\theta)$, and some estimators are clearly more precise than others, so we may wish to combine these individual estimators to provide a weighted combination of the values $\hat{\alpha}_{jk} + \hat{\beta}_{jk}'(\theta - \theta_k)$, $k = 1, 2, \ldots, K$. The ideal weights would be inversely proportional to the variances of the estimators themselves if the linear model for $P_j(\theta)$ were a perfect fit. However, given that it is usually not, some form of distance between the parameter value $\theta_k$ at which the simulations were conducted and the parameter at which we wish to extrapolate should also be included in the weights. We suggest weights which approximate

$$c_{jk}(\theta) \propto \frac{H(\theta - \theta_k)}{\operatorname{var}\bigl(\widehat{P_j(\theta)}\bigr)} \qquad (7.104)$$


with the “kernel” $H(t)$ having the property that it is maximized when $t = 0$ and decreases to zero as $|t|$ increases. A simple example of a kernel function $H(t)$ is the Gaussian kernel,

$$H(t) = \exp\Bigl\{-c\sum_{i=1}^{p} t_i^2\Bigr\}$$

or the Cauchy kernel
$$H(t) = \frac{1}{1 + c\sum_{i=1}^{p} t_i^2}$$

for some positive parameter $c$ governing the window width or the amount of smoothing applied to the observations. The symbol “$\propto$” indicates that the weights are proportional to the values on the right-hand side, but renormalized so that
$$\sum_{k=1}^{K} c_{jk}(\theta) = 1.$$
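As a concrete illustration, the weights $c_{jk}(\theta)$ might be computed as in the following Python sketch, using the Gaussian kernel above; the array names (`thetas`, `var_P`) and the default smoothing constant are illustrative assumptions, not notation from the text.

```python
import numpy as np

def weights_c(theta, thetas, var_P, c=1.0):
    """Weights c_{jk}(theta) of (7.104) for a single option j: proportional
    to H(theta - theta_k) / var(P_j-hat), renormalized to sum to one.
    theta  : parameter point at which we extrapolate, shape (p,)
    thetas : the K simulated parameter points theta_k, shape (K, p)
    var_P  : crude variance estimates for the K local fits, shape (K,)
    """
    t = theta - thetas                         # differences theta - theta_k
    H = np.exp(-c * np.sum(t**2, axis=1))      # Gaussian kernel H(theta - theta_k)
    w = H / var_P                              # downweight noisy estimators
    return w / w.sum()                         # renormalize so weights sum to 1
```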

Then our estimator of $P_j(\theta)$ is
$$\widehat{P_j(\theta)} = \sum_{k=1}^{K} c_{jk}(\theta)\bigl[\hat{\alpha}_{jk} + \hat{\beta}_{jk}'(\theta - \theta_k)\bigr]. \qquad (7.105)$$

Similarly, the gradients $\frac{\partial}{\partial\theta}P_j(\theta)$ can be estimated by a weighted average of the individual estimators $\hat{\beta}_{jk}$, leading to the estimator
$$\widehat{\frac{\partial}{\partial\theta}P_j}(\theta) = \sum_{k=1}^{K} d_k(\theta)\,\hat{\beta}_{jk}, \qquad (7.106)$$
with
$$d_k(\theta) \propto H(\theta - \theta_k).$$
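Continuing the sketch, the two estimators (7.105) and (7.106) can be assembled as below, where `alpha` and `beta` hold the local regression estimates $\hat{\alpha}_{jk}$ and $\hat{\beta}_{jk}$ for a single option $j$ (again illustrative names, under the same assumptions as the previous block).

```python
def P_hat(theta, thetas, alpha, beta, var_P, c=1.0):
    """Estimator (7.105): weighted combination of the K local linear fits.
    alpha : K regression intercepts, shape (K,)
    beta  : K regression slope vectors, shape (K, p)
    """
    cw = weights_c(theta, thetas, var_P, c)                # weights (7.104)
    lin = alpha + np.sum(beta * (theta - thetas), axis=1)  # local linear values
    return np.dot(cw, lin)

def grad_P_hat(theta, thetas, beta, c=1.0):
    """Estimator (7.106): weighted average of the slope estimates, with
    d_k(theta) proportional to H(theta - theta_k) (no variance term)."""
    t = theta - thetas
    d = np.exp(-c * np.sum(t**2, axis=1))
    d /= d.sum()
    return d @ beta                                        # gradient, shape (p,)
```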

If we now substitute (7.105), (7.106) in (7.101), we obtain an estimator $\widehat{\operatorname{grad}}(\theta)$ of the gradient $\operatorname{grad}(\theta)$. A steepest descent algorithm would consist of updating the current estimate (on the $m$'th step) of the optimal parameter, $\theta^{(m)}$ say, in a direction opposite to the gradient vector:
$$\theta^{(m+1)} = \theta^{(m)} - \delta\,\frac{\widehat{\operatorname{grad}}(\theta^{(m)})}{\bigl\|\widehat{\operatorname{grad}}(\theta^{(m)})\bigr\|}$$

where $\delta$ is the step size and $\|\cdot\|$ denotes the length of the vector. We will adjust the step size from time to time so that it eventually converges at the asymptotically optimal rate of $1/n$. This is the case if, at each step, we retain the original step size $\delta^{(m)}$ provided that a new simulation at $\theta^{(m+1)}$ shows a decrease in the objective function (7.100); if it does not (this will happen at random when we are close enough to the minimum to overstep it), we reduce the step size according to
$$\delta^{(m+1)} = \delta^{(m)}\bigl(1 - \delta^{(m)}\bigr). \qquad (7.107)$$
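To see informally why (7.107) produces this rate, note that each reduction adds approximately one to the reciprocal of the step size,
$$\frac{1}{\delta^{(m+1)}} = \frac{1}{\delta^{(m)}\bigl(1 - \delta^{(m)}\bigr)} = \frac{1}{\delta^{(m)}} + 1 + \delta^{(m)} + \cdots \approx \frac{1}{\delta^{(m)}} + 1,$$
so if reductions occur on a fixed fraction of the steps once we are near the minimum, $\delta^{(m)}$ decays like a constant multiple of $1/m$.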

Then the suggested algorithm is as follows: we begin with a small number $K$ of simulations at arbitrary points $\theta_1, \ldots, \theta_K$.

1. Use simulations at $\theta_1, \ldots, \theta_K$ to estimate the option values $\hat{\alpha}_{jk} = \widehat{P_j}(\theta_k)$ and the gradients $\hat{\beta}_{jk} = \widehat{\frac{\partial}{\partial\theta}P_j}(\theta_k)$, as well as crude estimators of $\operatorname{var}\bigl(\widehat{P_j(\theta)}\bigr)$. Begin with a step size $\delta^{(0)}$.

2. Use these estimators to obtain the weights $c_{jk}(\theta)$ from (7.104) and $d_k(\theta)$, as well as the estimators (7.105), (7.106).

3. Estimate the direction of the gradient vector and take a step of length $\delta$ against it, arriving at
$$\theta^{(m)} - \delta\,\frac{\widehat{\operatorname{grad}}(\theta^{(m)})}{\bigl\|\widehat{\operatorname{grad}}(\theta^{(m)})\bigr\|}.$$

4. Set $\theta_{K+1}$ equal to this new point.

5. Conduct simulations at the new parameter value $\theta_{K+1}$.

6. With $K$ replaced by $K + 1$, repeat steps 1-4 until the objective function
$$\sum_{j=1}^{J} w_j\bigl(\widehat{P_j(\theta)} - P_j\bigr)^2$$
no longer changes significantly, or we have done a maximum number of iterations.

7. On termination, choose the value of $\theta^{(m)}$, $m = 1, 2, \ldots, K$, which minimizes $\sum_{j=1}^{J} w_j\bigl(\widehat{P_j}(\theta^{(m)}) - P_j\bigr)^2$.
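The loop below is a schematic rendering of steps 1-7, continuing the helper functions sketched above; `simulate_at` is an assumed user-supplied routine that runs fresh simulations at a parameter point and returns the local estimates for all $J$ options. It illustrates the shape of the algorithm, not a definitive implementation.

```python
def objective(theta, pts, A, B, V, P_obs, w, c=1.0):
    """Objective (7.100): sum_j w_j (P_j-hat(theta) - P_j)^2, via (7.105).
    pts: (K, p) points; A: (K, J) intercepts; B: (K, J, p) slopes; V: (K, J)."""
    Ph = np.array([P_hat(theta, pts, A[:, j], B[:, j, :], V[:, j], c)
                   for j in range(len(P_obs))])
    return np.sum(w * (Ph - P_obs) ** 2), Ph

def grad_objective(theta, pts, A, B, V, P_obs, w, c=1.0):
    """Gradient of (7.100), combining (7.105) and (7.106) as in (7.101)."""
    _, Ph = objective(theta, pts, A, B, V, P_obs, w, c)
    g = np.zeros_like(theta)
    for j in range(len(P_obs)):
        g += 2 * w[j] * (Ph[j] - P_obs[j]) * grad_P_hat(theta, pts, B[:, j, :], c)
    return g

def calibrate(theta0, P_obs, w, simulate_at, delta=0.5, max_iter=40, c=1.0):
    """Steps 1-7: accumulate simulations, take normalized steepest-descent
    steps, and shrink the step size by (7.107) after any failed step.
    simulate_at(theta) must return (alpha, beta, var) of shapes
    (J,), (J, p), (J,)."""
    theta = np.asarray(theta0, dtype=float)
    pts, A, B, V = [], [], [], []

    def record(th):                       # steps 1 and 5: simulate, refit
        a, b, v = simulate_at(th)
        pts.append(th); A.append(a); B.append(b); V.append(v)
        return np.stack(pts), np.stack(A), np.stack(B), np.stack(V)

    P, Aa, Bb, Vv = record(theta)
    f, _ = objective(theta, P, Aa, Bb, Vv, P_obs, w, c)
    history = [(f, theta)]
    for _ in range(max_iter):             # step 6: repeat with K -> K + 1
        g = grad_objective(theta, P, Aa, Bb, Vv, P_obs, w, c)    # steps 2-3
        theta = theta - delta * g / (np.linalg.norm(g) + 1e-12)  # step 4
        P, Aa, Bb, Vv = record(theta)                            # step 5
        f_new, _ = objective(theta, P, Aa, Bb, Vv, P_obs, w, c)
        if f_new >= f:                    # overstepped the minimum:
            delta *= 1.0 - delta          # shrink step size by (7.107)
        f = f_new
        history.append((f, theta))
    return min(history, key=lambda t: t[0])[1]  # step 7: best visited point
```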

It is well-known that for functions that are non-random, steepest descent is not a particularly efficient minimization routine, because it can bounce back and forth across a valley, and that methods like Newton's method, which are based on a quadratic approximation to the function, tend to be more efficient. For example, setting the gradient (7.101) equal to zero with the estimators (7.105), (7.106) replacing their true values results in the equation

$$\sum_{j=1}^{J} w_j\left(\sum_{k=1}^{K} c_{jk}(\theta^{(m+1)})\bigl[\hat{\alpha}_{jk} - P_j + \hat{\beta}_{jk}'(\theta^{(m+1)} - \theta_k)\bigr]\right)\left(\sum_{k=1}^{K} d_k(\theta^{(m+1)})\,\hat{\beta}_{jk}\right) = 0,$$


which we might wish to solve for the next parameter value $\theta^{(m+1)}$. It appears that the success of such an algorithm depends on how precise our gradient estimators $\hat{\beta}_{jk}$ are, and in general, since they are quite noisy, this algorithm relies too heavily on them for determining both direction and distance of travel.

The solution $\theta^{(m+1)} = \theta$ can be expressed in familiar least squares terms as
$$\theta = \left[\sum_{j=1}^{J} w_j\,\widehat{\frac{\partial}{\partial\theta}P_j}(\theta)\sum_{k=1}^{K} c_{jk}(\theta)\,\hat{\beta}_{jk}'\right]^{-1}\sum_{j=1}^{J} w_j\,\widehat{\frac{\partial}{\partial\theta}P_j}(\theta)\sum_{k=1}^{K} c_{jk}(\theta)\bigl[P_j - \hat{\alpha}_{jk} + \hat{\beta}_{jk}'\theta_k\bigr], \qquad (7.108)$$

except that the presence of $\theta$ on both sides of (7.108) means that we should solve this equation iteratively, substituting an old value of $\theta$ on the right-hand side. The solution (7.108) can also be regarded as the estimator in a weighted linear regression if we define the vector of responses $Y$ as the $JK$ values of $P_j - \hat{\alpha}_{jk} + \hat{\beta}_{jk}'\theta_k$ arranged as a column vector, the $JK \times JK$ weight matrix as $\Omega_{jl,jk} = w_j\,d_l(\theta)\,c_{jk}(\theta)$, and the matrix $X$ to be the $JK$ vectors $\hat{\beta}_{jk}'$ stacked to form a $JK \times p$ matrix. In this case (7.108) can be re-expressed in the more familiar form
$$\theta = (X'\Omega X)^{-1}(X'\Omega Y).$$
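In code, one pass of this weighted least squares solve might look as follows (continuing the illustrative arrays above, and taking $d_l(\theta)$ as in (7.106)); since $\theta$ appears on both sides of (7.108), the routine would be called repeatedly, feeding each output back in as the next input.

```python
def wls_update(theta, pts, A, B, V, P_obs, w, c=1.0):
    """One iteration of (7.108), theta = (X' Omega X)^{-1} (X' Omega Y),
    with rows of X and Y indexed by (j, k) in the order j*K + k."""
    K, p = pts.shape
    J = len(P_obs)
    X = B.transpose(1, 0, 2).reshape(J * K, p)          # JK x p slope matrix
    # responses Y_{jk} = P_j - alpha_{jk} + beta_{jk}' theta_k
    Y = (P_obs[None, :] - A + np.einsum('kjp,kp->kj', B, pts)).T.reshape(J * K)
    t = theta - pts                                     # kernel weights d_l
    d = np.exp(-c * np.sum(t**2, axis=1)); d /= d.sum()
    Omega = np.zeros((J * K, J * K))                    # Omega_{jl,jk}
    for j in range(J):
        cw = weights_c(theta, pts, V[:, j], c)
        Omega[j*K:(j+1)*K, j*K:(j+1)*K] = w[j] * np.outer(d, cw)
    XtO = X.T @ Omega
    return np.linalg.solve(XtO @ X, XtO @ Y)            # new theta, shape (p,)
```

Note that $X'\Omega X$ is only $p \times p$, so the solve itself is cheap; the expense remains in the simulations that produce $\hat{\alpha}_{jk}$ and $\hat{\beta}_{jk}$.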

7.11.1 Example: Calibrating the Normal Inverse Gaussian model

We return to the problem of calibrating the Normal Inverse Gaussian model to the option prices listed in Table 6.1 (the initial value of the S&P 500 was 116.84):

Exercise $K$            950    1005   1050   1100   1125   1150   1175   1200   1225   1250
Option price $P_O(K)$   173.1  124.8  88.5   53.7   39.3   27.3   17.85  10.95  6.2    3.25

Table 6.1. Price of SPX Call Options

We fit the Normal Inverse Gaussian distribution to historical prices in Section 3.4, where we obtained the parameter values
$$\hat{\alpha} = 95.23,\quad \hat{\beta} = -4.72,\quad \hat{\delta} = 0.016,\quad \hat{\mu} = 0.0009,$$
but of course these are just estimates of the parameters for the historical distribution and are not necessarily the appropriate parameters for a risk-neutral distribution. We ran the algorithm of the last section for 40 iterations, beginning with
$$\hat{\alpha} = 3.8010,\quad \hat{\beta} = 2.4105,\quad \hat{\delta} = 0.4591,\quad \hat{\mu} = 0.0009.$$
In Figure 7.19 we compare the resulting errors in the option pricing formulas for the NIG and the Black-Scholes models.

Notice that there is again no evidence from this graph that the NIG model with constant parameters fits the risk-neutral distribution better than does the Black-Scholes model with constant volatility parameter. We saw a similar comparison in Chapter 3 when we compared the two volatility smiles in Figure 7.1 and Figure 7.2.


[Figure 7.19: The error in pricing options on the S&P 500 using the NIG model and the Black-Scholes model. Horizontal axis: option price; vertical axis: empirical error in the formula for the option price.]

7.12 Problems

1. Assume that $X$ has a normal$(\theta, 1)$ distribution and $T(X) = X + bX^2 + cX^3$. Show that we can estimate $\frac{\partial}{\partial\theta}E_\theta T(X) = 1 + 2b\theta + 3c(1 + \theta^2)$ by randomly sampling independent values of $X_i$, $i = 1, 2, \ldots, n$, and using the estimator $\frac{1}{n}\sum_{i=1}^{n} T(X_i)(X_i - \theta)$. How would the variance of this compare with the variance of the alternative estimator $\frac{1}{n}\sum_{i=1}^{n} T'(X_i)$? How do they compare if $T$ is close to being a linear function, i.e. if $b$ and $c$ are small?

2. Using the Black-Scholes formula for the price of a call option, verify the following formulae for the Greeks. In each case use simulation to verify the formula in the case $T = 0.25$, $\sigma = 0.3$, $r = 0.05$, $K = S_0 = 10$ (a sketch of one such check for part (a) follows this problem).

   (a) (delta) $\dfrac{\partial V}{\partial S} = \Phi(d_1)$

   (b) (gamma) $\dfrac{\partial^2 V}{\partial S^2} = \dfrac{\phi(d_1)}{S\sigma\sqrt{T - t}}$

   (c) (rho) $\dfrac{\partial V}{\partial r} = K(T - t)e^{-r(T-t)}\Phi(d_2)$

   (d) (theta) $\dfrac{\partial V}{\partial t} = -\dfrac{S\sigma\,\phi(d_1)}{2\sqrt{T - t}} - rKe^{-r(T-t)}\Phi(d_2)$

   (e) (vega) $\dfrac{\partial V}{\partial \sigma} = S\phi(d_1)\sqrt{T - t}$
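For part (a), one possible simulation check (a minimal sketch; a central difference with common random numbers is just one of several ways to estimate the slope) is:

```python
import numpy as np
from scipy.stats import norm

T, sigma, r, K, S0 = 0.25, 0.3, 0.05, 10.0, 10.0
n, h = 10**6, 0.01
Z = np.random.default_rng(0).standard_normal(n)   # common random numbers

def disc_payoff(S):
    """Discounted Black-Scholes call payoff simulated from spot S."""
    ST = S * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0)

delta_sim = (disc_payoff(S0 + h) - disc_payoff(S0 - h)).mean() / (2 * h)
d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
print(delta_sim, norm.cdf(d1))                    # both approximately 0.563
```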

3. The rho of an option is the derivative of the option price with respect to the interest rate parameter $r$. What is the value of $\rho$ for a call option with $S_0 = 10$, strike $K = 10$, $r = 0.05$, $T = 0.25$ and $\sigma = 0.2$? Use a simulation to estimate this slope and determine the variance of your estimator. Try using (i) independent simulations at two points and (ii) common random numbers. What can you say about the variances of your estimators?


4. Consider the estimator given by (7.65) when $\theta = 1$. For what values of the importance sampling parameter $\psi$ is the variance of the estimator (7.65) finite?

5. Use the score function method (7.71) and a simulation to estimate the sensitivity of the value of a digital option to the parameters $S_0$, $r$, $\sigma$ in the Black-Scholes model. Specifically, suppose the discounted payoff from the option is
$$e^{-rT}\,1(S_T > K)$$
where $\ln(S_T)$ has a Normal$\bigl(\ln(S_0) + rT - \sigma^2 T/2,\ \sigma^2 T\bigr)$ distribution and $S_0 = K = 10$, $r = 0.05$, $T = 0.25$. Compare your estimated values with the true values given in Problem 2.

6. Use numerical quadrature corresponding to the fourth-degree Hermite polynomial to approximate the integral
$$\int_{-\infty}^{\infty} e^{x}\varphi(x)\,dx$$
where $\varphi(x)$ is the standard normal probability density function. Compare the numerical approximation with the true value of the integral.
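A minimal sketch of one way to carry this out, using NumPy's Gauss-Hermite rule: the substitution $x = \sqrt{2}\,u$ converts the standard normal density into the Hermite weight $e^{-u^2}$ (up to the factor $1/\sqrt{\pi}$), and the exact value of the integral is $E[e^X] = e^{1/2}$ for $X \sim N(0,1)$.

```python
import numpy as np

nodes, wts = np.polynomial.hermite.hermgauss(4)   # 4-point Gauss-Hermite rule
approx = np.sum(wts * np.exp(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)
print(approx, np.exp(0.5))                        # about 1.6480 vs 1.6487
```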

7. Use simulation to calibrate the volatility parameter of the Black-Scholes model to the S&P 500 option prices in Table 6.1 so that the sum of squares of the pricing errors is minimized (note that we wish to choose a volatility parameter that does not depend on the strike price). Check your answer against the results of a numerical minimization in which the options are priced using the Black-Scholes formula.
