$$\hat\mu_x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat\mu_y = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad (7.35)$$

$$\widehat{\mathrm{cov}}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu_x)(y_i - \hat\mu_y). \qquad (7.36)$$

For small sample sizes the estimator (7.36) can be adjusted for bias by replacing $\frac{1}{n}$ by $\frac{1}{n-1}$. The projection argument underlying the EM algorithm indicates

that the maximum likelihood estimating function for the incomplete data is obtained by conditioning on the observed information. Projecting the three equations above on the observed data corresponds to treating terms $x_i$, $y_i$ or $(x_i - \hat\mu_x)(y_i - \hat\mu_y)$ in which one or both of $(x_i, y_i)$ are unobserved as follows:

1. If $x_i$ is unobserved but $x_i + x_{i+1}$, $y_i$, $y_{i+1}$ are all observed, replace $x_i$ by
$$\hat x_i = \frac{x_i + x_{i+1}}{2} + \frac{\beta_{x|y}}{2}(y_i - y_{i+1})$$
and
$$\hat x_{i+1} = (x_i + x_{i+1}) - \hat x_i.$$

7.4. MAXIMUM LIKELIHOOD ESTIMATION 365

2. If $y_i$ is unobserved but $y_i + y_{i+1}$, $x_i$, $x_{i+1}$ are all observed, replace $y_i$ by
$$\hat y_i = \frac{y_i + y_{i+1}}{2} + \frac{\beta_{y|x}}{2}(x_i - x_{i+1})$$
and
$$\hat y_{i+1} = (y_i + y_{i+1}) - \hat y_i.$$

3. If all of $x_i$, $y_i$, $x_{i+1}$, $y_{i+1}$ are unobserved but $x_i + x_{i+1}$ and $y_i + y_{i+1}$ are observed, replace the two terms $(x_i - \hat\mu_x)(y_i - \hat\mu_y)$ and $(x_{i+1} - \hat\mu_x)(y_{i+1} - \hat\mu_y)$ in (7.36) by the single term
$$\frac{1}{2}(x_i + x_{i+1} - 2\hat\mu_x)(y_i + y_{i+1} - 2\hat\mu_y).$$
Since there is now one fewer term in (7.36), reduce the value of n appearing there by one.

Here the regression coefficients $\beta_{x|y} = \mathrm{cov}(x,y)/\mathrm{var}(y)$ and $\beta_{y|x} = \mathrm{cov}(x,y)/\mathrm{var}(x)$ can either be estimated in a similar fashion from the data at hand, or estimated from a larger data-set in which both $x_i$ and $y_i$ are observed, provided we think they are reasonably stable over time.
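The replacement rules 1 and 3 above are simple enough to sketch in code. The following is a minimal illustration: the pair sums, the neighbouring observed values, the regression coefficient, and all numbers are hypothetical inputs assumed for the example, not values from the text.

```python
def impute_pair_x(sum_x, y_i, y_next, beta_x_given_y):
    """Rule 1: x_i unobserved but x_i + x_{i+1}, y_i, y_{i+1} all observed.
    Returns the conditional-mean imputations (x_i, x_{i+1})."""
    x_i = sum_x / 2.0 + (beta_x_given_y / 2.0) * (y_i - y_next)
    return x_i, sum_x - x_i

def cov_term_pair(sum_x, sum_y, mu_x, mu_y):
    """Rule 3: both pairs unobserved but their sums observed; the single
    term that replaces the two covariance terms in (7.36)."""
    return 0.5 * (sum_x - 2.0 * mu_x) * (sum_y - 2.0 * mu_y)

# Made-up example: x_1 + x_2 = 5 observed, y_1 = 2, y_2 = 1, beta_{x|y} = 0.8
x1_hat, x2_hat = impute_pair_x(5.0, 2.0, 1.0, 0.8)
print(round(x1_hat, 6), round(x2_hat, 6))   # 2.9 2.1
```

Note that the two imputed values always add back to the observed sum, as rule 1 requires.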

There are, of course, patterns of missing data other than those considered in 1-3 above. They are rare in the case of two-dimensional data but much more likely for higher-dimensional correlated observations. The adjustments to the maximum likelihood estimators required to accommodate such patterns become increasingly complex, making it virtually impossible to provide a simple formula. The imputation method discussed earlier in this chapter, in which the missing observations are replaced not by their conditional means but by imputed values with the same characteristics as the original complete data, is the most successful method for providing estimates in such circumstances, since it has only two ingredients: the relatively simple formulae for the complete-data maximum likelihood estimators, and the ability to impute or generate the missing data with the correct distribution.

7.4.4 Monte Carlo Maximum Likelihood Estimation

Consider to begin with independent observations X1 , ..., Xn all from the same

probability density function fθ (x). The maximum likelihood estimator of the

parameter is the value θ which maximizes the likelihood

$$L(\theta) = \prod_{i=1}^{n} f_\theta(x_i).$$

The ideal Monte Carlo methodology for maximum likelihood estimation would

not require specification of the family of densities $f_\theta$ at all, but only the ability to generate variables $X_i$ corresponding to any value of θ. However a moment's reflection will convince us that in general this is a tall order. By repeated simulation of variables $X_i$ under the parameter value θ and averaging, Monte Carlo

methods permit us to estimate unbiasedly any expected value, such as Eθ [ψ(X)]

for an arbitrary function ψ. Unfortunately the probability density function at a particular point $x_i$ cannot be written as such an expected value, except in the case

of a discrete random variable X where, of course, Pθ (X = xi ) = Eθ [I(X = xi )].

If we have a method for estimating the density (or probability function if X is discrete) from simulations $X_i(\theta)$ taken for each parameter value θ, we might use these simulations to estimate $f_\theta(x)$, say with $\hat f_\theta(x)$, and then maximize the estimated likelihood
$$\hat L(\theta) = \prod_{i=1}^{n} \hat f_\theta(x_i). \qquad (7.37)$$

This estimator
$$\arg\max_\theta \prod_{i=1}^{n} \hat f_\theta(x_i)$$

avoids any explicit form for the likelihood function at the cost of dealing only with an approximation to the likelihood, and typically will require a very large simulation. Since the optimality properties of the maximum likelihood estimator depend on the local first and second derivatives of $f_\theta(x_i)$ in a neighbourhood of the true value, it is essential that such an approximation be very accurate in a neighbourhood of the estimator. One of the most common methods for approximating $f_\theta(x)$ is that of kernel density estimation, which relies on a kernel function K(x) satisfying
$$\int_{-\infty}^{\infty} K(x)\,dx = 1,$$

often a probability density function centered around 0. Let Xj (θ), j = 1, ..., m

denote simulated data under the parameter value θ, if possible using common

random numbers. When the distributions have easily inverted cumulative distribution functions $F_\theta$, for example, we could define $X_i(\theta) = F_\theta^{-1}(U_i)$ where the $U_i$ are independent uniform[0,1]. We must choose a window width parameter h

controlling the degree of smoothing, and then we estimate the density using

$$\hat f_\theta(x_i) = \frac{1}{mh}\sum_{j=1}^{m} K\!\left(\frac{x_i - X_j(\theta)}{h}\right)$$

and so (7.37) becomes

$$\hat L(\theta) = (mh)^{-n}\prod_{i=1}^{n}\left\{\sum_{j=1}^{m} K\!\left(\frac{x_i - X_j(\theta)}{h}\right)\right\}. \qquad (7.38)$$

Instead of operating completely free of the likelihood function, Monte Carlo maximum likelihood can be made more efficient if we have partial information on $f_\theta$. For example, it is reasonably common to know $f_\theta$ only up to a normalizing constant (which, in theory, we could obtain by laborious integration)


and wish to estimate θ by maximum likelihood without evaluating this scaling constant. To be more specific, suppose that
$$f_\theta(x) = \frac{h_\theta(x)}{c(\theta)} \qquad (7.39)$$

where the function hθ (x) is known and relatively easily evaluated but c(θ) is

unknown. There are many examples where probability density functions take

a form such as this, but perhaps the most common is in a problem involving

conditioning. For example suppose that we know the joint probability density

function of two random variables fθ (x, y) and we wish to condition on the value

of one, say Y. Then the conditional probability density function is

$$f_\theta(x|y) = \frac{f_\theta(x, y)}{\int f_\theta(z, y)\,dz}$$
which is of the form (7.39) with $c(\theta)$ replaced by $\int f_\theta(z, y)\,dz$.

There are several methods available for generating a random sample from

the density fθ without using the constant c(θ) including acceptance-rejection

and the Metropolis-Hastings algorithm. Notice that the likelihood for a sample

of size n will take the form
$$\ln L(\theta) = \sum_{i=1}^{n} \ln h_\theta(x_i) - n\ln c(\theta) \qquad (7.40)$$

and so, since $\frac{\partial}{\partial\theta}\ln c(\theta) = E_\theta[\frac{\partial}{\partial\theta}\ln h_\theta(X)]$, the constant serves simply to center the estimating function $\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\ln h_\theta(x_i)$ at its expected value. If we replace $f_\theta$ by a density with respect to some other value of the parameter, or some completely different density function g(x), then the maximizer of (7.40) satisfies

$$0 = \sum_{i=1}^{n}\frac{\partial}{\partial\theta}\ln f_\theta(x_i) \qquad (7.41)$$

$$= \sum_{i=1}^{n}\frac{\partial}{\partial\theta}\ln h_\theta(x_i) - nE_\theta\!\left[\frac{\partial}{\partial\theta}\ln h_\theta(X)\right], \text{ or} \qquad (7.42)$$

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\frac{\partial}{\partial\theta}h_\theta(x_i)}{h_\theta(x_i)} = \frac{\partial}{\partial\theta}\ln E\!\left[\frac{h_\theta(X)}{g(X)}\right] \qquad (7.43)$$

where the expected value in the second term is under the density g(x) for X.

Geyer (1996) suggests using this as the likelihood equation but with the expected value replaced by an average over Monte Carlo simulations. In other words, if

we use N simulations of Xj generated under the density g(x), we would solve

for θ the estimating equation

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\frac{\partial}{\partial\theta}h_\theta(x_i)}{h_\theta(x_i)} = \frac{\partial}{\partial\theta}\ln\!\left[\frac{1}{N}\sum_{j=1}^{N}\frac{h_\theta(X_j)}{g(X_j)}\right]. \qquad (7.44)$$


This gives rise to several additional questions. How is the asymptotic normal distribution of the maximum likelihood estimator affected by the introduction of the Monte Carlo estimator of
$$\frac{\partial}{\partial\theta}\ln E\!\left[\frac{h_\theta(X)}{g(X)}\right],$$

and what sort of importance distribution g(X) leads to the most efficient estimator of θ? For the first question, Geyer (1995) provides an estimator of the additional variance or covariance (over and above the usual covariance of a maximum likelihood estimator) introduced by the Monte Carlo estimation, and some discussion of the choice of importance distribution. The considerations around the choice of g(x) are much the same here as in any application of importance sampling; we must avoid values of g(x) which are too small and lead to heavy tails in the distribution of $\frac{\partial h_\theta(X)/\partial\theta}{g(X)}$. Mixtures are suggested, for example

$$g(x) = \sum_{j=1}^{k} p_j \frac{h_{\theta_j}(x)}{c(\theta_j)}$$

for various choices of the parameter values $\theta_1, \ldots, \theta_k$ and accompanying weights $p_1, \ldots, p_k$ with $p_1 + \cdots + p_k = 1$. Unfortunately this introduces the additional parameters $p_1, \ldots, p_k$, which also need estimation, so it seems best to be parsimonious in our choice of k.

The rather restrictive condition that
$$f_\theta(x) = \frac{h_\theta(x)}{c(\theta)}$$

for a known function $h_\theta(x)$ can be relaxed considerably. In fact its primary purpose is to lead to the estimating function
$$\psi(x_i, \theta) = \frac{\frac{\partial}{\partial\theta}h_\theta(x_i)}{h_\theta(x_i)}$$

which was centered at its expectation using Monte Carlo and then summed over all i to provide the estimating function for θ. The only purpose of the assumption (7.39) was to motivate a specific choice of function ψ. If we know virtually nothing about $f_\theta(x)$, but have the ability to generate random variables $X_i(\theta)$ for any given parameter value, we might well begin with a number of candidate estimating functions $\psi_1(x,\theta), \ldots, \psi_k(x,\theta)$, center them with Monte Carlo as before, and then determine the weighted average which results in the minimum possible asymptotic variance. When we do this, of course, we are not using the exact likelihood function (presumably this is unavailable), but since we are implicitly estimating the score function with a function of the form
$$\sum_i \beta_i(\theta)[\psi_i(x, \theta) - \mu_i(\theta)],$$

i


we are essentially approximating the likelihood with an exponential family distribution
$$c(\theta)\exp\left\{\sum_i B_i(\theta)\Psi_i(x, \theta)\right\}$$
with the functions $\Psi_i$ either given by $\psi_i$ or by $\frac{\partial}{\partial\theta}\psi_i$. Suppose we approximate

the expectation

$$\mu_i(\theta) = E_\theta[\psi_i(X(\theta), \theta)]$$
using m simulations with
$$\hat\mu_i(\theta) = \frac{1}{m}\sum_{j=1}^{m}\psi_i(X_j(\theta), \theta).$$

These same simulations allow us to estimate the covariance matrix $C_{ij}(\theta) = \mathrm{cov}_\theta(\psi_i(X,\theta), \psi_j(X,\theta))$ as well as a difference of the form
$$\hat\delta_i(\theta) = \frac{1}{2mh}\sum_{j=1}^{m}[\psi_i(X_j(\theta+h), \theta) - \psi_i(X_j(\theta-h), \theta)]$$
where h is small (here it is important that we use common random numbers). Then the optimal

estimating function constructed as a linear combination of the functions ψi is

$$\sum_j \hat\beta_j(\theta)\sum_i [\psi_j(x_i, \theta) - \hat\mu_j(\theta)]$$
where the vector of values of $\beta_j$ is estimated by
$$\hat\beta^T(\theta) = \hat\delta^T(\theta)C^{-1}(\theta). \qquad (7.45)$$
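The recipe (7.45) is easy to carry out numerically. The sketch below assumes a normal location family and two truncation-type candidate functions, choices made purely for illustration; the simulated values $\psi_i(X_j(\theta), \theta)$ supply both the covariance matrix C and, via common random numbers, the finite difference $\hat\delta$.

```python
import numpy as np

rng = np.random.default_rng(3)

def phi(x, k):
    # truncation-type candidate function phi_k(x) = max(min(x, k/2), -k/2)
    return np.clip(x, -k / 2.0, k / 2.0)

theta, h, m = 0.0, 0.01, 50000
Z = rng.standard_normal(m)        # common random numbers: X_j(t) = t + Z_j
ks = [1.0, 2.0]

# per-observation values psi_i(X_j(theta), theta) = phi_k(X_j(theta) - theta)
psis = np.array([phi(Z, k) for k in ks])                       # shape (2, m)

# finite-difference delta_i(theta) using the SAME Z at theta +/- h
deltas = np.array([((phi(Z + h, k) - phi(Z - h, k)) / (2 * h)).mean()
                   for k in ks])

C = np.cov(psis)                   # covariance matrix C_ij(theta)
beta = deltas @ np.linalg.inv(C)   # beta^T = delta^T C^{-1}, as in (7.45)
print(beta)
```

For this normal family the finite differences converge to $P(|Z| < k/2)$, roughly 0.38 and 0.68 for k = 1, 2; the common random numbers are what make the small-h difference quotient usable at all.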

In order to assess the feasibility of this method, we consider a simple example: the symmetric stable laws, whose density functions are quite difficult to evaluate but for which it is easy to generate random variables. Suppose $X_1, \ldots, X_n$ are assumed independent and identically distributed with symmetric stable distributions, with location parameter µ = θ and with the parameters c = 1 and α assumed known. Consider M-estimating functions of the form
$$\psi_k(x, \theta) = \sum_i \phi_k(x_i - \theta), \quad k = 1, 2, \ldots, 4$$
where the candidate functions $\phi_k$ are
$$\phi_k(x) = \max\left(\min\left(x, \frac{k}{2}\right), -\frac{k}{2}\right), \quad k = 1, \ldots, 4.$$

Although we know that these functions satisfy
$$E_\theta[\phi_k(x_i - \theta)] = 0$$
for members of the symmetric stable family, and therefore the functions $\psi_k(x, \theta)$ are unbiased, we do not wish to use that knowledge, so we will continue to use Monte Carlo to generate the estimators $\hat\mu_k(\theta)$. What combination of these four estimating functions provides the best estimator of θ? If we fix the value c = 1, we have seen that we can generate X(θ) using

$$X(\theta) = \theta + \sin(\alpha U)(\cos U)^{-1/\alpha}\left[\frac{\cos(U(1-\alpha))}{E}\right]^{\frac{1}{\alpha}-1} \qquad (7.46)$$
with U uniform on $[-\pi/2, \pi/2]$ and E standard exponential, independent of U. A single pair (U, E) permits generating X(θ) for any θ. In this case, estimation of

$$\delta_k(\theta) = \frac{\partial}{\partial\theta}\psi_k(x, \theta) = -\sum_i \phi_k'(x_i - \theta)$$
is a little easier since
$$\sum_i \phi_k'(x_i - \theta) = \sum_i I(|x_i - \theta| < k/2).$$
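Formula (7.46) is straightforward to vectorize, and the same pair (U, E) serves for every θ. A minimal sketch (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)

def sym_stable(theta, alpha, U, E):
    """Symmetric stable(theta, c=1, alpha) variates via (7.46); the same
    (U, E) pair can be reused for every theta (common random numbers)."""
    return (theta + np.sin(alpha * U) * np.cos(U) ** (-1.0 / alpha)
            * (np.cos(U * (1.0 - alpha)) / E) ** (1.0 / alpha - 1.0))

m = 100000
U = rng.uniform(-np.pi / 2, np.pi / 2, size=m)
E = rng.exponential(size=m)

X = sym_stable(0.0, 1.5, U, E)
print(np.median(X))        # close to the location parameter 0
```

By symmetry the sample median estimates θ, so a quick sanity check is that shifting θ shifts the median by exactly the same amount when (U, E) are held fixed.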

We tested this routine on the data below, with n = 21, generated from

a symmetric stable distribution with parameters (0, 1, 1.5).

-3.1890 -2.0491 -1.8185 -1.7309 -1.5403 -1.3090 -1.2752

-0.8266 -0.7154 -0.2706 -0.2646 -0.0478 0.2827 0.9552

1.0775 1.1009 1.6693 1.6752 2.4415 2.4703 5.5746

Table 7.5: Data from the symmetric stable(0, 1, 1.5) distribution

The mean of these values is 0.105 and the median is −0.2646. By running 1000 simulations with θ updated 100 times using a Robbins-Monro procedure, beginning with θ estimated by the sample median, we obtained the estimate
$$\hat\theta = -0.0954$$
with coefficients on the four estimating functions given by
$$\beta = (0.3561,\ 0.6554,\ 0.8705,\ 1.0096).$$

Evidently the last two functions $\psi_3(x,\theta)$, $\psi_4(x,\theta)$ receive more weight than the first two. The resulting estimator is, of course, not really the maximum likelihood estimator, but it approximates it to the extent that the functions $\phi_k$ and $\int\phi_k$ can be used to approximate the density function. Convergence of the Robbins-Monro algorithm here, as is frequently the case, is quite slow. To some extent this is a necessary price for our assumed ignorance about the form of the distribution and the expected values of the estimating functions: we repeatedly approximate the latter using Monte Carlo, and the noise so introduced results in the slow convergence. Here, as elsewhere, there is a price to pay for ignorance.


There are certainly more efficient estimators of the median of a symmetric stable distribution, but what is notable about this method is that it provides a "black-box" estimation of parameters requiring no knowledge of the distribution beyond the ability to simulate values with a particular parameter value and a set of vaguely reasonable estimating functions. Arguably, a fairly broad set of candidate estimating functions together with massive computer power may ultimately generate Monte Carlo estimators that rival maximum likelihood estimation in efficiency without any knowledge of the structure of the distribution beyond the ability to generate samples from it.

7.5 Using Historical Data to Estimate the Parameters in Diffusion Models

7.5.1 General Ito Processes

Typically a diffusion model for financial data includes parameters with unknown values which require estimation. We might, for example, wish to fit a diffusion model like
$$dr(t) = \mu(t, r(t))dt + \sigma(t, r(t))dW_t$$
to historical data on spot interest rates. Here as elsewhere we assume that the drift and diffusion coefficients are such that a solution exists. There are many choices of the drift and diffusion coefficients $\mu(t, r(t))$, $\sigma(t, r(t))$ leading to common models such as the Vasicek (1977), CIR (Cox, Ingersoll and Ross, 1985), and Ho-Lee (Ho and Lee, 1986) models, but they all have unspecified parameters θ that must be either estimated or fit to price data before they can be used. Suppose for the present that our intended use is not the pricing of bonds or interest rate derivatives, but using the model to simulate interest rate scenarios for an insurance company. In this case it is the P-measure that is of interest, and so historical values of r(t) are directly relevant. Unknown parameters θ may lie in either the drift term $\mu(t, r(t))$ or in the diffusion $\sigma(t, r(t))$, so we will add the argument θ to either or both functions as required. There is a great deal of literature dealing with estimation of parameters in a diffusion model. We will largely follow McLeish and Kolkiewicz (1997).

According to the simplest discrete-time approximation to the process, the Euler scheme, the increments of the process over small intervals of time are approximately conditionally independent and normally distributed. Suppose we have already simulated the spot interest rate at integer multiples of ∆t up to time t = j∆t. Provided the time increment ∆t is small, if we denote ∆r = r(t + ∆t) − r(t), we can generate the interest rate at time t + ∆t using the approximation
$$\Delta r - \mu(t, r(t), \theta)\Delta t \sim N(0, \sigma^2(t, r(t), \theta)\Delta t). \qquad (7.47)$$
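For instance, a path of a Vasicek-type model $dr = a(b - r)dt + \sigma dW$ can be simulated with the Euler scheme exactly as (7.47) prescribes; the parameter values below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustrative Vasicek drift mu = a(b - r) and constant diffusion sigma
a, b, sigma = 0.5, 0.05, 0.01
dt, n_steps = 1.0 / 252.0, 252 * 5

r = np.empty(n_steps + 1)
r[0] = 0.04
for j in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal()
    # Euler increment ~ N(mu*dt, sigma^2*dt), as in (7.47)
    r[j + 1] = r[j] + a * (b - r[j]) * dt + sigma * dW

print(r[-1])
```

The same loop works for any choice of µ(t, r, θ) and σ(t, r, θ); only the two lines inside the loop change.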

Thus, if θ is a parameter in the drift term only, it can be estimated using weighted least squares, i.e. by minimizing the sum of the squared standardized normal variates
$$\min_\theta \sum_t w_t(\Delta r - \mu(t, r(t), \theta)\Delta t)^2$$
where the weights $w_t$ are proportional to the reciprocals of the variances, $w_t = 1/\sigma^2(t, r(t))$. The solution is obtained by setting the derivative with respect to θ equal to zero and solving for θ:

$$\sum_t w_t\,\frac{\partial\mu(t, r(t), \theta)}{\partial\theta}(\Delta r - \mu(t, r(t), \theta)\Delta t) = 0, \qquad (7.48)$$

and it is not hard to see that this is the maximum likelihood estimator of θ, assuming the normal approximation in (7.47) is exact. If the unknown parameter θ is in the diffusion term only, or in both the drift and diffusion terms, it is also easy to see that the log-likelihood for the normal approximation is, up to an additive constant,
$$-\sum_t \ln(\sigma(t, r(t), \theta)) - \frac{1}{2}\sum_t \frac{(\Delta r - \mu(t, r(t), \theta)\Delta t)^2}{\sigma^2(t, r(t), \theta)\Delta t}$$
where the sums extend over the n time points t at which we observed the spot interest rate.

Maximizing this over the parameter θ results in the estimating function

$$\sum_t\left\{\frac{\partial\mu(t, r(t), \theta)}{\partial\theta}\,\frac{\Delta r - \mu(t, r(t), \theta)\Delta t}{\sigma^2(t, r(t), \theta)} + \frac{\partial\ln[\sigma^2(t, r(t), \theta)]}{\partial\theta}\left\{\frac{(\Delta r - \mu(t, r(t), \theta)\Delta t)^2}{2\sigma^2(t, r(t), \theta)\Delta t} - \frac{1}{2}\right\}\right\} = 0,
$$

which combines the estimating function for the drift (7.48) with that for the diffusion term, using weights inversely proportional to their variances.
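To illustrate, the following sketch simulates a Vasicek-type path and then solves (7.48) for a single drift parameter, the mean level, in which the equation is linear; all parameter values are illustrative assumptions, not estimates from real data.

```python
import numpy as np

rng = np.random.default_rng(13)

# Simulate dr = a(theta - r)dt + sigma dW with a, sigma known;
# the unknown drift parameter is the mean level theta.
a, sigma, theta_true = 2.0, 0.01, 0.05
dt, n = 1.0 / 252.0, 252 * 10
r = np.empty(n + 1)
r[0] = 0.03
for j in range(n):
    r[j + 1] = r[j] + a * (theta_true - r[j]) * dt \
               + sigma * np.sqrt(dt) * rng.standard_normal()

dr = np.diff(r)
r0 = r[:-1]

# Estimating equation (7.48): sum_t w_t (dmu/dtheta)(dr - mu dt) = 0 with
# mu = a(theta - r), dmu/dtheta = a, and constant weights w_t = 1/sigma^2.
# Linear in theta, so it has a closed-form solution:
theta_hat = (dr + a * r0 * dt).sum() / (a * dt * len(dr))
print(theta_hat)   # should be near theta_true = 0.05
```

With σ constant the weights cancel out of (7.48); for a state-dependent diffusion the same sum would simply carry the weights $w_t = 1/\sigma^2(t, r(t))$.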

Girsanov's Theorem allows us to construct maximum likelihood estimators also for continuously observed processes, analogous to some of the estimators above. For example, suppose the parameter θ resides in the drift term only, so the model is of the form
$$dX_t = \mu(t, X_t, \theta)dt + \sigma(t, X_t)dW_t, \quad X_0 = x_0.$$
Suppose P is the measure on C[0, T] induced by $X_t$ with $X_0 = x_0$, and the measure $P_0$ on the same space is induced by a similar equation but with zero drift:
$$dX_t = \sigma(t, X_t)dW_t, \quad X_0 = x_0. \qquad (7.49)$$

Then Girsanov's Theorem (see the appendix) asserts that, with
$$M_t = \mathcal{E}\!\left(\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma(s, X_s)}\,dW_s\right) = \exp\left\{\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma^2(s, X_s)}\,dX_s - \frac{1}{2}\int_0^t \frac{\mu^2(s, X_s, \theta)}{\sigma^2(s, X_s)}\,ds\right\}, \qquad (7.50)$$

under some boundedness conditions the Radon-Nikodym derivative of P with respect to $P_0$ is given by $M_T$,
$$\frac{dP}{dP_0} = M_T.$$

7.5. USING HISTORICAL DATA TO ESTIMATE THE PARAMETERS IN DIFFUSION MODELS.373

The exponential martingale

$$M_t = \mathcal{E}\!\left(\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma(s, X_s)}\,dW_s\right)$$

describes the Radon-Nikodym derivative for the processes restricted to [0, t]

and the process Xs appearing in (7.50) is assumed generated from the relation

(7.49). Thus the maximum likelihood estimate of θ is obtained by maximizing

MT . Setting the derivative of its logarithm equal to 0 results in the likelihood

equation

$$\int \frac{\partial\mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,dX_t - \int \frac{\partial\mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,\mu(t, X_t, \theta)\,dt = 0$$
or
$$\int \frac{\partial\mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,(dX_t - \mu(t, X_t, \theta)\,dt) = 0.$$

Of course if we only had available observations taken at discrete time points $t_1 < t_2 < \cdots$ rather than the continuously observed process, we would need to replace the integral by a sum, resulting in the same estimating function as (7.48), namely

$$\sum_{t_i} \sigma^{-2}(t_i, X_{t_i})\,\frac{\partial\mu(t_i, X_{t_i}, \theta)}{\partial\theta}\,(\Delta X_{t_i} - \mu(t_i, X_{t_i}, \theta)\Delta t_i) = 0. \qquad (7.51)$$

For a continuous time model of the form
$$dX_t = \mu(t, X_t, \theta)dt + \sigma(t, X_t, \theta)dW_t$$
in which unknown parameters θ appear in both the drift and the diffusion part of the stochastic differential, estimation of the parameter θ is very different than in the discrete time case. Consider, for example, solving the estimating equation
$$\psi(\theta) = \sum_i\{(\Delta X_{t_i} - \mu(t_i, X_{t_i}, \theta)\Delta t_i)^2 - \sigma^2(t_i, X_{t_i}, \theta)\Delta t_i\} = 0 \qquad (7.52)$$

to obtain an estimator of θ. Since for sufficiently small time increments $\Delta t_i$ the Euler increments are approximately normal, this estimating function is asymptotically unbiased as $\max\{\Delta t_i\} \to 0$. Moreover, for any normal random variable Z with mean 0, $\mathrm{var}(Z^2) = 2(\mathrm{var}(Z))^2$, so the variance of the estimating function above satisfies, in the limit,
$$\mathrm{var}(\psi(\theta)) \to 2\sum_i \sigma^4(t_i, X_{t_i}, \theta)(\Delta t_i)^2 \le 2\max_i\{\Delta t_i\}\sum_i \sigma^4(t_i, X_{t_i}, \theta)\Delta t_i \to 0,$$


the convergence to zero holding since $\sum_i \sigma^4(t_i, X_{t_i}, \theta)\Delta t_i$ converges to the integral
$$\int_0^T \sigma^4(t, X_t, \theta)\,dt,$$
which is finite provided, for example, that the function σ(t, x, θ) is bounded.
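The practical content of this limit is that, over a fixed observation window, the sum of squared increments pins down the diffusion coefficient ever more precisely as the sampling interval shrinks. A small sketch for a Brownian motion with constant σ (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(17)

# Fixed window T = 1: the realized variance sum (dX)^2 estimates
# sigma^2 * T, ever more precisely as dt -> 0.
sigma, T = 0.3, 1.0

def realized_variance(n_steps):
    dX = sigma * np.sqrt(T / n_steps) * rng.standard_normal(n_steps)
    return np.sum(dX**2)

for n_steps in (10, 1000, 100000):
    # estimates of sigma^2 * T = 0.09, improving as n_steps grows
    print(n_steps, realized_variance(n_steps))
```

The standard deviation of the estimate is $\sigma^2 T\sqrt{2/n}$, which is the discrete counterpart of the variance bound displayed above.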

In general the asymptotic variance of the estimator obtained by solving an

equation of the form (7.52) is, under fairly modest regularity assumptions on

the estimating function ψ, asymptotic to

$$\frac{\mathrm{var}(\psi(\theta))}{E^2\!\left[\frac{\partial}{\partial\theta}\psi(\theta)\right]}. \qquad (7.53)$$
It is easy to see that $E[\frac{\partial}{\partial\theta}\psi(\theta)]$ is asymptotic to

$$-2\int \frac{\partial\sigma(t, X_t, \theta)}{\partial\theta}\,\sigma(t, X_t, \theta)\,dt,$$

which is typically non-zero if σ depends on θ. Then (7.53) approaches 0 since the denominator is asymptotically non-zero. We conclude that the estimator of θ determined by the estimating function (7.52) approaches perfection (i.e. its variance approaches zero) as $\Delta t \to 0$, and this is true regardless of how long or how short the time interval is over which the process has been observed. Restated, this says that for a continuously observed process, a parameter in the diffusion term (including the volatility parameter) can be perfectly estimated from an arbitrarily short period of observation. Now if you think this is too good to be true, you are right, at least in a practical sense. In practice, volatility is by no means known or estimated precisely, because the diffusion model does not fit on the minute-by-minute time scale necessary for the above argument to go through. Far from our having nearly perfect knowledge of the volatility parameters, volatility appears to change rapidly on any time scale for which the diffusion model is considered a good fit. In the next section we discuss various estimators of volatility obtained from a discretely observed (geometric) Brownian motion.

7.6 Estimating Volatility

In the last section, we saw that parameters in the diffusion coefficient of a general diffusion were all too easily estimated. In fact from an arbitrarily short segment of the path we can, at least in theory, obtain an estimator of the volatility which is exact (unbiased with zero variance) because of the infinite variation of a diffusion process over these arbitrarily short periods of time. Information for estimating the diffusion coefficient accumulates much more rapidly than for the drift, and in this respect continuous time processes are quite different from their discrete analogues. Two diffusion processes with different diffusion coefficients are mutually singular. Intuitively this means that the sample paths generated by these two measures lie in two disjoint subsets of C[0, T],


so that we can theoretically determine from a single sample path the exact diffusion term. Of course to determine which set a path lies in, we need to estimate or examine the quadratic variation of the process, something which requires a perfectly graphed sample path and an infinitely powerful microscope. The real world is considerably different from this idealized situation, for several reasons. First, we never observe a process in continuous time but only at discrete time points, however close together. Second, diffusion processes fit data on security prices, interest rates and exchange rates only when viewed over longer time intervals than a minute, hour, or day. Short-term behaviour is very different; for example, prices usually evolve through a series of jumps corresponding to trades of varying magnitudes and frequencies. Third, information obtained from the derivatives market can be used to obtain implied volatilities, but these are not the same as the P-measure or historical volatilities in a discrete-time model. In theory they should agree in the continuous time Black-Scholes model, since the risk-neutral measure has the same diffusion coefficient as does the P measure, but this is of limited use since volatilities can change rapidly.

The volatility parameter is the single most important parameter for pricing short-term options and arguably the most important parameter for virtually all financial time series. In Figure 7.4 we plot the price of a call option on a stock currently trading at $10. The strike price of the option is $12 and the time to maturity is 1/2 year. Notice that the option price ranges from close to 0 when σ = 0.1 to around $1.40 when σ = 0.7, indicating that different estimates of volatility can have a substantial impact on the option price. In fact over a large part of the range of this graph the relationship between option price and volatility is nearly linear.

Complicating the estimation of σ is its time-varying behaviour, with periods of persistently high volatility followed by periods of relatively lower volatility. Failure to accommodate the behaviour of the volatility process may result in highly misleading models, substantial pricing errors, and the catastrophic consequences that can accompany them. We have already distinguished between estimation of volatility based on historical data and calibration of the volatility parameter to option price data. The latter is required if we are pricing derivatives, since it chooses a value of the volatility parameter that is most consistent with the observed prices of options and futures. The former is necessary if we are interested in volatility for the purposes of prediction, risk management, or scenario generation. The implied volatility is obtained from the price of one or more heavily-traded or benchmark derivatives sold on the open market having the same price process $S_t$ for the underlying asset. We determine the volatility parameter which produces the market price of a given option. For example, suppose an option with strike price K, maturity T, and initial stock value $S_0$ is traded on the market at a price V. Then, because the Black-Scholes price is increasing in σ, we may solve the equation
$$BS(S_0, K, r, T, \sigma) = V \qquad (7.54)$$

for the implied volatility parameter σ. Because $BS(S_0, K, r, T, \sigma)$ is monotone in σ, a solution exists (as long as V is within a reasonable range) and is unique.

Figure 7.4: Price of a call option as a function of volatility, $S_0 = 10$, K = 12, r = 0.05, T = 0.5 years.

This estimate of volatility differs from the historical volatility obtained by computing the sample variance of the returns $\log(S_{t+1}/S_t)$. Since it agrees with the market price of a specific option, it reflects more closely the risk-neutral distribution Q and is therefore used for pricing other derivatives. There are several disadvantages to this method of calibrating the parameter. The calibrated value of σ depends on the stock, the strike price of the option, the current time t, the time to maturity T − t, and to some extent on other parameters that are harder to measure, such as the liquidity of the option and the degree to which it is used as a benchmark. Nevertheless it is common to transform option prices V to volatilities σ using (7.54) and then, rather than deal with option prices, analyze these implied volatilities. This transformation may be justified if the resulting model is simpler or smoother as a function of σ. However there are many monotonic transformations of price that could do a better job of simplifying a non-Gaussian model.

In this section we will concentrate on the estimation of volatility based on historical data, as opposed to calibration. An estimator based on historical data approximates a parameter of the real-world measure P, whereas the calibrated parameter is a property of the risk-neutral measure Q. To explore the differences between implied and historical volatility, we downloaded two years of daily stock price data (Nov 1, 1998 to Nov 1, 2000) for Nortel (NT), listed on the New York Stock Exchange, from the web-site http://finance.yahoo.com

and on the basis of this wish to estimate the volatility.

Figure 7.5: Nortel Returns, Nov 1, 1998 to Nov 1, 2000.

We used the "adjusted close" since these prices have been adjusted for stock splits and dividends. The daily

returns over this period are graphed in Figure 7.5, and the significant increase in volatility is evident towards the end of the sequence. Since the logarithm of daily stock prices is assumed to follow a Brownian motion, we may estimate the daily volatility using the sample variance of the first differences of these logarithms. To obtain the variance over a year, multiply by the number of trading days (around 253) per year. Thus the annual volatility is estimated by sqrt(253*var(diff(log(ntprice)))), which gives a volatility of around 0.635. How does this historical estimate of volatility compare with the volatility as determined by option prices?
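The same calculation can be done in any language; here is a Python version on a made-up price series (the actual Nortel series is not reproduced here):

```python
import numpy as np

# Annualized historical volatility from adjusted closing prices
# (illustrative numbers, not the Nortel data).
prices = np.array([44.0, 45.1, 43.8, 46.0, 47.2, 45.5, 48.1, 47.0])
log_returns = np.diff(np.log(prices))
annual_vol = np.sqrt(253 * np.var(log_returns, ddof=1))   # 253 trading days
print(annual_vol)
```

The `ddof=1` gives the unbiased sample variance, matching MATLAB's `var`.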

Table 7.5 is a segment of a table of option prices obtained from the Chicago Board Options Exchange and provides the current prices (on November 2, 2000) of calls and puts on NT. There was a January $40 call selling for $8 5/8 and a $40 January put for $4 1/8. The price of the stock when these option prices were recorded was $44 (evidently the "good old days"!).

Calls                 Last Sale   Net    Bid     Ask    Volume  Open Interest
01 Jan 40 (NT AH-E)   8 5/8      -7/8    8 3/8   8 7/8      79          10198
01 Jan 45 (NT-AI-E)   6 1/8      -1/4    5 7/8   6 1/4     460           4093

Puts                  Last Sale   Net    Bid     Ask
01 Jan 40 (NT MH-E)   4 1/8      +1/4    3 7/8   4 1/4
01 Jan 45 (NT-MI-E)   6 3/8      +1/4    6 1/8   6 5/8

Table 7.5: Option Prices for Nortel, Nov 2, 2000.

Suppose the current interest rate (in US dollars, since these are US dollar prices) is 5.8%. This is roughly the interest rate on a short-term risk-free deposit such as a treasury bill. Then the implied volatility is determined by finding the value of the parameter σ (it turns out to be around 0.79) so that the Black-Scholes formula gives exactly this value for the option price, i.e. finding a value of σ so that PUT = 4.125 and CALL = 8.625, for example:

[call,put]=blsprice(44,40,.058,55/253,.79,0) (returns call = 8.6077, put = 4.1066)

The implied volatility of a roughly at-the-money option is about σ = 0.79, and this roughly agrees with the prices of the January 45 options:

[call,put]=blsprice(44,45,.058,55/253,.79,0) (returns call = 6.2453, put = 6.6815)
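Readers without access to blsprice can invert the Black-Scholes formula by bisection, since the call price is increasing in σ. A self-contained sketch (no dividends assumed):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S0, K, r, T, sigma):
    # Black-Scholes price of a European call, no dividends
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S0, K, r, T, lo=1e-4, hi=3.0, tol=1e-8):
    # Bisection works because bs_call is increasing in sigma.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S0, K, r, T, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The January $40 call on NT: S0 = 44, K = 40, r = 0.058, T = 55/253, price 8.625
print(implied_vol(8.625, 44.0, 40.0, 0.058, 55.0 / 253.0))   # roughly 0.79
```

The result is approximately 0.79, matching the value quoted in the text.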

Differences between the observed and theoretical option prices of the magnitude observed here are inevitable. Transaction costs and the fact that some options are more liquid than others can affect option prices. If we ignore this problem, we can find a pair of values (r, σ) by minimizing the sum of squared pricing errors between the observed and Black-Scholes prices of the put and call options, but because short-term option prices are quite insensitive to r, the estimates of r determined in this way can be unreliable.

We now have two estimates of volatility: historical volatility (0.635) and implied volatility (0.79). Which volatility is "correct"? The answer is both. The market conditions for this company changed enough over this period and subsequently to effect enormous changes in volatility, and if we used a period of less than 2 years, our estimate of historical volatility would be larger (the one-year estimated volatility is 0.78, very close to the implied volatility). Moreover the implied volatility is a property of the Q measure, and this is determined not only by the recent history of the stock, but also by investor risk preferences and fears. The Q measure is a distribution for stock prices assigned by the market for options. The P-measure was assigned by investors' past tastes for the stock, sometimes in a substantially different economic climate. It is often the case that "implied volatilities" are greater than historical ones, although in theory, for a geometric Brownian motion process which allows frictionless continuous-time hedging, the risk-neutral Q and the actual distribution P should have the same volatilities.

7.6.1 Using Highs and Lows to estimate Volatility

There are many competing estimators of the volatility parameter of a geometric Brownian motion, and to date we have used only one, the sample variance of stock returns. To set the stage for these estimators, suppose S(t) is a geometric Brownian motion

d ln(S(t)) = µ dt + σ dWt,   S(0) = S0,

and we observe certain properties of non-overlapping segments of this process, for example days or weeks. In particular suppose t1 < t2 < · · · < tn are the beginnings of certain observation periods of length ∆i, where ti + ∆i ≤ ti+1, and so we can define, for this period, the open, the close, the high and the low respectively:

Oi = S(ti),   Ci = S(ti + ∆i),

Hi = max{S(t); ti ≤ t ≤ ti + ∆i},

Li = min{S(t); ti ≤ t ≤ ti + ∆i}.

7.6. ESTIMATING VOLATILITY 379

The observation periods may or may not be assumed equal in length (for example daily, in which case ∆i is around 1/252 for all i). We will sometimes assume that the market does not move outside of these observation periods, in which case Oi = Ci−1, since in practice the open of many stocks at the beginning of the day is nearly identical to the close at the end of the previous day.

The simplest estimator of σ is that based on the variance of returns, var(Ri), where Ri = ln(Ci/Oi). Since Ri is Normal(µ∆i, σ²∆i), we can construct a regression problem towards estimating the parameters µ, σ:

Ri/√∆i = µ√∆i + εi,   where the εi are i.i.d. N(0, σ²).

By regarding the left side as the response in an ordinary least squares regression, we obtain the standard estimators

µ̂ = Σi Ri / Σi ∆i,   (7.55)

σ̂²_OC = (1/(n−1)) Σi (Ri − µ̂∆i)²/∆i,   (7.56)

both unbiased estimators of their respective parameters. Regression theory also provides the variance of these estimators:

var(µ̂) = σ² / Σi ∆i,

var(σ̂²_OC) = 2σ⁴/(n−1).
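Formulas (7.55) and (7.56) translate directly into code. The following Python sketch (the function name is ours) computes both from open-to-close returns; with equal period lengths ∆i = ∆, (7.56) reduces to the sample variance of the returns divided by ∆.

```python
def mu_sigma2_oc(returns, deltas):
    """Estimators (7.55) and (7.56) from returns R_i = ln(C_i/O_i)
    observed over periods of length Delta_i."""
    n = len(returns)
    mu_hat = sum(returns) / sum(deltas)
    s2 = sum((r - mu_hat * d) ** 2 / d for r, d in zip(returns, deltas)) / (n - 1)
    return mu_hat, s2
```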

There are more efficient estimators than σ̂²_OC if we include not only an open and a close but also the high and the low for those periods. In Chapter 5, we discovered that for a geometric Brownian motion, the random variables ZHi = ln(Hi/Oi) ln(Hi/Ci) and ZLi = ln(Li/Oi) ln(Li/Ci) are both exponentially distributed with expected value σ²∆i/2, and each of ZLi and ZHi is independent of the open Oi and the close Ci. This auxiliary information can be used in addition to obtain an unbiased estimator of the parameter σ². For example, consider the estimator

σ̂²_HH = (2/n) Σi ZHi/∆i

which, in view of the exponential distribution of ZHi, is unbiased and has variance

var(σ̂²_HH) = σ⁴/n,

380 ESTIMATION AND CALIBRATION

about half of the variance based on returns. Of course we can build a similar estimator using the lows,

σ̂²_LL = (2/n) Σi ZLi/∆i.

One might guess that an estimator which simply averages σ̂²_HH and σ̂²_LL would be four times as good as (7.56), but in fact, because of the negative correlation between the two estimators being averaged, it is better still! The average is the Rogers and Satchell (1991) estimator

σ̂²_RS = (1/n) Σi (ZHi + ZLi)/∆i   (7.57)

which has variance depending on the correlation coefficient ρ ≈ −0.338 between ZHi and ZLi:

var(σ̂²_RS) = (2/n) var(ZHi/∆i)(1 + ρ)

≈ (σ⁴/2n)(1 − 0.338)

≈ 0.331 σ⁴/n.
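As a check on the algebra, the three range-based estimators can be computed from hypothetical (O, H, L, C) bars; by construction σ̂²_RS is exactly the average of σ̂²_HH and σ̂²_LL. A Python sketch (the function names and the sample bars are ours):

```python
from math import log

def z_stats(o, h, l, c):
    """Z_H = ln(H/O) ln(H/C) and Z_L = ln(L/O) ln(L/C) for one bar;
    both products are nonnegative since H >= O, C >= L."""
    return log(h / o) * log(h / c), log(l / o) * log(l / c)

def range_estimators(bars, delta):
    """sigma^2 estimates from n (O, H, L, C) bars of common length delta."""
    n = len(bars)
    zs = [z_stats(*bar) for bar in bars]
    s2_hh = 2.0 * sum(zh for zh, _ in zs) / (n * delta)
    s2_ll = 2.0 * sum(zl for _, zl in zs) / (n * delta)
    s2_rs = sum(zh + zl for zh, zl in zs) / (n * delta)   # (7.57)
    return s2_hh, s2_ll, s2_rs
```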

Evidently the Rogers and Satchell estimator is around 6 times as efficient (one sixth the variance) as the usual estimator of volatility (7.56). In fact it is independent of (7.56) as well, and so we can combine the two with weights inversely proportional to the estimator variances to get a best linear unbiased estimator with weights rounded,

σ̂²_BLU ≈ (1/7) σ̂²_OC + (6/7) σ̂²_RS,

and this estimator has variance

var(σ̂²_BLU) ≈ 0.284 σ⁴/n,

around one seventh of (7.56). Using a single high, low, open, close for a seven-day period to estimate volatility is roughly equivalent to using daily returns and the estimator (7.56). Related estimators have been suggested in the literature. See for example Parkinson (1980). Garman and Klass (1980) suggest the estimator

σ̂²_GK = (1/2)(log(Hi/Li))² − (2 ln(2) − 1)(log(Ci/Oi))²   (7.58)

which is similar to σ̂²_BLU.
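The two composite estimators are equally direct in code. In this Python sketch (the function names are ours), σ̂²_GK is divided by the period length ∆ to put it on the same annualized footing as the other estimators, a scaling the displayed formula (7.58) leaves implicit:

```python
from math import log

def sigma2_blu(s2_oc, s2_rs):
    """Best linear unbiased combination with the rounded weights 1/7 and 6/7,
    inversely proportional to the variances 2 sigma^4/n and ~0.331 sigma^4/n."""
    return s2_oc / 7.0 + 6.0 * s2_rs / 7.0

def sigma2_gk(o, h, l, c, delta):
    """Garman-Klass (7.58) for one (O, H, L, C) bar, divided by delta
    (an assumed annualizing convention, not stated in the formula)."""
    return (0.5 * log(h / l) ** 2
            - (2.0 * log(2.0) - 1.0) * log(c / o) ** 2) / delta
```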

We could quibble over which of the above estimators to use, based on extraordinarily precise efficiency calculations for a geometric Brownian motion, but much of this exercise would be wasted. Most indeed improve efficiency over (7.56), though not quite to the extent predicted by theory, and provided that we use the extremes, the differences between those estimators which do use them are relatively small. To demonstrate this we compared them on the Dow Jones Industrial Average for the period January 1999 to May 20, 2004. We computed all of the estimators for daily data on the open, close, high, and low and then plotted a 21-day moving average of these estimators against time. For example, two of the estimators, σ̂_OC and σ̂_HH, are given in Figure 7.8. The other estimators are not plotted since they closely resemble one of these two. The curve labelled "intraday volatility" measures the annualized volatility as determined by σ̂_HH, and that labelled "interday volatility", σ̂_OC. The estimators when evaluated on the whole data set provide values as follows:

σ̂_OC = 0.18, σ̂_HH = 0.29, σ̂_LL = 0.31, σ̂_RS = 0.30, σ̂_BLU = 0.29, σ̂_GK = 0.31,

indicating only minor differences between all of the estimators except σ̂_OC.

Over this period, January 1999 to May 20, 2004, the intraday volatility was consistently greater than the interday volatility, often by 50% or more, and there is a high correlation among the estimators that use the highs and lows as well as the open and close. This indicates a profound failure of the geometric Brownian motion assumption for the Dow Jones Industrial index, and this discrepancy between two estimators of volatility is much greater than for any other index I have investigated. Since we have seen substantial differences in the variances of the estimators based on the geometric Brownian motion model, it is worth comparing the estimators for their empirical variance. However, since the first, σ̂_OC, appears to have a different mean, we will compare them after normalizing by their mean; i.e. we compare values of

CV = √var(σ̂²) / E(σ̂²)

as estimated from the 21-day moving averages. These values are given below:

CV_OC = 0.83, CV_HH = 0.53, CV_LL = 0.59, CV_RS = 0.54, CV_BLU = 0.55, CV_GK = 0.55,

and again, except for the first, they are virtually indistinguishable. What happened to the gains in efficiency we expected when we went from σ̂_HH to σ̂_BLU? These gains are, in part at least, due to the negative correlation −0.338 between ZHi and ZLi, but for this data they are positively correlated, and in fact the 21-day moving averages of ZHi and ZLi have correlation around 0.90. The only explanation that seems reasonable is that the DJA is quite far from a geometric Brownian motion. If the volatility were stochastic or time-dependent, as most believe it is, then this can account for the apparently high correlation between ZHi and ZLi. For example, suppose that the volatility parameter σi was stochastic (but constant throughout the day) and possibly dependent on i.


Figure 7.6: The construction of an Eggtimer plot

Then since

cov(ZHi, ZLi) = E[cov(ZHi, ZLi | σi)] + cov(E[ZHi | σi], E[ZLi | σi])

= E[cov(ZHi, ZLi | σi)] + var(σi²∆i/2),

it is possible for the first term to be negative (as predicted by a geometric Brownian motion model) and yet the second term is sufficiently large that the covariance itself is large and positive.
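A toy numerical version of this decomposition shows how the sign can flip. All numbers here are hypothetical: σi² takes two values with equal probability, ∆i = 1, and the conditional correlation is set to the GBM value −0.338.

```python
# Hypothetical setup: sigma_i^2 is 0.04 or 0.36 with probability 1/2 each,
# Delta_i = 1, and given sigma_i the conditional correlation of Z_H and Z_L
# is the geometric-Brownian-motion value -0.338.
rho_cond = -0.338
vals = [0.04, 0.36]                      # possible values of sigma_i^2
e_s2 = sum(vals) / 2                     # E[sigma^2]
e_s4 = sum(v ** 2 for v in vals) / 2     # E[sigma^4]
# Each Z is conditionally exponential with mean sigma^2/2 (variance = mean^2),
# so cov(Z_H, Z_L | sigma) = rho_cond * (sigma^2 / 2)^2.
term1 = rho_cond * e_s4 / 4.0            # E[cov(Z_H, Z_L | sigma)] < 0
term2 = (e_s4 - e_s2 ** 2) / 4.0         # var(sigma^2 Delta / 2) > 0
total = term1 + term2                    # overall covariance: positive here
```

Here the negative conditional term is overwhelmed by the variability of σi², so the unconditional covariance is positive.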

Thus we have three estimators of the volatility parameter,

{ln(C/O)}²/∆t,   2 ln(H/O) ln(H/C)/∆t,   2 ln(L/O) ln(L/C)/∆t.

While the first is independent of the other two given O, unfortunately the second and third are themselves correlated, so in order to weight them optimally we need some information about their joint distribution. It follows that both {ln(C/O)}²/∆t and (ZH + ZL)/∆t provide unbiased estimators of the volatility parameter σ², and indeed the latter is independent of the former.

These estimators are areas illustrated in Figure 7.6. Consider the plot corresponding to time t. The vertical scale is logarithmic so that logs are plotted. This plot is constructed, using an arbitrarily chosen angle θ, from the four values (O, C, H, L): two lines ℓ1, ℓ2 are drawn through the point (t, ½(ln(O) + ln(C))) with slopes ± tan(θ).

Figure 7.7: Eggtimer Plot for the Dow Jones Average, Feb 1-March 27, 2000.

Horizontal lines are drawn at the ordinate values ln(H), ln(L), ln(O), ln(C), and using the points where ln(O) and ln(C) strike the two lines as corners, a rectangle is constructed. The area of this rectangle, tan(θ)(ln(C/O))²,

is an unbiased estimator of tan(θ)σ²∆t provided the Brownian motion has no drift. The second region consists of "wings" generated by the four points at which the horizontal lines at ln(H), ln(L) strike the lines ℓ1, ℓ2. The total area of this region (both wings) is tan(θ)(ZL + ZH), which is another unbiased estimator of tan(θ)σ²∆t, independent of the first, and also independent of whether or not the underlying Brownian motion has drift. By comparing these areas, we can detect abnormal changes in the volatility, or changes in the drift of the process, which will increase the observed value of (ln(C/O))² while leaving the second estimator unchanged. Because each estimator is based only on a single period, it is useful to provide as well a plot indicating whether there is a persistent change in either or both of the two estimators of volatility.
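The two areas in the eggtimer construction can be computed directly from one day's bar. In this Python sketch (the function name and sample values are ours), note that the ratio of rectangle area to wing area does not depend on θ, which is why the angle may be chosen arbitrarily:

```python
from math import log, tan, pi

def eggtimer_areas(o, h, l, c, theta=pi / 4):
    """Rectangle area tan(theta) * ln(C/O)^2 and total wing area
    tan(theta) * (Z_H + Z_L) for one (O, H, L, C) bar."""
    zh = log(h / o) * log(h / c)
    zl = log(l / o) * log(l / c)
    rect = tan(theta) * log(c / o) ** 2
    wings = tan(theta) * (zh + zl)
    return rect, wings
```

Dividing each area by tan(θ)∆t recovers the two unbiased estimators of σ², so comparing areas compares the estimators.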

We also show empirically the effectiveness of incorporating the high, low, and close information in a measure of volatility. For example, Figure 7.7 gives the eggtimer plot for the Dow Jones Industrial Index for the months of February and March 2000. The vertical scale is logarithmic since the Black-Scholes model is such that the logarithm of the index is a Brownian motion. A preponderance of black rectangles shows periods when the drift dominates, whereas where the grey tails are larger, the volatility is evidenced more by large values of the high or small values of the low, compared to the daily change.

The 21-day rolling sum of the areas of the regions, either grey or black, is graphed in Figure 7.8 and provides two measures of volatility. The rolling sum of the grey areas is called the "intraday volatility" and that of the black rectangles, the "interday volatility". In the absence of substantial drift, both measure the same theoretical quantity, but they differ in this case.

Figure 7.8: 21-day average volatility estimates for the Dow Jones Average, 1999-May 20, 2004

This difference is equally evident from Figure 7.9, the plot of the cumulative variance for the same period of time. Of course in this plot it is the slope, not the level, which indicates the variance.

Consistent differences between the intra-day and the inter-day volatility or variances would be easy to explain if the situation were reversed and the inter-day volatility were greater, because one could argue that the inter-day measure contains a component due to the drift of the process, and over this period there was a significant drift. A larger intraday volatility is more difficult to explain unless it is a failure of the Black-Scholes model. In this case, one might expect a similar behaviour in another market. If we generate a similar plot over the identical period of time for the NASDAQ index (Figure 7.10), we find that the comparison is reversed. This, of course, could be explained by the greater drift of the technology-dependent NASDAQ (relative to its volatility) compared to the relatively traditional market of the Dow Jones, but other indices we compared were more like the NASDAQ here than like the DJA.

There is no doubt that this difference reflects a greater intraday range for the DJA than other indices. In fact, if we plot the cumulative value of the range of the index divided by the close, (H − L)/C, as in Figure 7.11, it confirms that the daily range as measured by this ratio is consistently smaller for the NASDAQ


Figure 7.9: Cumulative Variances of the Dow Jones Average

Figure 7.10: Cumulative Variances for the NASDAQ index


Figure 7.11: Comparison of the cumulative variance of the Dow Jones and the

NASDAQ, Jan 1999-July 2000

than for the Dow Jones for this period.

Although high, low, open, close data is commonly available for many financial time series, the quality of the recording is often doubtful. When we used older data from the Toronto Stock Exchange, there were a number of days in which the high or low was so far from the open and close as to be explicable only as a recording error (often the difference was almost exactly $10). When the data on highs and lows is accurate, there is substantial improvement in efficiency and additional information available by using it. But there is no guarantee that published data is correct, particularly old data. A similar observation on NYSE data is made by Wiggins (1991): "In terms of the CUPV data base itself, there appear to be a number of cases where the recorded high or low prices are significantly out of line relative to adjacent closing prices."

7.7 Estimating Hedge Ratios and Correlation Coefficients

The correlation coefficient between two asset prices is important not only because it indicates a degree of dependence between the two assets, but because it is a required component of many practical investment decisions such as optimal portfolio selection, risk management, and hedging. There is no perfect hedge (except apparently in a Japanese garden), and so in practice we are required to use one asset to hedge another (hopefully highly correlated) asset. These may include derivatives or similar investments (for example bonds on the same or related underlying) which react similarly to market conditions.


Suppose we wish to hedge one investment, say in stock 2, using another, stock 1. As before, assume that we have data on the high, low, open and close of both stocks over non-overlapping time intervals (ti, ti + ∆i), i = 1, 2, ..., n. Denote the high, low, open and close for stock j on time interval i by (Hij, Lij, Oij, Cij), i = 1, ..., n, j = 1, 2. Again we will denote the returns by

Rij = ln(Cij/Oij).

Then under the geometric Brownian motion assumption, the return vector (Ri1, Ri2) has a bivariate normal distribution with variances σ1²∆i, σ2²∆i and correlation coefficient ρ.

If we assume that we are able to rehedge our portfolio at the beginning of each time interval, and at the beginning of interval i we are long 1 unit worth of stock 2 and short hi units worth of stock 1, then our total return at the end of the i'th period is Ri2 − hi Ri1. The optimal hedge ratio is the value of hi which minimizes the variance of Ri2 − hi Ri1, and this is given by

hi = cov(Ri2, Ri1)/var(Ri1) = ρ σ2/σ1.
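The optimal hedge ratio and its minimizing property are easy to verify numerically. A Python sketch (the function names are ours, and any return series used with them are hypothetical):

```python
def hedge_ratio(r1, r2):
    """h = cov(R2, R1) / var(R1), the variance-minimizing hedge ratio."""
    n = len(r1)
    m1, m2 = sum(r1) / n, sum(r2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(r1, r2)) / n
    var1 = sum((a - m1) ** 2 for a in r1) / n
    return cov / var1

def resid_var(r1, r2, h):
    """Variance of the hedged return R2 - h * R1."""
    z = [b - h * a for a, b in zip(r1, r2)]
    m = sum(z) / len(z)
    return sum((x - m) ** 2 for x in z) / len(z)
```

Since resid_var is quadratic in h with positive leading coefficient, hedge_ratio returns its unique minimizer.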

Note that hi is a constant independent of time, at least over periods when the stock volatilities and correlation coefficient remain unchanged. While implied volatilities σ1, σ2 may be obtained from derivative prices for each of these assets, the correlation parameter ρ is unknown and, unless there is a traded option such as a spread option whose value depends specifically on this correlation, ρ needs to be estimated from historical data. The simplest estimator of the correlation coefficient is the sample correlation of the returns,

ρ̂_C = côr(Ri2, Ri1) = Σi (Ri2 − R̄2)(Ri1 − R̄1) / √(Σi (Ri2 − R̄2)² Σi (Ri1 − R̄1)²),   (7.59)

where côr denotes the sample correlation coefficient. By a common argument (see for example Anderson (1958), Theorem 4.2.6) this has asymptotic variance

(1/n)(1 − ρ²)².
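The estimator (7.59) and the standard error suggested by this asymptotic variance are both one-liners. A Python sketch (the function names are ours):

```python
from math import sqrt

def sample_corr(x, y):
    """Sample correlation coefficient, as in (7.59)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corr_se(rho_hat, n):
    """Asymptotic standard error sqrt((1 - rho^2)^2 / n)."""
    return (1.0 - rho_hat ** 2) / sqrt(n)
```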

Here, as in McLeish (2004), we will consider using historical data for (Hij, Lij, Oij, Cij), i = 1, ..., n, j = 1, 2 to estimate ρ. In the case of two or more correlated geometric Brownian motion processes, the joint distribution of highs, lows and closing values is unknown, and so we will need to revert to a simpler alternative than maximum likelihood estimation. We have seen that in the Black-Scholes model, the statistics

ZHi1 = ln(Hi1/Oi1) ln(Hi1/Ci1),

ZHi2 = ln(Hi2/Oi2) ln(Hi2/Ci2),

ZLi1 = ln(Li1/Oi1) ln(Li1/Ci1),

ZLi2 = ln(Li2/Oi2) ln(Li2/Ci2)


Figure 7.12: The Normal approximation (dashed line) to the distribution of Z^(1/3) where Z is exponential.

all have marginal exponential distributions, and each is independent of the open and close for the i'th interval.

Suppose we transform each of the above exponential random variables with some function g, and assume that we can determine the correlation coefficient cor(g(ZH1), g(ZH2)) = b(ρ) as a function of ρ. For simplicity assume that we have subtracted the mean and divided by the standard deviation to provide a function g such that E{g(ZH1)} = 0, var{g(ZH1)} = 1. There are various possibilities for the transformation g, the simplest being a standardized power

g(ZH1) = (ZH1^p − E(ZH1^p)) / √var(ZH1^p)

for some suitable value of p > 0. A transformation of the gamma distributions in general, and the exponential distribution in particular, that makes them very nearly normal is the cube root transformation (p = 1/3) (see Sprott, 2000, Chapter 9, Appendix).
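The near-normality of the cube root can be checked from exact moments: for Z exponential with mean 1, E(Z^p) = Γ(1 + p), so the skewness of Z^(1/3) is available in closed form. A Python sketch:

```python
from math import gamma

# Exact moments: for Z ~ Exponential(mean 1), E[Z^p] = Gamma(1 + p).
p = 1.0 / 3.0
m1 = gamma(1 + p)                        # E[Z^(1/3)]
m2 = gamma(1 + 2 * p)                    # E[Z^(2/3)]
m3 = gamma(1 + 3 * p)                    # E[Z] = 1
variance = m2 - m1 ** 2
third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
skew = third_central / variance ** 1.5   # about 0.17, versus 2 for Z itself
```

The skewness drops from 2 for the exponential itself to roughly 0.17 after the cube-root transformation, consistent with Figure 7.12.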

For an exponential random variable ZH there is only a slight difference between the graph of the probability density function of

(ZH^(1/3) − E(ZH^(1/3))) / √var(ZH^(1/3))   (7.60)

and the graph of the standard normal probability density function, this difference lying in the left tail (see Figure 7.12), the region in which we are least interested, since there the maximum is very close to the open or close. We replaced the theoretical mean and variance in (7.60) by the sample mean and


variance, and so the function used ultimately was

g(ZHi) = (ZHi^(1/3) − Z̄H^(1/3)) / √var̂(ZH^(1/3)).

Although it is theoretically possible to obtain the function b(ρ), the exact expression requires the evaluation of Bessel functions. For simplicity we fit polynomials of varying degrees. For example, the polynomial of degree 9,

b(ρ) ≈ 0.4822ρ⁹ + 0.2102ρ⁸ − 0.9629ρ⁷ − 0.3104ρ⁶ + 0.7006ρ⁵ + 0.1887ρ⁴ − 0.1288ρ³ + 0.1822ρ² + 0.6385ρ + 0.0008,

is accurate to within 1%.

Inverting this approximation to b(ρ) provides a tractable estimator of the correlation between stocks based only on the correlation between the marginally exponential statistics ZHi, ZLi. This estimator is

ρ̂_HL = b⁻¹( ½ (côr(ZH1^(1/3), ZH2^(1/3)) + côr(ZL1^(1/3), ZL2^(1/3))) ).   (7.14)

A similar estimator obtains as well from the cross terms, since cor_ρ(ZH1^(1/3), ZL2^(1/3)) = b(−ρ):

ρ̂2 = −b⁻¹( ½ (côr(ZH1^(1/3), ZL2^(1/3)) + côr(ZL1^(1/3), ZH2^(1/3))) ),   (7.15)

where again côr denotes the sample correlation, but this estimator is very inefficient and adds little to our knowledge of the correlation. We will not consider it further. Since all that is required is the inverse of the function b, we can use a simple polynomial approximation to the inverse without sacrificing much precision, for example

b⁻¹(c) ≈ 0.1567c⁵ − 0.5202c⁴ + 0.6393c³ − 0.8695c² + 1.5941c.
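The two fitted polynomials can be checked against each other: composing the approximate inverse with b should approximately recover ρ. A Python sketch (the function names are ours) using Horner evaluation:

```python
def b(rho):
    """Degree-9 polynomial fit to the correlation of the transformed Z's."""
    coefs = [0.4822, 0.2102, -0.9629, -0.3104, 0.7006,
             0.1887, -0.1288, 0.1822, 0.6385, 0.0008]   # degree 9 down to 0
    out = 0.0
    for cf in coefs:
        out = out * rho + cf
    return out

def b_inv(c):
    """Degree-5 polynomial fit to the inverse of b."""
    return ((((0.1567 * c - 0.5202) * c + 0.6393) * c - 0.8695) * c + 1.5941) * c
```

The estimator ρ̂_HL of (7.14) is then b_inv applied to the average of the two cube-root sample correlations.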

The contenders for reasonable estimators are ρ̂_HL, ρ̂_C, or some combination such as an average, the simplest being a straight average of these two:

ρ̂_AV = ½ ρ̂_C + ½ ρ̂_HL.

It remains to see how these estimators perform in practice, both on real and simulated data. First we simulated one year's worth of daily observations on two stocks, simulating the high, low, open, and close for each day. There is considerable benefit to using both of the estimators ρ̂_C and ρ̂_HL, at least for simulated data. For example, in Figure 7.13 we graph the two variances var(ρ̂_C) and var(ρ̂_AV) for various (positive) values of ρ. Notice that the estimator ρ̂_AV, which combines information from all of the high, low, open, close, has variance roughly one half of that of ρ̂_C, which uses only the values of the open and the close.


Figure 7.13: Variance of the estimators ρ̂_C and ρ̂_AV as a function of ρ.

We also use these methods to investigate the correlation between two processes. To this end we downloaded data for both the S&P500 index and Nortel (NT) over the period June 17, 1998-June 16, 2003. This was a period when Nortel rose and fell sharply, showing increasing volatility. In Figure 7.14 we plot the two measures of the correlation, ρ̂_AV and ρ̂_C, between the returns for these two series in moving windows of 63 days (approximately 3 months) over this period. Note that the estimator ρ̂_AV, indicated by a dotted line, tends to give lower correlation than ρ̂_C, since it includes a component due to the movements within days, and these have lower correlation than the movements between days.

7.8 Problems

1. Suppose Y1 and Y2 are independent normal random variables with possibly different variances σ1², σ2² and expected values E(Yi) = µi, i = 1, 2. Show that the conditional distribution of Y1 given Y1 + Y2 = S is normal with mean

µ1 + w(S − µ1 − µ2)

and variance

σ1²(1 − w),

where

w = σ1²/(σ1² + σ2²).

2. Consider data generated from the mixture density

p N(θ1, σ1²) + (1 − p) N(θ2, σ2²),


Figure 7.14: Two Measures of Correlation between Nortel (NT) and the S&P500 Index

where θ1 < θ2 (so that the parameters are identifiable). Write a pro-