
$$\hat{\mu}_x = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat{\mu}_y = \frac{1}{n}\sum_{i=1}^n y_i \tag{7.35}$$

$$\widehat{\operatorname{cov}}(X,Y) = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu}_x)(y_i - \hat{\mu}_y). \tag{7.36}$$

For small sample sizes the estimator (7.36) can be adjusted for bias by replacing $\frac{1}{n}$ by $\frac{1}{n-1}$. The projection argument underlying the EM algorithm indicates that the maximum likelihood estimating function for the incomplete data is obtained by conditioning on the observed information. Projecting the three equations above on the observed data corresponds to doing the following: terms $x_i$, $y_i$ or $(x_i - \hat{\mu}_x)(y_i - \hat{\mu}_y)$ in which one or both of $(x_i, y_i)$ are unobserved are replaced as described in 1-3 below.

1. If $x_i$ and $x_{i+1}$ are unobserved but $x_i + x_{i+1}$, $y_i$, $y_{i+1}$ are all observed, replace $x_i$ by
$$\hat{x}_i = \frac{x_i + x_{i+1}}{2} + \frac{\beta_{x|y}}{2}(y_i - y_{i+1})$$
and $x_{i+1}$ by
$$\hat{x}_{i+1} = (x_i + x_{i+1}) - \hat{x}_i.$$

2. If $y_i$ and $y_{i+1}$ are unobserved but $y_i + y_{i+1}$, $x_i$, $x_{i+1}$ are all observed, replace $y_i$ by
$$\hat{y}_i = \frac{y_i + y_{i+1}}{2} + \frac{\beta_{y|x}}{2}(x_i - x_{i+1})$$
and $y_{i+1}$ by
$$\hat{y}_{i+1} = (y_i + y_{i+1}) - \hat{y}_i.$$

3. If all of $x_i$, $y_i$, $x_{i+1}$, $y_{i+1}$ are unobserved but $x_i + x_{i+1}$ and $y_i + y_{i+1}$ are observed, replace the two terms $(x_i - \hat{\mu}_x)(y_i - \hat{\mu}_y)$ and $(x_{i+1} - \hat{\mu}_x)(y_{i+1} - \hat{\mu}_y)$ in (7.36) by the single term
$$\frac{1}{2}(x_i + x_{i+1} - 2\hat{\mu}_x)(y_i + y_{i+1} - 2\hat{\mu}_y).$$
Since there is now one fewer term in (7.36), reduce the value of $n$ appearing there by one.

Here the regression coefficients $\beta_{x|y} = \operatorname{cov}(x, y)/\operatorname{var}(y)$ and $\beta_{y|x} = \operatorname{cov}(x, y)/\operatorname{var}(x)$ can either be estimated in a similar fashion from the data at hand, or estimated from a larger data-set in which both $x_i$ and $y_i$ are observed, provided we think they are reasonably stable over time.
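For instance, a minimal Matlab sketch of replacement rule 1 might read as follows (the function name and arguments are ours, and beta_xy is assumed to have been estimated already, for example from complete records):

    function [xi, xi1] = impute_pair(sx, yi, yi1, beta_xy)
    % Rule 1: x_i and x_{i+1} are individually unobserved but their sum sx
    % is observed, along with y_i and y_{i+1}; beta_xy = cov(x,y)/var(y).
    xi  = sx/2 + (beta_xy/2)*(yi - yi1);   % conditional mean of x_i
    xi1 = sx - xi;                         % forces the observed sum
    end

Rule 2 is identical with the roles of x and y interchanged.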
There are, of course, patterns for the missing data other than those considered in 1-3 above. They are rare in the case of two-dimensional data but much more likely in the case of higher-dimensional correlated observations. The adjustments to the maximum likelihood estimators required to accommodate such patterns become increasingly complex, and it is virtually impossible to provide a simple formula. The imputation method discussed earlier in this chapter, in which the missing observations are replaced not by their conditional mean but by imputed values with the same characteristics as the original complete data, is the most successful method for providing estimates in such circumstances since it has only two ingredients: the relatively simple formulae for the complete-data maximum likelihood estimators, and the ability to impute or generate the missing data with the correct distribution.

7.4.4 Monte Carlo Maximum Likelihood Estimation
Consider to begin with independent observations $X_1, \ldots, X_n$, all from the same probability density function $f_\theta(x)$. The maximum likelihood estimator of the parameter is the value $\theta$ which maximizes the likelihood
$$L(\theta) = \prod_{i=1}^n f_\theta(x_i).$$

The ideal Monte Carlo methodology for maximum likelihood estimation would not require specification of the family of densities $f_\theta$ at all, but only the ability to generate variables $X_i$ corresponding to any value of $\theta$. However a moment's reflection will convince us that in general this is a tall order. By repeated simulation of variables $X_i$ under the parameter value $\theta$ and averaging, Monte Carlo methods permit us to estimate unbiasedly any expected value, such as $E_\theta[\psi(X)]$ for an arbitrary function $\psi$. Unfortunately the probability density function at a particular point $x_i$ cannot be written as such an expected value, except in the case of a discrete random variable $X$ where, of course, $P_\theta(X = x_i) = E_\theta[I(X = x_i)]$.
If we have a method for estimating the density (or the probability function if $X$ is discrete) from simulations $X_i(\theta)$ taken for each parameter value $\theta$, we might use these simulations to estimate $f_\theta(x)$, say with $\hat{f}_\theta(x)$, and then maximize the estimated likelihood
$$\hat{L}(\theta) = \prod_{i=1}^n \hat{f}_\theta(x_i). \tag{7.37}$$

This estimator,
$$\arg\max_\theta \prod_{i=1}^n \hat{f}_\theta(x_i),$$

avoids any explicit form for the likelihood function at the cost of dealing only with an approximation to the likelihood, and it will typically require a very large simulation. Since the optimality properties of the maximum likelihood estimator depend on the local first and second derivatives of $f_\theta(x_i)$ in a neighbourhood of the true value, it is essential that such an approximation be very accurate in a neighbourhood of the estimator. One of the most common methods for approximating $f_\theta(x)$ is kernel density estimation, which relies on a kernel function $K(x)$ satisfying
$$\int_{-\infty}^{\infty} K(x)\,dx = 1,$$

often a probability density function centered around 0. Let $X_j(\theta)$, $j = 1, \ldots, m$, denote simulated data under the parameter value $\theta$, if possible using common random numbers. When the distributions have easily inverted cumulative distribution functions $F_\theta$, for example, we could define $X_i(\theta) = F_\theta^{-1}(U_i)$ where the $U_i$ are independent uniform[0,1]. We must choose a window width parameter $h$ controlling the degree of smoothing, and then we estimate the density using
$$\hat{f}_\theta(x_i) = \frac{1}{mh}\sum_{j=1}^m K\left(\frac{x_i - X_j(\theta)}{h}\right)$$

and so (7.37) becomes
$$\hat{L}(\theta) = (mh)^{-n}\prod_{i=1}^n\left\{\sum_{j=1}^m K\left(\frac{x_i - X_j(\theta)}{h}\right)\right\}. \tag{7.38}$$
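As an illustration, the following Matlab sketch maximizes (7.38) for the toy case of a $N(\theta, 1)$ location family, where $F_\theta^{-1}(u) = \theta + \Phi^{-1}(u)$; the sample sizes, the bandwidth h and all variable names are our own arbitrary choices:

    n = 25; x = 2 + randn(n,1);          % stand-in for the observed data
    m = 5000; U = rand(m,1); h = 0.3;    % common random numbers, window width
    K = @(z) exp(-z.^2/2)/sqrt(2*pi);    % normal kernel, integrates to one
    Finv = @(theta) theta + sqrt(2)*erfinv(2*U - 1);          % X_j(theta)
    nll = @(theta) -sum(log(mean(K((x.' - Finv(theta))/h), 1)/h));
    thetahat = fminsearch(nll, median(x));   % maximizes the estimated likelihood

Because the same uniforms U are reused for every value of $\theta$, the estimated likelihood is a smooth function of $\theta$ and can be handed to a standard optimizer.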

Instead of operating completely free of the likelihood function, if we have partial information on $f_\theta$, Monte Carlo maximum likelihood can be made more efficient. For example, it is reasonably common to know $f_\theta$ only up to a normalizing constant (which, in theory, we could obtain by laborious integration)

and to wish to estimate $\theta$ by maximum likelihood without evaluating this scaling constant. To be more specific, suppose that
$$f_\theta(x) = \frac{h_\theta(x)}{c(\theta)} \tag{7.39}$$

where the function $h_\theta(x)$ is known and relatively easily evaluated but $c(\theta)$ is unknown. There are many examples where probability density functions take a form such as this, but perhaps the most common is in a problem involving conditioning. For example, suppose that we know the joint probability density function $f_\theta(x, y)$ of two random variables and we wish to condition on the value of one, say $Y$. Then the conditional probability density function is
$$f_\theta(x|y) = \frac{f_\theta(x, y)}{\int f_\theta(z, y)\,dz},$$
which is of the form (7.39) with $c(\theta)$ replaced by $\int f_\theta(z, y)\,dz$.
There are several methods available for generating a random sample from the density $f_\theta$ without using the constant $c(\theta)$, including acceptance-rejection and the Metropolis-Hastings algorithm. Notice that the log-likelihood for a sample of size $n$ takes the form
$$\ln L(\theta) = \sum_{i=1}^n \ln h_\theta(x_i) - n\ln c(\theta), \tag{7.40}$$
and since $\frac{\partial}{\partial\theta}\ln c(\theta) = E_\theta[\frac{\partial}{\partial\theta}\ln h_\theta(X)]$, the constant serves simply to center the estimating function $\sum_{i=1}^n \frac{\partial}{\partial\theta}\ln h_\theta(x_i)$ at its expected value. If we replace $f_\theta$ by a density with respect to some other value of the parameter, or by some completely different density function $g(x)$, then the maximizer of (7.40) satisfies
$$0 = \sum_{i=1}^n \frac{\partial}{\partial\theta}\ln f_\theta(x_i) \tag{7.41}$$
$$= \sum_{i=1}^n \frac{\partial}{\partial\theta}\ln h_\theta(x_i) - nE_\theta\left[\frac{\partial}{\partial\theta}\ln h_\theta(X)\right], \tag{7.42}$$
or, since $c(\theta) = E[h_\theta(X)/g(X)]$ when $X$ has density $g$,
$$\frac{1}{n}\sum_{i=1}^n \frac{\partial h_\theta(x_i)/\partial\theta}{h_\theta(x_i)} = \frac{\partial}{\partial\theta}\ln E\left[\frac{h_\theta(X)}{g(X)}\right], \tag{7.43}$$

where the expected value in the second term is under the density $g(x)$ for $X$. Geyer (1996) suggests using this as the likelihood equation, but with the expected value replaced by an average over Monte Carlo simulations. In other words, if we use $N$ simulations of $X_j$ generated under the density $g(x)$, we would solve for $\theta$ the estimating equation
$$\frac{1}{n}\sum_{i=1}^n \frac{\partial h_\theta(x_i)/\partial\theta}{h_\theta(x_i)} = \frac{\partial}{\partial\theta}\ln\left[\frac{1}{N}\sum_{j=1}^N \frac{h_\theta(X_j)}{g(X_j)}\right]. \tag{7.44}$$
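The following Matlab fragment illustrates (7.44) on a toy example of our own devising: $h_\theta(x) = \exp(-(x - \theta)^2/2)$, so that the left side of (7.44) reduces to $\bar{x} - \theta$, with importance density $g$ the standard normal and the $\theta$-derivative on the right approximated numerically:

    n = 30; x = 1.5 + randn(n,1);            % data, true theta = 1.5
    N = 20000; X = randn(N,1);               % simulations from g = N(0,1)
    g = @(z) exp(-z.^2/2)/sqrt(2*pi);
    h = @(z,th) exp(-(z - th).^2/2);         % known unnormalized density
    logc = @(th) log(mean(h(X,th)./g(X)));   % Monte Carlo estimate of ln c(theta)
    d = 1e-4;
    rhs = @(th) (logc(th + d) - logc(th - d))/(2*d);
    thetahat = fzero(@(th) mean(x) - th - rhs(th), mean(x));

In this toy case $c(\theta)$ is constant, so the Monte Carlo term is nearly zero and thetahat lands close to mean(x), the true maximum likelihood estimator, a useful sanity check on the machinery.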

This gives rise to several additional questions. How is the asymptotic normal distribution of the maximum likelihood estimator affected by the introduction of the Monte Carlo estimator of
$$\frac{\partial}{\partial\theta}\ln E\left[\frac{h_\theta(X)}{g(X)}\right],$$
and what sort of importance distribution $g(x)$ leads to the most efficient estimator of $\theta$? For the first question, Geyer (1995) provides an estimator of the additional variance or covariance (over and above the usual covariance of a maximum likelihood estimator) introduced by the Monte Carlo estimation, and some discussion on the choice of importance distribution. The considerations around the choice of $g(x)$ are much the same here as in any application of importance sampling; we must avoid values of $g(x)$ which are too small and lead to heavy tails in the distribution of $\frac{\partial h_\theta(X)/\partial\theta}{g(X)}$. Mixtures are suggested, for example

$$g(x) = \sum_{j=1}^k p_j\,\frac{h_{\theta_j}(x)}{c(\theta_j)}$$

for various choices of the parameter values $\theta_1, \ldots, \theta_k$ and accompanying weights $p_1 + \cdots + p_k = 1$. Unfortunately this introduces the additional parameters $p_1, \ldots, p_k$ which also need estimation, so it seems best to be parsimonious in our choice of $k$.
The rather restrictive condition that
$$f_\theta(x) = \frac{h_\theta(x)}{c(\theta)}$$
for known function $h_\theta(x)$ can be relaxed considerably. In fact its primary purpose is to lead to the estimating function
$$\psi(x_i, \theta) = \frac{\partial h_\theta(x_i)/\partial\theta}{h_\theta(x_i)}$$

which was centered at its expectation using Monte Carlo and then summed over all $i$ to provide the estimating function for $\theta$. The only purpose of the assumption (7.39) was to motivate a specific choice of the function $\psi$. If we know virtually nothing about $f_\theta(x)$, but have the ability to generate random variables $X_i(\theta)$ for any given parameter value, we might well begin with a number of candidate estimating functions $\psi_1(x, \theta), \ldots, \psi_k(x, \theta)$, center them with Monte Carlo as before, and then determine the weighted average which results in the minimum possible asymptotic variance. When we do this, of course, we are not using the exact likelihood function (presumably this is unavailable), but since we are implicitly estimating the score function with a function of the form
$$\sum_i \beta_i(\theta)[\psi_i(x, \theta) - \mu_i(\theta)],$$

we are essentially approximating the likelihood with an exponential family distribution
$$c(\theta)\exp\left\{\sum_i B_i(\theta)\Psi_i(x, \theta)\right\}$$
with the functions $\Psi_i$ either given by $\psi_i$ or by $\partial\psi_i/\partial\theta$. Suppose we approximate the expectation
$$\mu_i(\theta) = E_\theta[\psi_i(X(\theta), \theta)]$$
using $m$ simulations with
$$\hat{\mu}_i(\theta) = \frac{1}{m}\sum_{j=1}^m \psi_i(X_j(\theta), \theta).$$

These same simulations allow us to estimate the covariance matrix $C_{ij}(\theta) = \operatorname{cov}_\theta(\psi_i(X, \theta), \psi_j(X, \theta))$ as well as a difference of the form
$$\hat{\delta}_i(\theta) = \frac{1}{2mh}\sum_{j=1}^m\left[\psi_i(X_j(\theta + h), \theta) - \psi_i(X_j(\theta - h), \theta)\right]$$
where $h$ is small (here it is important that we use common random numbers). Then the optimal estimating function constructed as a linear combination of the functions $\psi_i$ is
$$\sum_j \hat{\beta}_j(\theta)\sum_i\left[\psi_j(x_i, \theta) - \hat{\mu}_j\right]$$
where the vector of values of $\beta_j$ is estimated by
$$\hat{\beta}^T(\theta) = \hat{\delta}^T(\theta)C^{-1}(\theta). \tag{7.45}$$

In order to assess the feasibility of this method, we consider a simple example, the symmetric stable laws, whose density functions are quite difficult to evaluate but for which it is easy to generate random variables. Suppose $X_1, \ldots, X_n$ are assumed independent and identically distributed with symmetric stable distributions having location parameter $\mu = \theta$, with the parameters $c = 1$ and $\alpha$ assumed known. Consider $M$-estimating functions of the form
$$\psi_k(x, \theta) = \sum_i \phi_k(x_i - \theta), \quad k = 1, 2, \ldots, 4,$$
where the candidate functions $\phi_k$ are
$$\phi_k(x) = \max\left(\min\left(x, \frac{k}{2}\right), -\frac{k}{2}\right), \quad k = 1, \ldots, 4.$$
Although we know that these functions satisfy
$$E_\theta[\phi_k(x_i - \theta)] = 0$$

for members of the symmetric stable family, and therefore the functions $\psi_k(x, \theta)$ are unbiased, we do not wish to use that knowledge, so we will continue to use Monte Carlo to generate the estimators $\hat{\mu}_k(\theta)$. What combination of these four estimating functions provides the best estimator of $\theta$? If we fix the value $c = 1$, we have seen that we can generate $X(\theta)$ using
$$X(\theta) = \theta + \sin(\alpha U)(\cos U)^{-1/\alpha}\left[\frac{\cos(U(1 - \alpha))}{E}\right]^{(1-\alpha)/\alpha} \tag{7.46}$$
with $U$ uniform on $[-\pi/2, \pi/2]$ and $E$ standard exponential, independent of $U$. A single pair $(U, E)$ permits generating $X(\theta)$ for any $\theta$.
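In Matlab, for instance, (7.46) might be implemented as follows (the array sizes and names are ours); note that the same pairs (U, E) are reused for every $\theta$:

    alpha = 1.5; m = 1000;
    U = pi*(rand(m,1) - 0.5);            % uniform on (-pi/2, pi/2)
    E = -log(rand(m,1));                 % standard exponential
    X = @(theta) theta + sin(alpha*U).*cos(U).^(-1/alpha) ...
            .*(cos(U*(1 - alpha))./E).^((1 - alpha)/alpha);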
In this case, estimation of
$$\delta_k(\theta) = \frac{\partial}{\partial\theta}\psi_k(x, \theta) = -\sum_i \phi_k'(x_i - \theta)$$
is a little easier, since
$$\sum_i \phi_k'(x_i - \theta) = \sum_i I\left(|x_i - \theta| < \frac{k}{2}\right).$$


We tested this routine on the data below, with $n = 21$, generated from a symmetric stable distribution with parameters $(0, 1, 1.5)$.

-3.1890 -2.0491 -1.8185 -1.7309 -1.5403 -1.3090 -1.2752
-0.8266 -0.7154 -0.2706 -0.2646 -0.0478 0.2827 0.9552
1.0775 1.1009 1.6693 1.6752 2.4415 2.4703 5.5746
Table 7.5: Data from the symmetric stable(0, 1, 1.5) distribution

The mean of these values is 0.105 and the median is $-0.2646$. By running 1000 simulations with $\theta$ updated 100 times using a Robbins-Monro procedure and beginning with $\theta$ estimated by the median, we obtained the estimate
$$\hat{\theta} = -0.0954$$
with coefficients on the four estimating functions given by
$$\beta = (0.3561,\ 0.6554,\ 0.8705,\ 1.0096).$$

Evidently the last two functions $\psi_3(x, \theta)$, $\psi_4(x, \theta)$ receive more weight than the first two. The resulting estimator is, of course, not really the maximum likelihood estimator, but it approximates the same to the extent that the functions $\phi_k$ and their integrals $\int\phi_k$ can be used to approximate the density function. Convergence of a Robbins-Monro algorithm here, as is frequently the case, is quite slow. To some extent this is a necessary payment made for our assumed ignorance about the form of the distribution and the expected value of the estimating functions. We are repeatedly approximating the latter using Monte Carlo, and the noise so introduced results in the slow convergence. Here, as elsewhere, there is a price to pay for ignorance.
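For readers who wish to experiment, the following fragment sketches the kind of Robbins-Monro iteration used here, with x the data, X(·) the generator above and beta the weights from (7.45); the step-size sequence a/t, the iteration count and the decision to hold beta fixed rather than re-estimate it at each step are all our own simplifications:

    theta = median(x); a = 0.5;
    phi = @(z,k) max(min(z, k/2), -k/2);       % the capped functions phi_k
    for t = 1:100
        Z = X(theta);                          % simulations at the current theta
        s = 0;
        for k = 1:4                            % weighted, Monte Carlo centered psi's
            s = s + beta(k)*(mean(phi(x - theta, k)) - mean(phi(Z - theta, k)));
        end
        theta = theta + (a/t)*s;               % Robbins-Monro update
    end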

There are certainly more efficient estimators of the median of a symmetric stable distribution, but what is notable about this method is that it provides a "black-box" estimation of parameters requiring no knowledge of the distribution beyond the ability to simulate values with a particular parameter value, together with a set of vaguely reasonable estimating functions. Arguably, a fairly broad set of candidate estimating functions together with massive computer power may ultimately generate Monte Carlo estimators that rival maximum likelihood estimation in efficiency without any knowledge of the structure of the distribution beyond the ability to generate samples from it.


7.5 Using Historical Data to Estimate the Parameters in Diffusion Models
7.5.1 General Ito Processes
Typically a diffusion model for financial data includes parameters with unknown values which require estimation. We might, for example, wish to fit a diffusion model like
$$dr(t) = \mu(t, r(t))dt + \sigma(t, r(t))dW_t$$
to historical data on spot interest rates. Here as elsewhere we assume that the drift and diffusion coefficients are such that a solution exists. There are many choices of the drift and diffusion coefficients $\mu(t, r(t))$, $\sigma(t, r(t))$ leading to common models such as the Vasicek (1977), the CIR (Cox, Ingersoll, Ross, 1985), and the Ho-Lee (Ho and Lee, 1986) models, but they all have unspecified parameters $\theta$ that must be either estimated or fit to price data before they can be used. Suppose for the present that our intended use is not the pricing of bonds or interest rate derivatives, but using the model to simulate interest rate scenarios for an insurance company. In this case it is the $P$-measure that is of interest, and so historical values of $r(t)$ are directly relevant. Unknown parameters $\theta$ may lie in either the drift term $\mu(t, r(t))$ or in the diffusion $\sigma(t, r(t))$, so we will add the argument $\theta$ to either or both functions as required. There is a great deal of literature dealing with estimation of parameters in a diffusion model. We will largely follow McLeish and Kolkiewicz (1997).
According to the simplest discrete-time approximation to the process, the Euler scheme, the increments in the process over small intervals of time are approximately conditionally independent and normally distributed. Suppose we have already simulated the spot interest rate for integer multiples of $\Delta t$ up to time $t = j\Delta t$. Provided the time increment $\Delta t$ is small, if we denote $\Delta r = r(t + \Delta t) - r(t)$, we can generate the interest rate at time $t + \Delta t$ using the approximation
$$\Delta r - \mu(t, r(t), \theta)\Delta t \sim N(0, \sigma^2(t, r(t), \theta)\Delta t). \tag{7.47}$$
Thus, if $\theta$ is a parameter in the drift term only, it can be estimated using weighted least squares, i.e. by minimizing the sum of the squared standardized

normal variates:
$$\min_\theta \sum_t w_t(\Delta r - \mu(t, r(t), \theta)\Delta t)^2,$$
where the weights $w_t$ are proportional to the reciprocals of the variances, $w_t = 1/\sigma^2(t, r(t))$. The solution is obtained by setting the derivative with respect to $\theta$ equal to zero and solving for $\theta$:
$$\sum_t w_t\,\frac{\partial\mu(t, r(t), \theta)}{\partial\theta}\,(\Delta r - \mu(t, r(t), \theta)\Delta t) = 0, \tag{7.48}$$

and it is not hard to see that this is the maximum likelihood estimator of $\theta$ assuming the normal approximation in (7.47) is exact. If the unknown parameter $\theta$ is in the diffusion term only, or in both the drift and diffusion terms, it is also easy to see that the log-likelihood for the normal approximation is, up to an additive constant,
$$-\sum_t \ln(\sigma(t, r(t), \theta)) - \frac{1}{2}\sum_t \frac{(\Delta r - \mu(t, r(t), \theta)\Delta t)^2}{\sigma^2(t, r(t), \theta)\Delta t}$$
with the sums over the $n$ time points $t$ at which we observed the spot interest rate. Maximizing this over the parameter $\theta$ results in the estimating function
$$\sum_t\left\{\frac{\partial\mu(t, r(t), \theta)}{\partial\theta}\,\frac{\Delta r - \mu(t, r(t), \theta)\Delta t}{\sigma^2(t, r(t), \theta)} + \frac{\partial\ln[\sigma^2(t, r(t), \theta)]}{\partial\theta}\,\frac{1}{2}\left[\frac{(\Delta r - \mu(t, r(t), \theta)\Delta t)^2}{\sigma^2(t, r(t), \theta)\Delta t} - 1\right]\right\} = 0,$$

which combines the estimating function for the drift (7.48) with that for the diffusion term, using weights inversely proportional to their variances.
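As a concrete special case, for a Vasicek-type model $dr = \kappa(\beta - r)dt + \sigma dW$ with constant $\sigma$, the weights cancel and (7.48) reduces to ordinary least squares. A minimal Matlab sketch, with r a column vector of spot rates observed every dt years (names ours):

    dr = diff(r); rt = r(1:end-1);
    A  = [dt*ones(size(rt)), -dt*rt];    % regressors for kappa*beta and kappa
    ab = A\dr;                           % least squares: dr = kappa*beta*dt - kappa*r*dt + noise
    kappa = ab(2); beta = ab(1)/kappa;
    sigmahat = sqrt(var(dr - A*ab)/dt);  % diffusion estimated from the residuals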
Girsanov's Theorem allows us to construct maximum likelihood estimators also for continuously observed processes, analogous to some of the estimators above. For example, suppose the parameter $\theta$ resides in the drift term only, so the model is of the form
$$dX_t = \mu(t, X_t, \theta)dt + \sigma(t, X_t)dW_t, \quad X_0 = x_0.$$
Suppose $P$ is the measure on $C[0, T]$ induced by $X_t$ with $X_0 = x_0$, and the measure $P_0$ on the same space is induced by a similar equation but with zero drift:
$$dX_t = \sigma(t, X_t)dW_t, \quad X_0 = x_0. \tag{7.49}$$
Then Girsanov's Theorem (see the appendix) asserts that with
$$M_t = \mathcal{E}\left(\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma(s, X_s)}\,dW_s\right) = \exp\left\{\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma^2(s, X_s)}\,dX_s - \frac{1}{2}\int_0^t \frac{\mu^2(s, X_s, \theta)}{\sigma^2(s, X_s)}\,ds\right\}, \tag{7.50}$$
under some boundedness conditions, the Radon-Nikodym derivative of $P$ with respect to $P_0$ is given by $M_T$,
$$\frac{dP}{dP_0} = M_T.$$

The exponential martingale
$$M_t = \mathcal{E}\left(\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma(s, X_s)}\,dW_s\right)$$
describes the Radon-Nikodym derivative for the processes restricted to $[0, t]$, and the process $X_s$ appearing in (7.50) is assumed generated from the relation (7.49). Thus the maximum likelihood estimate of $\theta$ is obtained by maximizing $M_T$. Setting the derivative of its logarithm equal to 0 results in the likelihood equation
$$\int \frac{\partial\mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,dX_t - \int \frac{\partial\mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,\mu(t, X_t, \theta)\,dt = 0,$$
or
$$\int \frac{\partial\mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,(dX_t - \mu(t, X_t, \theta)dt) = 0.$$

Of course if we only had available observations taken at discrete time points $t_1 < t_2 < \ldots$ rather than the continuously observed process, we would need to replace the integral by a sum, resulting in the same estimating function as (7.48), namely
$$\sum_{t_i} \sigma^{-2}(t_i, X_{t_i})\,\frac{\partial\mu(t_i, X_{t_i}, \theta)}{\partial\theta}\,(\Delta X_{t_i} - \mu(t_i, X_{t_i}, \theta)\Delta t_i) = 0. \tag{7.51}$$

For a continuous-time model of the form
$$dX_t = \mu(t, X_t, \theta)dt + \sigma(t, X_t, \theta)dW_t$$
in which unknown parameters $\theta$ are in both the drift and the diffusion part of the stochastic differential, estimation of the parameter $\theta$ is very different than in the discrete-time case. Consider, for example, solving the estimating equation
$$\psi(\theta) = \sum_i\left\{(\Delta X_{t_i} - \mu(t_i, X_{t_i}, \theta)\Delta t_i)^2 - \sigma^2(t_i, X_{t_i}, \theta)\Delta t_i\right\} = 0 \tag{7.52}$$
to obtain an estimator of $\theta$. Since for sufficiently small time increments $\Delta t_i$ the Euler increments are approximately normal, this estimating function is asymptotically unbiased as $\max\{\Delta t_i\} \to 0$. Moreover, for any normal random variable $Z$ with mean 0, $\operatorname{var}(Z^2) = 2(\operatorname{var}(Z))^2$, so the variance of the estimating function above is, in the limit,
$$\operatorname{var}(\psi(\theta)) \to 2\sum_i \sigma^4(t_i, X_{t_i}, \theta)(\Delta t_i)^2 \le 2\max_i\{\Delta t_i\}\sum_i \sigma^4(t_i, X_{t_i}, \theta)\Delta t_i \to 0,$$

the convergence to zero holding since $\sum_i \sigma^4(t_i, X_{t_i}, \theta)\Delta t_i$ converges to the integral
$$\int_0^T \sigma^4(t, X_t, \theta)\,dt,$$
which is finite, provided for example that the function $\sigma(t, x, \theta)$ is bounded. In general the asymptotic variance of the estimator obtained by solving an equation of the form (7.52) is, under fairly modest regularity assumptions on the estimating function $\psi$, asymptotic to
$$\frac{\operatorname{var}(\psi(\theta))}{E^2\left[\frac{\partial}{\partial\theta}\psi(\theta)\right]}. \tag{7.53}$$

It is easy to see that $E[\frac{\partial}{\partial\theta}\psi(\theta)]$ is asymptotic to
$$-2\int \frac{\partial\sigma(t, X_t, \theta)}{\partial\theta}\,\sigma(t, X_t, \theta)\,dt,$$
which is typically non-zero if $\sigma$ depends on $\theta$. Then (7.53) approaches 0 since the denominator is asymptotically non-zero. We conclude that the estimator of $\theta$ determined by the estimating function (7.52) approaches perfection (i.e. its variance approaches zero) as $\Delta t \to 0$, and this is true regardless of how long or how short the time interval is over which the process has been observed. Restated, this says that for a continuously observed process, a parameter in the diffusion term (including the volatility parameter) can be perfectly estimated from an arbitrarily short period of observation. Now if you think this is too good to be true, you are right, at least in a practical sense. In practice, volatility is by no means known or estimated precisely, because the diffusion model does not fit on the minute time scale necessary for the above argument to go through. Far from having nearly perfect knowledge of the volatility parameters, volatility appears to change rapidly on any time scale for which the diffusion model is considered a good fit. In the next section we discuss various estimates of volatility obtained from a discretely observed (geometric) Brownian motion.


7.6 Estimating Volatility
In the last section, we saw that the parameters in the diffusion coefficient of a general diffusion were all too easily estimated. In fact from an arbitrarily short segment of the path, we can, at least in theory, obtain an estimator of the volatility which is exact (is unbiased and has zero variance) because of the infinite variation of a diffusion process in these arbitrarily short periods of time. Information for estimating the diffusion coefficient accrues much more rapidly than for the drift, and in this respect continuous-time processes are quite different from their discrete analogues. Two diffusion processes with different diffusion coefficients are mutually singular. Intuitively this means that the sample paths generated by these two measures lie in two disjoint subsets of $C[0, T]$, so that we can theoretically determine from a single sample path the exact diffusion term. Of course to determine which set a path lies in, we need to estimate or examine the quadratic variation of the process, something which requires a perfectly graphed sample path and an infinitely powerful microscope. The real world is considerably different from this idealized situation for several reasons. First, we never observe a process in continuous time but only at discrete time points that may be close together. Second, diffusion processes fit data on security prices, interest rates and exchange rates only when viewed over longer time intervals than a minute, hour, or day. Short-term behaviour is very different; for example, prices usually evolve through a series of jumps corresponding to trades of varying magnitudes and frequencies. Third, information obtained from the derivatives market can be used to obtain implied volatilities, but these are not the same as the $P$-measure or historical volatilities in a discrete-time model. In theory, they should agree in the continuous-time Black-Scholes model, since the risk-neutral measure has the same diffusion coefficient as does the $P$-measure, but this is of limited use since volatilities can change rapidly.
The volatility parameter is the single most important parameter for pricing short-term options and arguably the most important parameter for virtually all financial time series. In Figure 7.4 we plot the price of a call option on a stock currently trading at $10. The strike price of the option is $12 and the time to maturity is 1/2 year. Notice that the option price ranges from close to 0 when $\sigma = 0.1$ to around $1.40 when $\sigma = 0.7$, indicating that different estimates of volatility will have a substantial impact on the option price. In fact over a large part of the range of this graph the relationship between option price and volatility is nearly linear.
Complicating the estimation of $\sigma$ is its time-varying behaviour, with periods of persistent high volatility followed by periods of relatively lower volatility. Failure to accommodate the behaviour of the volatility process may result in highly misleading models, substantial pricing errors, and the catastrophic consequences that can accompany them. We have already distinguished between estimation of volatility based on historical data and the calibration of the volatility parameter to option price data. The latter is required if we are pricing derivatives, since it chooses a value of the volatility parameter that is most consistent with the observed prices of options and futures. The former is necessary if we are interested in volatility for the purposes of prediction, risk management, or scenario generation. The implied volatility is obtained from the price of one or more heavily-traded or benchmark derivatives sold on the open market having the same price process $S_t$ for the underlying asset. We determine the volatility parameter which produces the market price of a given option. For example, suppose an option with strike price $K$, maturity $T$, and initial stock value $S_0$ is traded on the market at a price $V$. Then, because the Black-Scholes price is increasing in $\sigma$, we may solve the equation
$$BS(S_0, K, r, T, \sigma) = V \tag{7.54}$$
for the implied volatility parameter $\sigma$. Because $BS(S_0, K, r, T, \sigma)$ is monotone




Figure 7.4: Price of Call option as a function of volatility, S0 = 10, K = 12, r =
0.05, T = 0.5 years.



in $\sigma$, a solution exists (as long as $V$ is within a reasonable range) and is unique. This estimate of volatility differs from the historical volatility obtained by computing the sample variance of the returns $\log(S_{t+1}/S_t)$. Since it agrees with the market price of a specific option, it reflects more closely the risk-neutral distribution $Q$ and is therefore used for pricing other derivatives. There are several disadvantages to this method of calibrating the parameter. The calibrated value of $\sigma$ depends on the stock, the strike price of the option, the current time $t$, the time to maturity $T - t$, and to some extent on other parameters that are harder to measure, such as the liquidity of the option and the degree to which it is used as a benchmark. Nevertheless it is common to transform option prices $V$ to volatilities $\sigma$ using (7.54) and then, rather than deal with option prices, analyze these implied volatilities. This transformation may be justified if the resulting model is simpler or smoother as a function of $\sigma$. However there are many monotonic transformations of price that could do a better job of simplifying a non-Gaussian model.
In this section we will concentrate on the estimation of volatility based on historical data, as opposed to calibration. An estimator based on historical data approximates a parameter of the real-world measure $P$, whereas the calibrated parameter is a property of the risk-neutral measure $Q$. To explore the differences between implied and historical volatility, we downloaded two years of daily stock price data (Nov 1, 1998 to Nov 1, 2000) for Nortel (NT), listed on the New York Stock Exchange, from the web-site http://finance.yahoo.com and on the basis of this wish to estimate the volatility. We used the "adjusted




Figure 7.5: Nortel Returns, Nov 1, 1998-Nov 1, 2000.



close" since these have been adjusted for stock splits and dividends. The daily returns over this period are graphed in Figure 7.5 and the significant increase in volatility is evident towards the end of the sequence. Since the logarithm of daily stock prices is assumed to be a Brownian motion, we may estimate the daily volatility using the sample variance of the first differences of these logarithms. To obtain the variance over a year, multiply by the number of trading days (around 253) per year. Thus the annual volatility is estimated by sqrt(253*var(diff(log(ntprice)))), which gives a volatility of around 0.635. How does this historical estimate of volatility compare with the volatility as determined by option prices?
Table 7.5 is a segment of a table of option prices obtained from the Chicago Board Options Exchange and provides the current prices (on November 2, 2000) of calls and puts on NT. There was a January $40 call selling for $8 5/8 and a January $40 put for $4 1/8. The price of the stock when these option prices were recorded was $44 (evidently the "good old days"!)


Calls Last Sale Net Bid Ask Volume Open Interest

01 Jan 40 (NT AH-E) 8 5/8 -7/8 8 3/8 8 7/8 79 10198
01 Jan 45 (NT-AI-E) 6 1/8 -1/4 5 7/8 6 1/4 460 4093
Puts Last Sale Net Bid Ask
01 Jan 40 (NT MH-E) 4 1/8 +1/4 3 7/8 4 1/4
01 Jan 45 (NT-MI-E) 6 3/8 +1/4 6 1/8 6 5/8
Table 7.5: Option Prices for Nortel, Nov 2, 2000.

Suppose the current interest rate (in US dollars, since these are US dollar prices) is 5.8%. This is roughly the interest rate on a short-term risk-free deposit like a treasury bill. Then the implied volatility is determined by finding the value of the parameter $\sigma$ (it turns out to be around 0.79) so that the Black-Scholes formula gives exactly this value for the option price, i.e. finding a value of $\sigma$ so that PUT = 4.125 and CALL = 8.625, for example:

    [call,put] = blsprice(44,40,.058,55/253,.79,0)   (returns call = 8.6077, put = 4.1066)

The implied volatility of a roughly at-the-money option is about $\sigma = 0.79$, and this roughly agrees with the price of the January 45 options:

    [call,put] = blsprice(44,45,.058,55/253,.79,0)   (returns call = 6.2453, put = 6.6815)
Differences between the observed and theoretical option prices of the magnitude observed here are inevitable. Transaction costs and the fact that some options are more liquid than others can affect option prices. If we ignore this problem, we can find a pair of values $(r, \sigma)$ by minimizing the sum of squared pricing errors between the observed and Black-Scholes prices of the put and call options, but because short-term option prices are quite insensitive to $r$, the estimates of $r$ determined in this way can be unreliable.
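In Matlab the implied volatility in (7.54) can be backed out with a root-finder, for example using blsprice as in the computations above:

    S0 = 44; K = 40; r = 0.058; T = 55/253; V = 8.625;    % the January $40 call
    sigma_implied = fzero(@(sig) blsprice(S0, K, r, T, sig) - V, 0.5)
    % returns a value near 0.79, consistent with the text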
We now have two estimates of volatility, historical volatility (0.635) and implied volatility (0.79). Which volatility is "correct"? The answer is both. The market conditions for this company changed enough over this period and subsequently to effect enormous changes in volatility, and if we used a period of less than 2 years, our estimate of historical volatility would be larger (the one-year estimated volatility is 0.78, very close to the implied volatility). Moreover the implied volatility is a property of the $Q$ measure, and this is determined not only by the recent history of the stock, but also by investor risk preferences and fears. The $Q$ measure is a distribution for stock prices assigned by the market for options. The $P$-measure was assigned by investors' past tastes for the stock, sometimes in a substantially different economic climate. It is often the case that "implied volatilities" are greater than the historical ones, although in theory, for a geometric Brownian motion process which allows frictionless continuous-time hedging, the risk-neutral $Q$ and the actual distribution $P$ should have the same volatilities.

7.6.1 Using Highs and Lows to Estimate Volatility
There are many competing estimators of the volatility parameter of a geometric Brownian motion, and to date we have used only one, the sample variance of stock returns. To set the stage for these estimators, suppose $S(t)$ is a geometric Brownian motion
$$d\ln(S(t)) = \mu\,dt + \sigma\,dW_t, \quad S(0) = S_0,$$
and we observe certain properties of non-overlapping segments of this process, for example days or weeks. In particular suppose $t_1 < t_2 < \cdots < t_n$ are the beginnings of certain observation periods of length $\Delta_i$, where $t_i + \Delta_i \le t_{i+1}$, and so we can define, for each period, the open, the close, the high and the low respectively:
$$O_i = S(t_i), \quad C_i = S(t_i + \Delta_i),$$
$$H_i = \max\{S(t);\ t_i \le t \le t_i + \Delta_i\},$$
$$L_i = \min\{S(t);\ t_i \le t \le t_i + \Delta_i\}.$$

The observation periods may or may not be assumed equal in length (for example daily, in which case $\Delta_i$ is around $\frac{1}{252}$ for all $i$). We will sometimes assume that the market does not move outside of these observation periods, in which case $O_i = C_{i-1}$, since in practice the open of many stocks at the beginning of the day is nearly identical to the close at the end of the previous day.

The simplest estimator of $\sigma$ is that based on the variance of the returns $R_i = \ln(C_i/O_i)$. Since $R_i$ is Normal$(\mu\Delta_i, \sigma^2\Delta_i)$, we can construct a regression problem towards estimating the parameters $\mu, \sigma$:
$$\frac{R_i}{\sqrt{\Delta_i}} = \mu\sqrt{\Delta_i} + \varepsilon_i, \quad \text{where the } \varepsilon_i \text{ are i.i.d. } N(0, \sigma^2).$$
By regarding the left side as the response in an ordinary least squares regression, we obtain the standard estimators
$$\hat{\mu} = \frac{\sum_{i=1}^n R_i}{\sum_{i=1}^n \Delta_i} \tag{7.55}$$
$$\hat{\sigma}^2_{OC} = \frac{1}{n-1}\sum_{i=1}^n \frac{1}{\Delta_i}(R_i - \hat{\mu}\Delta_i)^2, \tag{7.56}$$

both unbiased estimators of their respective parameters. Regression theory also provides the variances of these estimators:
$$\operatorname{var}(\hat{\mu}) = \frac{\sigma^2}{\sum_{i=1}^n \Delta_i}, \qquad \operatorname{var}(\hat{\sigma}^2_{OC}) = \frac{2\sigma^4}{n-1}.$$
There are more efficient estimators than $\hat{\sigma}^2_{OC}$ if we include not only an open and a close but also the high and the low for those periods. In Chapter 5, we discovered that for a geometric Brownian motion, the random variables $Z_{H_i} = \log(H_i/O_i)\log(H_i/C_i)$ and $Z_{L_i} = \log(L_i/O_i)\log(L_i/C_i)$ are both exponentially distributed with expected value $\sigma^2\Delta_i/2$, and each of $Z_{L_i}$ and $Z_{H_i}$ is independent of the open $O_i$ and the close $C_i$. This auxiliary information can be used in addition to obtain an unbiased estimator of the parameter $\sigma^2$. For example consider the estimator
$$\hat{\sigma}^2_{HH} = \frac{2}{n}\sum_{i=1}^n \frac{Z_{H_i}}{\Delta_i}$$
which, in view of the exponential distribution of $Z_H$, is unbiased and has variance
$$\operatorname{var}(\hat{\sigma}^2_{HH}) = \frac{\sigma^4}{n},$$

about half of the variance of the estimator based on returns. Of course we can build a similar estimator using the lows,
$$\hat{\sigma}^2_{LL} = \frac{2}{n}\sum_{i=1}^n \frac{Z_{L_i}}{\Delta_i}.$$
One might guess that an estimator which simply averages $\hat{\sigma}^2_{HH}$ and $\hat{\sigma}^2_{LL}$ would be four times as good as (7.56), but in fact, because of the negative correlation between the two estimators being averaged, it is better still! The average is the Rogers and Satchell (1991) estimator
$$\hat{\sigma}^2_{RS} = \frac{1}{n}\sum_{i=1}^n \frac{Z_{H_i} + Z_{L_i}}{\Delta_i} \tag{7.57}$$
which has variance depending on the correlation coefficient $\rho \simeq -0.338$ between $Z_{H_i}$ and $Z_{L_i}$:
$$\operatorname{var}(\hat{\sigma}^2_{RS}) = \frac{2}{n}\operatorname{var}\left(\frac{Z_{H_i}}{\Delta_i}\right)(1 + \rho) \simeq \frac{\sigma^4}{2n}(1 - 0.338) \simeq 0.331\,\frac{\sigma^4}{n}.$$
Evidently the Rogers and Satchell estimator is around 6 times as efficient (one sixth the variance) as the usual estimator of volatility (7.56). In fact it is independent of (7.56) as well, and so we can combine the two with weights inversely proportional to the estimator variances to get a best linear unbiased estimator; with the weights rounded,
$$\hat{\sigma}^2_{BLU} \simeq \frac{1}{7}\hat{\sigma}^2_{OC} + \frac{6}{7}\hat{\sigma}^2_{RS}$$
and this estimator has variance
$$\operatorname{var}(\hat{\sigma}^2_{BLU}) \approx 0.284\,\frac{\sigma^4}{n},$$
around one seventh of (7.56). Using a single high, low, open, close for a seven-day period to estimate volatility is thus roughly equivalent to using daily returns and the estimator (7.56). Related estimators have been suggested in the literature; see for example Parkinson (1980). Garman and Klass (1980) suggest the estimator
$$\hat{\sigma}^2_{GK} = \frac{1}{2}(\log(H_i/L_i))^2 - (2\ln(2) - 1)(\log(C_i/O_i))^2 \tag{7.58}$$
which is similar to $\hat{\sigma}^2_{BLU}$.
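For equally spaced daily data, all of the above estimators are a few lines of Matlab; here O, H, L, C are column vectors of daily opens, highs, lows and closes, and the variable names are ours:

    dt = 1/253;                                       % assumed daily spacing
    R  = log(C./O);                                   % returns
    ZH = log(H./O).*log(H./C); ZL = log(L./O).*log(L./C);
    s2OC = var(R)/dt;                                 % (7.56), open-close
    s2HH = 2*mean(ZH)/dt; s2LL = 2*mean(ZL)/dt;       % high- and low-based
    s2RS = mean(ZH + ZL)/dt;                          % Rogers-Satchell (7.57)
    s2BLU = s2OC/7 + 6*s2RS/7;                        % rounded optimal weights
    s2GK = mean(log(H./L).^2/2 - (2*log(2) - 1)*R.^2)/dt;  % Garman-Klass (7.58), averaged
    vols = sqrt([s2OC s2HH s2LL s2RS s2BLU s2GK])     % annualized volatilities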
We could quibble over which of the above estimators to use, based on extraordinarily precise efficiency calculations for a geometric Brownian motion, but much of this exercise would be wasted. Most do indeed improve efficiency over (7.56), though not quite to the extent predicted by theory, and provided that we use the extremes, the differences between those estimators which do use them are relatively small. To demonstrate this we compared them on the Dow Jones Industrial Average for the period January 1999 to May 20, 2004. We computed all of the estimators from daily data on the open, close, high, and low and then plotted a 21-day moving average of these estimators against time. For example, two of the estimators, $\hat{\sigma}_{OC}$ and $\hat{\sigma}_{HH}$, are given in Figure 7.8. The other estimators are not plotted since they closely resemble one of these two. The curve labelled "intraday volatility" measures the annualized volatility as determined by $\hat{\sigma}_{HH}$, and that labelled "interday volatility", $\hat{\sigma}_{OC}$. The estimators when evaluated on the whole data set provide the following values:
$$\hat{\sigma}_{OC} = 0.18,\quad \hat{\sigma}_{HH} = 0.29,\quad \hat{\sigma}_{LL} = 0.31,\quad \hat{\sigma}_{RS} = 0.30,\quad \hat{\sigma}_{BLU} = 0.29,\quad \hat{\sigma}_{GK} = 0.31,$$
indicating only minor differences between all of the estimators except $\hat{\sigma}_{OC}$.
Over this period, January 1999 to May 20, 2004, the intraday volatility was consistently greater than the interday volatility, often by 50% or more, and there is a high correlation among the estimators that use the highs and lows as well as the open and close. This indicates a profound failure of the geometric Brownian motion assumption for the Dow Jones Industrial index, and this discrepancy between two estimators of volatility is much greater than for any other index I have investigated. Since we have seen substantial differences in the variances of the estimators based on the geometric Brownian motion model, it is worth comparing the estimators for their empirical variance. However since the first, $\hat{\sigma}_{OC}$, appears to have a different mean, we will compare them after normalizing by their means, i.e. we compare values of
$$CV = \frac{\sqrt{\operatorname{var}(\hat{\sigma}^2)}}{E(\hat{\sigma}^2)}$$
as estimated from the 21-day moving averages. These values are given below:
$$CV_{OC} = 0.83,\quad CV_{HH} = 0.53,\quad CV_{LL} = 0.59,\quad CV_{RS} = 0.54,\quad CV_{BLU} = 0.55,\quad CV_{GK} = 0.55,$$
and again, except for the first, they are virtually indistinguishable. What happened to the gains in efficiency we expected when we went from $\hat{\sigma}_{HH}$ to $\hat{\sigma}_{BLU}$? These gains are, in part at least, due to the negative correlation $-0.338$ between $Z_{H_i}$ and $Z_{L_i}$, but for these data they are positively correlated, and in fact the 21-day moving averages of $Z_{H_i}$ and $Z_{L_i}$ have correlation around 0.90. The only explanation that seems reasonable is that the DJA is quite far from a geometric Brownian motion. If the volatility were stochastic or time-dependent, as most believe it is, then this can account for the apparently high correlation between $Z_{H_i}$ and $Z_{L_i}$. For example, suppose that the volatility parameter $\sigma_i$ was stochastic (but constant throughout the day) and possibly dependent on $i$.




Figure 7.6: The construction of an Eggtimer plot



Then since
$$\operatorname{cov}(Z_{H_i}, Z_{L_i}) = E[\operatorname{cov}(Z_{H_i}, Z_{L_i}|\sigma_i)] + \operatorname{cov}(E[Z_{H_i}|\sigma_i], E[Z_{L_i}|\sigma_i])$$
$$= E[\operatorname{cov}(Z_{H_i}, Z_{L_i}|\sigma_i)] + \operatorname{var}\left(\frac{\sigma_i^2\Delta_i}{2}\right),$$
it is possible for the first term to be negative (as predicted by a geometric Brownian motion model) and yet for the second term to be sufficiently large that the covariance itself is large and positive.
Thus we have three estimators of the volatility parameter,
$$\frac{\{\ln(C/O)\}^2}{\Delta t}, \qquad \frac{2\ln(H/O)\ln(H/C)}{\Delta t}, \qquad \frac{2\ln(L/O)\ln(L/C)}{\Delta t}.$$
While the first is independent of the other two given $O$, unfortunately the second and third are themselves not uncorrelated. In order to weight them optimally we need some information about their joint distribution. It follows that both $\{\ln(C/O)\}^2/\Delta t$ and $(Z_H + Z_L)/\Delta t$ provide unbiased estimators of the volatility parameter $\sigma^2$, and indeed the latter is independent of the former.
These estimators are the areas illustrated in Figure 7.6. Consider the plot corresponding to time $t$. The vertical scale is logarithmic, so that logs are plotted. The plot is constructed, using an arbitrarily chosen angle $\theta$, from the four values $(O, C, H, L)$, using two lines $\ell_1, \ell_2$ through the point $(t, \frac{1}{2}(\ln(O) + \ln(C)))$ with




Figure 7.7: Eggtimer Plot for the Dow Jones Average, Feb 1-March 27, 2000.



slopes $\pm\tan(\theta)$. Horizontal lines are drawn at the ordinate values $\ln(H)$, $\ln(L)$, $\ln(O)$, $\ln(C)$ and, using the points where $\ln(O)$ and $\ln(C)$ strike the two lines as corners, a rectangle is constructed. The area of this rectangle, $\tan(\theta)(\ln(C/O))^2$, is an unbiased estimator of $\tan(\theta)\sigma^2\Delta t$ provided the Brownian motion has no drift. The second region consists of "wings" generated by the four points at which the horizontal lines at $\ln(H)$, $\ln(L)$ strike the lines $\ell_1, \ell_2$. The total area of this region (both wings) is $\tan(\theta)(Z_L + Z_H)$, which is another unbiased estimator of $\tan(\theta)\sigma^2\Delta t$, independent of the first, and also independent of whether or not the underlying Brownian motion has drift. By comparing these areas, we can detect abnormal changes in the volatility, or changes in the drift of the process that will increase the observed value of $(\ln(C/O))^2$ while leaving the second estimator unchanged. Because each estimator is based only on a single period, it is useful to provide as well a plot indicating whether there is a persistent change in either or both of the two estimators of volatility.
We also show empirically the effectiveness of incorporating the high, low and close information in a measure of volatility. For example, Figure 7.7 gives the eggtimer plot for the Dow Jones Industrial Index for the months of February and March 2000. The vertical scale is logarithmic, since the Black-Scholes model is such that the logarithm of the index is a Brownian motion. A preponderance of black rectangles shows periods when the drift dominates, whereas where the grey tails are larger, the volatility is evidenced more by large values of the high or small values of the low, compared to the daily change.
The 21-day rolling sum of the areas of the regions, either grey or black, is graphed in Figure 7.8 and provides two measures of volatility. The rolling sum of the grey areas is called the "intraday volatility" and that of the black




Figure 7.8: 21-day average volatility estimates for the Dow Jones Average, 1999-
May 20, 2004



rectangles, the interday volatility. In the absence of substantial drift, both measure the same theoretical quantity, but they differ in this case.
This difference is equally evident from Figure 7.9, the plot of the cumulative variance for the same period of time. Of course in this plot, it is the slope, not the level, which indicates the variance.
Consistent differences between the intra-day and the inter-day volatility or variances would be easy to explain if the situation were reversed and the inter-day volatility were greater, because one could argue that the inter-day measure contains a component due to the drift of the process, and over this period there was a significant drift. A larger intraday volatility is more difficult to explain unless it is a failure of the Black-Scholes model. In this case, one might expect similar behaviour in another market. If we generate a similar plot over the identical period of time for the NASDAQ index (Figure 7.10), we find that the comparison is reversed. This, of course, could be explained by the greater drift of the technology-dependent NASDAQ (relative to its volatility) compared to the relatively traditional market of the Dow Jones, but other indices we compared were more like the NASDAQ here than like the DJA.
There is no doubt that this difference reflects a greater intraday range for the DJA than for other indices. In fact, if we plot the cumulative value of the range of the index divided by the close, $(H - L)/C$, as in Figure 7.11, it confirms that the daily range as measured by this ratio is consistently smaller for the NASDAQ




Figure 7.9: Cumulative Variances of the Dow Jones Average




Figure 7.10: Cumulative Variances for the NASDAQ index
386 ESTIMATION AND CALIBRATION




Figure 7.11: Comparison of the cumulative variance of the Dow Jones and the
NASDAQ, Jan 1999-July 2000



than for the Dow Jones for this period.
Although high, low, open, close data is commonly available for many financial time series, the quality of the recording is often doubtful. When we used older data from the Toronto Stock Exchange, there were a number of days in which the high or low was so far from the open and close as to be explicable only as a recording error (often the difference was almost exactly $10). When the data on highs and lows is accurate, there is substantial improvement in efficiency and additional information available by using it. But there is no guarantee that published data is correct, particularly old data. A similar observation on NYSE data is made by Wiggins (1991): "In terms of the CUPV data base itself, there appear to be a number of cases where the recorded high or low prices are significantly out of line relative to adjacent closing prices."


7.7 Estimating Hedge Ratios and Correlation Coefficients
The correlation coefficient between two asset prices is important not only because it indicates a degree of dependence between the two assets but because it is a required component of many practical investment decisions such as optimal portfolio selection, risk management, and hedging. There is no perfect hedge (except apparently in a Japanese garden), and so in practice we are required to use one asset to hedge another (hopefully highly correlated) asset. These may include derivatives or similar investments (for example bonds on the same or related underlying) which react similarly to market conditions.

Suppose we wish to hedge one investment, say in stock 2, using another, stock 1. As before, assume that we have data on the high, low, open and close of both stocks over non-overlapping time intervals $(t_i, t_i + \Delta_i)$, $i = 1, 2, \ldots, n$. Denote the high, low, open and close for stock $j$ on time interval $i$ by $(H_{ij}, L_{ij}, O_{ij}, C_{ij})$, $i = 1, \ldots, n$, $j = 1, 2$. Again we will denote the returns by $R_{ij} = \ln(C_{ij}/O_{ij})$. Then under the geometric Brownian motion assumption, the return vector $(R_{i1}, R_{i2})$ has a bivariate normal distribution with variances $\sigma_1^2\Delta_i$, $\sigma_2^2\Delta_i$ and correlation coefficient $\rho$.
If we assume that we are able to rehedge our portfolio at the beginning of each time interval, and at the beginning of interval $i$ we are long 1 unit worth of stock 2 and short $h_i$ units worth of stock 1, then our total return at the end of the $i$'th period is $R_{i2} - h_i R_{i1}$. The optimal hedge ratio is the value of $h_i$ which minimizes the variance of $R_{i2} - h_i R_{i1}$, and this is given by
$$h_i = \frac{\operatorname{cov}(R_{i2}, R_{i1})}{\operatorname{var}(R_{i1})} = \rho\,\frac{\sigma_2}{\sigma_1}.$$
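In code the hedge is a one-liner; a hypothetical Matlab fragment, with rho, sigma1 and sigma2 standing for whatever estimates are adopted below:

    h = rho*sigma2/sigma1;   % dollars of stock 1 shorted per dollar of stock 2 held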
Note that $h_i$ is a constant independent of time, at least over periods when the stock volatilities and the correlation coefficient remain unchanged. While implied volatilities $\sigma_1, \sigma_2$ may be obtained from derivative prices for each of these assets, the correlation parameter $\rho$ is unknown and, unless there is a traded option such as a spread option whose value depends specifically on this correlation, $\rho$ needs to be estimated from historical data. The simplest estimator of the correlation coefficient is the sample correlation of the returns,
$$\hat{\rho}_C = \widehat{\operatorname{cor}}(R_{i2}, R_{i1}) = \frac{\sum_i (R_{i2} - \bar{R}_2)(R_{i1} - \bar{R}_1)}{\sqrt{\sum_i (R_{i2} - \bar{R}_2)^2\,\sum_i (R_{i1} - \bar{R}_1)^2}} \tag{7.59}$$
where $\widehat{\operatorname{cor}}$ denotes the sample correlation coefficient. By a common argument (see for example Anderson (1958), Theorem 4.2.6) this has asymptotic variance $\frac{1}{n}(1 - \rho^2)^2$.
Here, as in McLeish (2004), we will consider using historical data on $(H_{ij}, L_{ij}, O_{ij}, C_{ij})$, $i = 1, \ldots, n$, $j = 1, 2$ to estimate $\rho$. In the case of two or more correlated geometric Brownian motion processes, the joint distribution of the highs, lows and closing values is unknown, and so we will need to revert to a simpler alternative than maximum likelihood estimation. We have seen that in the Black-Scholes model the statistics
$$Z_{H_{i1}} = \ln(H_{i1}/O_{i1})\ln(H_{i1}/C_{i1}), \qquad Z_{H_{i2}} = \ln(H_{i2}/O_{i2})\ln(H_{i2}/C_{i2}),$$
$$Z_{L_{i1}} = \ln(L_{i1}/O_{i1})\ln(L_{i1}/C_{i1}), \qquad Z_{L_{i2}} = \ln(L_{i2}/O_{i2})\ln(L_{i2}/C_{i2})$$

Figure 7.12: The Normal approximation (dashed line) to the distribution of $Z^{1/3}$ where $Z$ is exponential.



all have marginal exponential distributions, and each is independent of the open and close for the $i$'th interval.
Suppose we transform each of the above exponential random variables with some function $g$ and assume that we can determine the correlation coefficient $\operatorname{cor}(g(Z_{H_1}), g(Z_{H_2})) = b(\rho)$ as a function of $\rho$. For simplicity assume that we have subtracted the mean and divided by the standard deviation to provide a function $g$ such that $E\{g(Z_{H_1})\} = 0$, $\operatorname{var}\{g(Z_{H_1})\} = 1$. There are various possibilities for the transformation $g$, the simplest being a standardized power
$$g(Z_{H_1}) = \frac{Z_{H_1}^p - E(Z_{H_1}^p)}{\sqrt{\operatorname{var}(Z_{H_1}^p)}}$$
for some suitable value of $p > 0$. A transformation of the gamma distributions in general, and the exponential distribution in particular, that makes them very nearly normal is the cube root transformation ($p = 1/3$) (see Sprott, 2000, Chapter 9, Appendix).
For an exponential random variable $Z_H$ there is only a slight difference between the graph of the probability density function of
$$\frac{Z_H^{1/3} - E(Z_H^{1/3})}{\sqrt{\operatorname{var}(Z_H^{1/3})}} \tag{7.60}$$
and the graph of the standard normal probability density function, this difference lying in the left tail (see Figure 7.12), the region in which we are least interested since there the maximum is very close to the open or close. We replaced the theoretical mean and variance in (7.60) by the sample mean and

variance, and so the function used ultimately was
$$g(Z_{H_i}) = \frac{Z_{H_i}^{1/3} - \overline{Z_H^{1/3}}}{\sqrt{\widehat{\operatorname{var}}(Z_H^{1/3})}}.$$
Although it is theoretically possible to obtain the function $b(\rho)$, the exact expression requires the evaluation of Bessel functions. For simplicity we fit polynomials of varying degrees. For example, the polynomial of degree 9,
$$b(\rho) \approx 0.4822\rho^9 + 0.2102\rho^8 - 0.9629\rho^7 - 0.3104\rho^6 + 0.7006\rho^5 + 0.1887\rho^4 - 0.1288\rho^3 + 0.1822\rho^2 + 0.6385\rho + 0.0008,$$
is accurate to within 1%.
Inverting this approximation to $b(\rho)$ provides a tractable estimator of the correlation between stocks based only on the correlation between the marginally exponential statistics $Z_{H_i}$, $Z_{L_i}$. This estimator is
$$\hat{\rho}_{HL} = b^{-1}\left(\frac{1}{2}\left(\widehat{\operatorname{cor}}(Z_{H_1}^{1/3}, Z_{H_2}^{1/3}) + \widehat{\operatorname{cor}}(Z_{L_1}^{1/3}, Z_{L_2}^{1/3})\right)\right). \tag{7.61}$$
A similar estimator obtains as well from the cross terms, since $\operatorname{cor}_\rho(Z_{H_1}^{1/3}, Z_{L_2}^{1/3}) = b(-\rho)$:
$$\hat{\rho}_2 = -b^{-1}\left(\frac{1}{2}\left(\widehat{\operatorname{cor}}(Z_{H_1}^{1/3}, Z_{L_2}^{1/3}) + \widehat{\operatorname{cor}}(Z_{L_1}^{1/3}, Z_{H_2}^{1/3})\right)\right), \tag{7.62}$$
where again $\widehat{\operatorname{cor}}$ denotes the sample correlation; but this estimator is very inefficient and adds little to our knowledge of the correlation. We will not consider it further. Since all that is required is the inverse of the function $b$, we can use a simple polynomial approximation to the inverse without sacrificing much precision, for example
$$b^{-1}(c) \approx 0.1567c^5 - 0.5202c^4 + 0.6393c^3 - 0.8695c^2 + 1.5941c.$$
The contenders for reasonable estimators are therefore $\hat{\rho}_{HL}$, $\hat{\rho}_C$, or some combination such as an average, the simplest being a straight average of these two:
$$\hat{\rho}_{AV} = \frac{1}{2}\hat{\rho}_C + \frac{1}{2}\hat{\rho}_{HL}.$$
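A Matlab sketch of these estimators from two OHLC series (columns j = 1, 2 of the matrices O, H, L, C; variable names ours). Since sample correlation is unaffected by the location-scale standardization in g, the raw cube roots suffice:

    R  = log(C./O);
    ZH = (log(H./O).*log(H./C)).^(1/3);      % cube-root transformed statistics
    ZL = (log(L./O).*log(L./C)).^(1/3);
    binv = @(c) 0.1567*c.^5 - 0.5202*c.^4 + 0.6393*c.^3 - 0.8695*c.^2 + 1.5941*c;
    r12 = corrcoef(R(:,1), R(:,2));   rhoC = r12(1,2);        % (7.59)
    cH = corrcoef(ZH(:,1), ZH(:,2));  cL = corrcoef(ZL(:,1), ZL(:,2));
    rhoHL = binv((cH(1,2) + cL(1,2))/2);                      % (7.61)
    rhoAV = (rhoC + rhoHL)/2;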
It remains to see how these estimators perform in practice, both on real and simulated data. First we simulated one year's worth of daily observations on two stocks, simulating the high, low, open, and close for each day. There is considerable benefit to using both of the estimators $\hat{\rho}_C$ and $\hat{\rho}_{HL}$, at least for simulated data. For example, in Figure 7.13 we graph the two variances $\operatorname{var}(\hat{\rho}_C)$ and $\operatorname{var}(\hat{\rho}_{AV})$ for various (positive) values of $\rho$. Notice that the estimator $\hat{\rho}_{AV}$, which combines information from all of the high, low, open and close, has variance roughly one half that of $\hat{\rho}_C$, which uses only the values of the open and the close.




Figure 7.13: Variance of the estimators $\hat{\rho}_C$ and $\hat{\rho}_{AV}$ as a function of $\rho$.


We also use these methods to investigate the correlation between two processes. To this end we downloaded data for both the S&P500 index and Nortel (NT) over the period June 17, 1998 to June 16, 2003. This was a period when Nortel rose and fell sharply, showing increasing volatility. In Figure 7.14 we plot the two measures of the correlation, $\hat{\rho}_{AV}$ and $\hat{\rho}_C$, between the returns for these two series in moving windows of 63 days (approximately 3 months) over this period. Note that the estimator $\hat{\rho}_{AV}$, indicated by a dotted line, tends to give lower correlation than $\hat{\rho}_C$, since it includes a component due to the movements within days and these have lower correlation than the movements between days.


7.8 Problems
1. Suppose $Y_1$ and $Y_2$ are independent normal random variables with possibly different variances $\sigma_1^2, \sigma_2^2$ and expected values $E(Y_i) = \mu_i$, $i = 1, 2$. Show that the conditional distribution of $Y_1$ given $Y_1 + Y_2 = S$ is normal with mean
$$\mu_1 + w(S - \mu_1 - \mu_2)$$
and variance
$$\sigma_1^2(1 - w)$$
where
$$w = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}.$$

2. Consider data generated from the mixture density
$$pN(\theta_1, \sigma_1^2) + (1 - p)N(\theta_2, \sigma_2^2)$$

Figure 7.14: Two measures of correlation, $\hat{\rho}_C$ and $\hat{\rho}_{AV}$ (dotted), between Nortel (NT) and the S&P500 Index.



where $\theta_1 < \theta_2$ (so that the parameters are identifiable). Write a pro-
