\[
\hat{\mu}_y = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{7.35}
\]
\[
\widehat{\operatorname{cov}}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu}_x)(y_i - \hat{\mu}_y). \tag{7.36}
\]
For small sample sizes the estimator (7.36) can be adjusted for bias by replacing 1/n by 1/(n − 1). The projection argument underlying the EM algorithm indicates that the maximum likelihood estimating function for the incomplete data is obtained by conditioning on the observed information. Projecting the three equations above on the observed data corresponds to doing the following to terms x_i, y_i or (x_i − µ̂_x)(y_i − µ̂_y) in which one or both of (x_i, y_i) are unobserved:

1. If x_i is unobserved but x_i + x_{i+1}, y_i, y_{i+1} are all observed, replace x_i by
\[
\hat{x}_i = \frac{x_i + x_{i+1}}{2} + \frac{\beta_{x|y}}{2}\,(y_i - y_{i+1})
\]
and x_{i+1} by
\[
\hat{x}_{i+1} = (x_i + x_{i+1}) - \hat{x}_i.
\]
7.4. MAXIMUM LIKELIHOOD ESTIMATION 365

2. If y_i is unobserved but y_i + y_{i+1}, x_i, x_{i+1} are all observed, replace y_i by
\[
\hat{y}_i = \frac{y_i + y_{i+1}}{2} + \frac{\beta_{y|x}}{2}\,(x_i - x_{i+1})
\]
and y_{i+1} by
\[
\hat{y}_{i+1} = (y_i + y_{i+1}) - \hat{y}_i.
\]

3. If all of x_i, y_i, x_{i+1}, y_{i+1} are unobserved but x_i + x_{i+1} and y_i + y_{i+1} are observed, replace the two terms (x_i − µ̂_x)(y_i − µ̂_y) and (x_{i+1} − µ̂_x)(y_{i+1} − µ̂_y) in (7.36) by the single term
\[
\frac{1}{2}\,(x_i + x_{i+1} - 2\hat{\mu}_x)(y_i + y_{i+1} - 2\hat{\mu}_y).
\]
Since there is now one fewer term in (7.36), reduce the value of n appearing there by one.

Here the regression coefficients β_{x|y} = cov(x, y)/var(y) and β_{y|x} = cov(x, y)/var(x) can either be estimated in similar fashion from the data at hand, or estimated from a larger data-set in which both x_i and y_i are observed, provided we think they are reasonably stable over time.
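The imputation rules above reduce to a few lines of code. The following is a sketch in Python (the helper names are ours, not from the text); rule 2 is obtained from rule 1 by exchanging the roles of x and y.

```python
def impute_x_pair(x_sum, y_i, y_next, beta_x_given_y):
    """Rule 1: x_i, x_{i+1} not individually observed, but their sum
    x_i + x_{i+1} and both y_i, y_{i+1} are.  Split the sum in half and
    shift by the regression of x on y (beta_x_given_y = cov(x,y)/var(y))."""
    x_i_hat = x_sum / 2.0 + beta_x_given_y / 2.0 * (y_i - y_next)
    x_next_hat = x_sum - x_i_hat          # the pair still adds to the observed sum
    return x_i_hat, x_next_hat

def cov_term_pair(x_sum, y_sum, mu_x, mu_y):
    """Rule 3: only the sums x_i + x_{i+1} and y_i + y_{i+1} are observed.
    Returns the single term replacing the two covariance terms in (7.36);
    the caller must also reduce n there by one."""
    return 0.5 * (x_sum - 2.0 * mu_x) * (y_sum - 2.0 * mu_y)
```

Note that `impute_x_pair` always preserves the observed pair sum, so the projected estimating equations use exactly the information in the data.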
There are, of course, patterns for the missing data other than those considered in 1-3 above. They are rare in the case of two-dimensional data but much more likely in the case of higher-dimensional correlated observations. The adjustments to the maximum likelihood estimators required to accommodate such patterns become so complex that no simple formula can cover them all. The imputation method discussed earlier in this chapter, in which the missing observations are replaced not by their conditional means but by imputed values with the same characteristics as the original complete data, is the most successful method for providing estimates in such circumstances, since it has only two ingredients: the relatively simple formulae for the complete-data maximum likelihood estimators, and the ability to impute or generate the missing data with the correct distribution.

7.4.4 Monte Carlo Maximum Likelihood Estimation
Consider, to begin with, independent observations X_1, ..., X_n, all from the same probability density function f_θ(x). The maximum likelihood estimator of the parameter is the value θ which maximizes the likelihood
\[
L(\theta) = \prod_{i=1}^{n} f_\theta(x_i).
\]

The ideal Monte Carlo methodology for maximum likelihood estimation would not require specification of the family of densities f_θ at all, but only the ability to generate variables X_i corresponding to any value of θ. However, a moment's reflection will convince us that in general this is a tall order. By repeated simulation of variables X_i under the parameter value θ and averaging, Monte Carlo methods permit us to estimate unbiasedly any expected value, such as E_θ[ψ(X)] for an arbitrary function ψ. Unfortunately the probability density function at a particular point x_i cannot be written as such an expected value, except in the case of a discrete random variable X where, of course, P_θ(X = x_i) = E_θ[I(X = x_i)].
If we have a method for estimating the density (or probability function if X is discrete) from simulations X_i(θ) taken for each parameter value θ, we might use these simulations to estimate f_θ(x), say with f̂_θ(x), and then maximize the estimated likelihood
\[
\hat{L}(\theta) = \prod_{i=1}^{n} \hat{f}_\theta(x_i). \tag{7.37}
\]

This estimator,
\[
\arg\max_{\theta} \prod_{i=1}^{n} \hat{f}_\theta(x_i),
\]

avoids any explicit form for the likelihood function, at the cost of dealing only with an approximation to the likelihood, and typically requires a very large simulation. Since the optimality properties of the maximum likelihood estimator depend on the local first and second derivatives of f_θ(x_i) in a neighbourhood of the true value, it is essential that such an approximation be very accurate in a neighbourhood of the estimator. One of the most common methods for approximating f_θ(x) is kernel density estimation, which relies on a kernel function K(x) satisfying
\[
\int_{-\infty}^{\infty} K(x)\,dx = 1,
\]

often a probability density function centered around 0. Let X_j(θ), j = 1, ..., m denote simulated data under the parameter value θ, if possible using common random numbers. When the distributions have easily inverted cumulative distribution functions F_θ, for example, we could define X_i(θ) = F_θ^{-1}(U_i), where the U_i are independent uniform[0,1]. We must choose a window-width parameter h controlling the degree of smoothing, and then we estimate the density using
\[
\hat{f}_\theta(x_i) = \frac{1}{mh}\sum_{j=1}^{m} K\!\left(\frac{x_i - X_j(\theta)}{h}\right)
\]

and so (7.37) becomes
\[
\hat{L}(\theta) = (mh)^{-n} \prod_{i=1}^{n}\left\{ \sum_{j=1}^{m} K\!\left(\frac{x_i - X_j(\theta)}{h}\right)\right\}. \tag{7.38}
\]
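As a concrete illustration, the following Python sketch computes the logarithm of (7.38) with a Gaussian kernel and maximizes it over a grid, for a hypothetical location family X(θ) = θ + Z generated with common random numbers. The function name and the choices of m, h and the grid are ours, purely for illustration.

```python
import numpy as np

def kde_log_likelihood(theta, data, z, h, simulate):
    """log of (7.38): -n*log(m*h) + sum_i log sum_j K((x_i - X_j(theta))/h),
    with a Gaussian kernel K and simulator X_j(theta) = simulate(theta, z_j)
    driven by the common random numbers z."""
    x_sim = simulate(theta, z)                       # X_j(theta), j = 1..m
    m = x_sim.size
    u = (data[:, None] - x_sim[None, :]) / h         # n x m scaled differences
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # kernel values
    return np.log(k.sum(axis=1)).sum() - data.size * np.log(m * h)

# Hypothetical example: N(theta, 1) location family, X(theta) = theta + Z.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=50)
z = rng.normal(size=2000)                            # common random numbers
grid = np.linspace(-1.0, 3.0, 81)
ll = np.array([kde_log_likelihood(t, data, z, 0.3, lambda th, zz: th + zz)
               for t in grid])
theta_hat = grid[np.argmax(ll)]                      # grid maximizer of (7.38)
```

Because the same z drives every value of θ, the estimated likelihood is a smooth function of θ and the grid maximizer is stable; with fresh random numbers at each θ it would be far noisier.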

Instead of operating completely free of the likelihood function, Monte Carlo maximum likelihood can be made more efficient if we have partial information on f_θ. For example, it is reasonably common to know f_θ only up to a normalizing constant (which, in theory, we could obtain by laborious integration) and to wish to estimate θ by maximum likelihood without evaluating this scaling constant. To be more specific, suppose that
\[
f_\theta(x) = \frac{h_\theta(x)}{c(\theta)} \tag{7.39}
\]

where the function h_θ(x) is known and relatively easily evaluated but c(θ) is unknown. There are many examples where probability density functions take such a form, but perhaps the most common arises in problems involving conditioning. For example, suppose that we know the joint probability density function f_θ(x, y) of two random variables and we wish to condition on the value of one of them, say Y. Then the conditional probability density function is
\[
f_\theta(x \mid y) = \frac{f_\theta(x, y)}{\int f_\theta(z, y)\,dz},
\]
which is of the form (7.39) with c(θ) replaced by ∫ f_θ(z, y) dz.
There are several methods available for generating a random sample from the density f_θ without using the constant c(θ), including acceptance-rejection and the Metropolis-Hastings algorithm. Notice that the log-likelihood for a sample of size n takes the form
\[
\ln L(\theta) = \sum_{i=1}^{n} \ln h_\theta(x_i) - n \ln c(\theta), \tag{7.40}
\]
and since ∂/∂θ ln c(θ) = E_θ[∂/∂θ ln h_θ(X)], the constant serves simply to center the estimating function Σ_{i=1}^{n} ∂/∂θ ln h_θ(x_i) at its expected value. If we replace f_θ by a density with respect to some other
value of the parameter or some completely different density function g(x), then the maximizer of (7.40) satisfies
\[
0 = \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln f_\theta(x_i) \tag{7.41}
\]
\[
\phantom{0} = \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \ln h_\theta(x_i) - n E_\theta\!\left[\frac{\partial}{\partial\theta} \ln h_\theta(X)\right], \quad\text{or} \tag{7.42}
\]
\[
\frac{1}{n}\sum_{i=1}^{n} \frac{\frac{\partial}{\partial\theta} h_\theta(x_i)}{h_\theta(x_i)} = \frac{\partial}{\partial\theta} \ln E\!\left[\frac{h_\theta(X)}{g(X)}\right], \tag{7.43}
\]

where the expected value in the last equation is under the density g(x) for X.
Geyer (1996) suggests using this as the likelihood equation, but with the expected value replaced by an average over Monte Carlo simulations. In other words, if we use N simulations of X_j generated under the density g(x), we would solve for θ the estimating equation
\[
\frac{1}{n}\sum_{i=1}^{n} \frac{\frac{\partial}{\partial\theta} h_\theta(x_i)}{h_\theta(x_i)} = \frac{\partial}{\partial\theta} \ln\!\left[\frac{1}{N}\sum_{j=1}^{N} \frac{h_\theta(X_j)}{g(X_j)}\right]. \tag{7.44}
\]
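A sketch of this scheme in Python, for a hypothetical unnormalized family h_θ(x) = exp(θx − x⁴) (so that ∂/∂θ ln h_θ(x) = x) with importance density g = N(0, 1); all names and numerical choices here are ours, purely for illustration. Because the right side of (7.44) is then a weighted mean that increases with θ (an exponential tilt), the equation can be solved by bisection.

```python
import numpy as np

def log_h(x, theta):
    # hypothetical unnormalized density h_theta(x) = exp(theta*x - x**4)
    return theta * x - x ** 4

def geyer_residual(theta, data, xg, log_g):
    """Left side minus right side of (7.44).  xg holds N draws from g; the
    right side d/dtheta log[(1/N) sum h_theta(X_j)/g(X_j)] reduces to a
    weighted average of d/dtheta log h_theta(X_j) = X_j."""
    lw = log_h(xg, theta) - log_g(xg)        # log importance weights
    w = np.exp(lw - lw.max())                # stabilized weights
    rhs = np.sum(w * xg) / np.sum(w)
    return data.mean() - rhs                 # left side is (1/n) sum x_i here

log_g = lambda x: -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)
rng = np.random.default_rng(1)
xg = rng.normal(size=20000)                  # simulations under g = N(0, 1)
data = np.array([-0.8, -0.2, 0.1, 0.4, 0.9])  # illustrative observations

lo, hi = -5.0, 5.0                           # the residual decreases in theta
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if geyer_residual(mid, data, xg, log_g) > 0:
        lo = mid
    else:
        hi = mid
theta_hat = 0.5 * (lo + hi)
```

The max-subtraction before exponentiating the log-weights is the standard guard against overflow; the same 20000 draws from g are reused at every trial value of θ.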

This gives rise to several additional questions. How is the asymptotic normal distribution of the maximum likelihood estimator affected by the introduction of the Monte Carlo estimator of
\[
\frac{\partial}{\partial\theta} \ln E\!\left[\frac{h_\theta(X)}{g(X)}\right],
\]
and what sort of importance distribution g(X) leads to the most efficient estimator of θ? For the first question, Geyer (1995) provides an estimator of the additional variance or covariance (over and above the usual covariance of a maximum likelihood estimator) introduced by the Monte Carlo estimation, and some discussion of the choice of importance distribution. The considerations around the choice of g(x) are much the same here as in any application of importance sampling; we must avoid values of g(x) which are too small, since they lead to heavy tails in the distribution of (∂h_θ(X)/∂θ)/g(X). Mixtures are suggested, for example

\[
g(x) = \sum_{j=1}^{k} p_j\,\frac{h_{\theta_j}(x)}{c(\theta_j)}
\]

for various choices of the parameter values θ_1, ..., θ_k and accompanying weights p_1 + ... + p_k = 1. Unfortunately this introduces the additional parameters p_1, ..., p_k, which also need estimation, so it seems best to be parsimonious in our choice of k.
The rather restrictive condition that
\[
f_\theta(x) = \frac{h_\theta(x)}{c(\theta)}
\]
for a known function h_θ(x) can be relaxed considerably. In fact its primary purpose is to lead to the estimating function
\[
\psi(x_i, \theta) = \frac{\frac{\partial}{\partial\theta} h_\theta(x_i)}{h_\theta(x_i)},
\]

which was centered at its expectation using Monte Carlo and then summed over all i to provide the estimating function for θ. The only purpose of the assumption (7.39) was to motivate a specific choice of the function ψ. If we know virtually nothing about f_θ(x) but have the ability to generate random variables X_i(θ) for any given parameter value, we might well begin with a number of candidate estimating functions ψ_1(x, θ), ..., ψ_k(x, θ), center them with Monte Carlo as before, and then determine the weighted average which results in the minimum possible asymptotic variance. When we do this, of course, we are not using the exact likelihood function (presumably this is unavailable), but since we are implicitly estimating the score function with a function of the form
\[
\sum_i \beta_i(\theta)\,[\psi_i(x, \theta) - \mu_i(\theta)],
\]

we are essentially approximating the likelihood with an exponential family distribution
\[
c(\theta)\exp\Big\{\sum_i B_i(\theta)\,\Psi_i(x, \theta)\Big\}
\]
with the functions Ψ_i given either by ψ_i or by ∂ψ_i/∂θ. Suppose we approximate the expectation
\[
\mu_i(\theta) = E_\theta[\psi_i(X(\theta), \theta)]
\]
using m simulations with
\[
\hat{\mu}_i(\theta) = \frac{1}{m}\sum_{j=1}^{m} \psi_i(X_j(\theta), \theta).
\]

These same simulations allow us to estimate the covariance matrix C_{ij}(θ) = cov_θ(ψ_i(X, θ), ψ_j(X, θ)), as well as a finite difference of the form
\[
\hat{\delta}_i(\theta) = \frac{1}{2mh}\sum_{j=1}^{m}\left[\psi_i(X_j(\theta + h), \theta) - \psi_i(X_j(\theta - h), \theta)\right],
\]
where h is small (here it is important that we use common random numbers). Then the optimal estimating function constructed as a linear combination of the functions ψ_i is
\[
\sum_j \sum_i \hat{\beta}_j(\theta)\,[\psi_j(x_i, \theta) - \hat{\mu}_j],
\]
where the vector of values of β̂_j is estimated by
\[
\hat{\beta}^{T}(\theta) = \hat{\delta}^{T}(\theta)\,C^{-1}(\theta). \tag{7.45}
\]
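The recipe above (center the candidates, estimate C and δ from the same simulations, weight by (7.45)) can be sketched in Python for a hypothetical location family X(θ) = θ + Z; the function name and the two candidate ψ's are illustrative choices of ours.

```python
import numpy as np

def weights_745(theta, psis, simulate, z, h=0.01):
    """Estimate beta^T = delta^T C^{-1} of (7.45) by simulation.
    psis: candidate estimating functions psi_i(x, theta);
    simulate(theta, z): X(theta) driven by common random numbers z."""
    x = simulate(theta, z)
    vals = np.array([psi(x, theta) for psi in psis])     # k x m matrix
    mu_hat = vals.mean(axis=1)                           # centers mu_i(theta)
    C = np.cov(vals)                                     # C_ij(theta)
    xp, xm = simulate(theta + h, z), simulate(theta - h, z)
    delta = np.array([(psi(xp, theta) - psi(xm, theta)).mean() / (2 * h)
                      for psi in psis])                  # common random numbers
    return np.linalg.solve(C, delta), mu_hat             # beta, mu_hat

# Two illustrative candidates for X(theta) = theta + Z, Z ~ N(0, 1):
psis = [lambda x, t: x - t,                              # mean-type
        lambda x, t: np.sign(x - t)]                     # median-type
rng = np.random.default_rng(2)
z = rng.normal(size=5000)
beta, mu_hat = weights_745(0.0, psis, lambda t, zz: t + zz, z)
```

Reusing z in the two shifted simulations is what makes the finite-difference estimate of δ usable at a small h; with independent simulations its variance would be of order 1/h².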

In order to assess the feasibility of this method we consider a simple example, the symmetric stable laws, whose density functions are quite difficult to evaluate but for which it is easy to generate random variables. Suppose X_1, ..., X_n are assumed independent and identically distributed with symmetric stable distributions, with location parameter µ = θ and with the parameters c = 1 and α assumed known. Consider M-estimating functions of the form
\[
\psi_k(x, \theta) = \sum_i \phi_k(x_i - \theta), \quad k = 1, 2, \ldots, 4,
\]
where the candidate functions φ_k are
\[
\phi_k(x) = \max\!\left(\min\!\left(x, \tfrac{k}{2}\right), -\tfrac{k}{2}\right), \quad k = 1, \ldots, 4.
\]
Although we know that these functions satisfy
\[
E_\theta[\phi_k(x_i - \theta)] = 0
\]

for members of the symmetric stable family, and therefore that the functions ψ_k(x, θ) are unbiased, we do not wish to use that knowledge, so we will continue to use Monte Carlo to generate the estimators µ̂_k(θ). What combination of these four estimating functions provides the best estimator of θ? If we fix the value c = 1, we have seen that we can generate X(θ) using
\[
X(\theta) = \theta + \sin(\alpha U)\,(\cos U)^{-1/\alpha}\left[\frac{\cos(U(1 - \alpha))}{E}\right]^{(1-\alpha)/\alpha} \tag{7.46}
\]
with U uniform[−π/2, π/2] and E standard exponential, independent of U. A single pair (U, E) permits generating X(θ) for any θ. In this case, estimation of
\[
\delta_k(\theta) = \frac{\partial}{\partial\theta}\,\psi_k(x, \theta) = \sum_i \phi_k'(x_i - \theta)
\]

is a little easier, since
\[
\sum_i \phi_k'(x_i - \theta) = \sum_i I\!\left(|x_i - \theta| < \tfrac{k}{2}\right).
\]
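A sketch of the simulation step in Python: (7.46) implemented so that a single array of common (U, E) pairs produces X(θ) for any θ, together with the clipped candidate functions φ_k. The parameter settings are illustrative.

```python
import numpy as np

def stable_sym(theta, alpha, u, e):
    """Symmetric stable variates via (7.46); u ~ uniform(-pi/2, pi/2),
    e ~ exponential(1).  Reusing (u, e) for every theta gives common
    random numbers: X(theta) = theta + X(0)."""
    return theta + (np.sin(alpha * u) * np.cos(u) ** (-1.0 / alpha)
                    * (np.cos(u * (1.0 - alpha)) / e) ** ((1.0 - alpha) / alpha))

def phi(x, k):
    """Clipped candidate functions phi_k(x) = max(min(x, k/2), -k/2)."""
    return np.clip(x, -k / 2.0, k / 2.0)

rng = np.random.default_rng(3)
u = rng.uniform(-np.pi / 2, np.pi / 2, size=100000)
e = rng.exponential(size=100000)
x = stable_sym(0.0, 1.5, u, e)        # symmetric stable(0, 1, 1.5) sample
```

For α = 1 the bracketed factor in (7.46) disappears and the generator reduces to θ + tan(U), the Cauchy case, which is a quick sanity check on the formula.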

We tested this routine on the data below, with n = 21, generated from a symmetric stable distribution with parameters (0, 1, 1.5).

-3.1890  -2.0491  -1.8185  -1.7309  -1.5403  -1.3090  -1.2752
-0.8266  -0.7154  -0.2706  -0.2646  -0.0478   0.2827   0.9552
 1.0775   1.1009   1.6693   1.6752   2.4415   2.4703   5.5746

Table 7.5: Data from the symmetric stable(0, 1, 1.5) distribution

The mean of these values is 0.105 and the median is −0.2646. By running 1000 simulations with θ updated 100 times using a Robbins-Monro procedure, and beginning with θ estimated by the median, we obtained an estimate of the parameter
\[
\hat{\theta} = -0.0954
\]
with coefficients on the four estimating functions given by
\[
\beta = (0.3561,\; 0.6554,\; 0.8705,\; 1.0096).
\]

Evidently the last two functions ψ_3(x, θ), ψ_4(x, θ) receive more weight than the first two. The resulting estimator is, of course, not really the maximum likelihood estimator, but it approximates it to the extent that the functions φ_k and ∫φ_k can be used to approximate the density function. Convergence of a Robbins-Monro algorithm here, as is frequently the case, is quite slow. To some extent this is a necessary payment for our assumed ignorance about the form of the distribution and the expected values of the estimating functions. We are repeatedly approximating the latter using Monte Carlo, and the noise so introduced results in the slow convergence. Here, as elsewhere, there is a price to pay for ignorance.

There are certainly more efficient estimators of the median of a symmetric stable law, but this example illustrates a "black-box" estimation of parameters requiring no knowledge of the distribution beyond the ability to simulate values with a particular parameter value and a set of vaguely reasonable estimating functions. Arguably, a fairly broad set of candidate estimating functions together with massive computer power may ultimately generate Monte Carlo estimators that rival maximum likelihood estimation in efficiency without any knowledge of the structure of the distribution beyond the ability to generate samples from it.

7.5 Using Historical Data to Estimate the Parameters in Diffusion Models
7.5.1 General Ito Processes
Typically a diffusion model for financial data includes parameters with unknown values which require estimation. We might, for example, wish to fit a diffusion model like
\[
dr(t) = \mu(t, r(t))\,dt + \sigma(t, r(t))\,dW_t
\]
to historical data on spot interest rates. Here, as elsewhere, we assume that the drift and diffusion coefficients are such that a solution exists. There are many choices of the drift and diffusion coefficients µ(t, r(t)), σ(t, r(t)) leading to common models such as the Vasicek (1977), the CIR (Cox, Ingersoll, Ross, 1985), and the Ho-Lee (Ho and Lee, 1986) models, but they all have unspecified parameters θ that must be either estimated or fit to price data before they can be used. Suppose for the present that our intended use is not the pricing of bonds or interest rate derivatives, but rather using the model to simulate interest rate scenarios for an insurance company. In this case it is the P-measure that is of interest, and so historical values of r(t) are directly relevant. Unknown parameters θ may lie in either the drift term µ(t, r(t)) or in the diffusion σ(t, r(t)), so we will add the argument θ to either or both functions as required. There is a great deal of literature dealing with the estimation of parameters in a diffusion model. We will largely follow McLeish and Kolkiewicz (1997).
According to the simplest discrete-time approximation to the process, the Euler scheme, the increments in the process over small intervals of time are approximately conditionally independent and normally distributed. Suppose we have already simulated the spot interest rate for integer multiples of Δt up to time t = jΔt. Provided the time increment Δt is small, if we denote Δr = r(t + Δt) − r(t), we can generate the interest rate at time t + Δt using the approximation
\[
\Delta r - \mu(t, r(t), \theta)\,\Delta t \sim N\big(0,\; \sigma^2(t, r(t), \theta)\,\Delta t\big). \tag{7.47}
\]

Thus, if θ is a parameter in the drift term only, it can be estimated using weighted least squares, i.e. by minimizing the sum of squared standardized normal variates
\[
\min_\theta \sum_t w_t\,(\Delta r - \mu(t, r(t), \theta)\,\Delta t)^2,
\]
where the weights w_t are proportional to the reciprocals of the variances, w_t = 1/σ²(t, r(t)). The solution is obtained by setting the derivative with respect to θ equal to zero and solving for θ:
\[
\sum_t w_t\,\frac{\partial \mu(t, r(t), \theta)}{\partial\theta}\,(\Delta r - \mu(t, r(t), \theta)\,\Delta t) = 0, \tag{7.48}
\]
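For a drift that is linear in θ, (7.48) can be solved in closed form. The sketch below simulates a path by the Euler scheme (7.47) for a hypothetical mean-reverting model dr = θ(b − r)dt + σdW (a Vasicek-type drift with b and σ known, so the weights cancel); all numerical settings are illustrative.

```python
import numpy as np

def euler_path(theta, b, sigma, r0, dt, n, rng):
    """Simulate dr = theta*(b - r) dt + sigma dW by the Euler scheme (7.47)."""
    r = np.empty(n + 1)
    r[0] = r0
    for i in range(n):
        r[i + 1] = r[i] + theta * (b - r[i]) * dt + sigma * np.sqrt(dt) * rng.normal()
    return r

def theta_748(r, b, dt):
    """Solve (7.48) for mu(t, r, theta) = theta*(b - r): here
    d mu / d theta = (b - r), and with constant sigma the weights w_t cancel,
    leaving theta_hat = sum (b - r) dr / (sum (b - r)^2 * dt)."""
    dr = np.diff(r)
    x = b - r[:-1]
    return np.sum(x * dr) / (np.sum(x * x) * dt)

rng = np.random.default_rng(4)
r = euler_path(theta=2.0, b=0.05, sigma=0.01, r0=0.03, dt=1 / 252, n=20000, rng=rng)
theta_hat = theta_748(r, b=0.05, dt=1 / 252)
```

Even with roughly eighty years of simulated daily data the estimate is noticeably noisy, which previews the point made below: information about drift parameters accrues slowly.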

and it is not hard to see that this is the maximum likelihood estimator of θ, assuming the normal approximation (7.47) is exact. If the unknown parameter θ is in the diffusion term only, or in both the drift and diffusion terms, it is also easy to see that the log-likelihood for the normal approximation is
\[
-\sum_t \ln(\sigma(t, r(t), \theta)) - \frac{1}{2}\sum_t \frac{(\Delta r - \mu(t, r(t), \theta)\,\Delta t)^2}{\sigma^2(t, r(t), \theta)\,\Delta t}
\]

with n the number of time points t at which we observed the spot interest rate. Maximizing this over the parameter θ results in the estimating function
\[
\sum_t \left\{\frac{\partial \mu(t, r(t), \theta)}{\partial\theta}\,\frac{\Delta r - \mu(t, r(t), \theta)\,\Delta t}{\sigma^2(t, r(t), \theta)} + \frac{\partial \ln[\sigma^2(t, r(t), \theta)]}{\partial\theta}\,\frac{1}{2}\left(\frac{(\Delta r - \mu(t, r(t), \theta)\,\Delta t)^2}{\sigma^2(t, r(t), \theta)\,\Delta t} - 1\right)\right\} = 0,
\]

which combines the estimating function (7.48) for the drift with that for the diffusion term, using weights inversely proportional to their variances.

Girsanov's Theorem allows us to construct maximum likelihood estimators also for continuously observed processes, analogous to some of the estimators above. For example, suppose the parameter θ resides in the drift term only, so the model is of the form
\[
dX_t = \mu(t, X_t, \theta)\,dt + \sigma(t, X_t)\,dW_t, \qquad X_0 = x_0.
\]
Suppose P is the measure on C[0, T] induced by X_t with X_0 = x_0, and the measure P_0 on the same space is induced by a similar equation but with zero drift:
\[
dX_t = \sigma(t, X_t)\,dW_t, \qquad X_0 = x_0. \tag{7.49}
\]
Then Girsanov's Theorem (see the appendix) asserts that, with
\[
M_t = \mathcal{E}\!\left(\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma(s, X_s)}\,dW_s\right) = \exp\left\{\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma^2(s, X_s)}\,dX_s - \frac{1}{2}\int_0^t \frac{\mu^2(s, X_s, \theta)}{\sigma^2(s, X_s)}\,ds\right\}, \tag{7.50}
\]
under some boundedness conditions the Radon-Nikodym derivative of P with respect to P_0 is given by M_T,
\[
\frac{dP}{dP_0} = M_T.
\]

The exponential martingale
\[
M_t = \mathcal{E}\!\left(\int_0^t \frac{\mu(s, X_s, \theta)}{\sigma(s, X_s)}\,dW_s\right)
\]
describes the Radon-Nikodym derivative for the processes restricted to [0, t], and the process X_s appearing in (7.50) is assumed generated from the relation (7.49). Thus the maximum likelihood estimate of θ is obtained by maximizing M_T. Setting the derivative of its logarithm equal to 0 results in the likelihood equation
\[
\int \frac{\partial \mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,dX_t - \int \frac{\partial \mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,\mu(t, X_t, \theta)\,dt = 0,
\]
or
\[
\int \frac{\partial \mu(t, X_t, \theta)}{\partial\theta}\,\frac{1}{\sigma^2(t, X_t)}\,\big(dX_t - \mu(t, X_t, \theta)\,dt\big) = 0.
\]

Of course, if we only had available observations taken at discrete time points t_1 < t_2 < ... rather than the continuously observed process, we would need to replace the integral by a sum, resulting in the same estimating function as (7.48), namely
\[
\sum_{t_i} \sigma^{-2}(t_i, X_{t_i})\,\frac{\partial \mu(t_i, X_{t_i}, \theta)}{\partial\theta}\,\big(\Delta X_{t_i} - \mu(t_i, X_{t_i}, \theta)\,\Delta t_i\big) = 0. \tag{7.51}
\]

For a continuous-time model of the form
\[
dX_t = \mu(t, X_t, \theta)\,dt + \sigma(t, X_t, \theta)\,dW_t
\]
in which unknown parameters θ are in both the drift and the diffusion part of the stochastic differential, estimation of the parameter θ is very different than in the discrete-time case. Consider, for example, solving the estimating equation
\[
\psi(\theta) = \sum_i \left\{(\Delta X_{t_i} - \mu(t_i, X_{t_i}, \theta)\,\Delta t_i)^2 - \sigma^2(t_i, X_{t_i}, \theta)\,\Delta t_i\right\} = 0 \tag{7.52}
\]
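The special case of (7.52) with constant σ and known drift makes the point of the calculation that follows concrete: the root is the realized quadratic variation over the elapsed time, and on a fixed observation interval its accuracy improves as the sampling grid is refined. A Python sketch with illustrative settings:

```python
import numpy as np

def sigma2_752(x, dt, mu=0.0):
    """Root of (7.52) when sigma is constant and the drift mu is a known
    constant: sigma^2_hat = sum (dX - mu*dt)^2 / (n*dt)."""
    dx = np.diff(x) - mu * dt
    return np.sum(dx * dx) / (dx.size * dt)

rng = np.random.default_rng(5)
T, sigma = 1.0, 0.3
estimates = {}
for n in (100, 10000):                    # finer grids on the SAME fixed interval
    dt = T / n
    steps = sigma * np.sqrt(dt) * rng.normal(size=n)
    x = np.concatenate(([0.0], np.cumsum(steps)))
    estimates[n] = sigma2_752(x, dt)      # relative error shrinks like 1/sqrt(n)
```

Both runs observe a path of the same length T = 1; only the sampling frequency differs, which is exactly the mechanism by which the variance of (7.52) tends to zero below.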

to obtain an estimator of θ. Since for sufficiently small time increments Δt_i the Euler increments are approximately normal, this estimating function is asymptotically unbiased as max{Δt_i} → 0. Moreover, for any normal random variable Z with mean 0, var(Z²) = 2(var(Z))², so in the limit the variance of the estimating function above satisfies
\[
\operatorname{var}(\psi(\theta)) \to 2\sum_i \sigma^4(t_i, X_{t_i}, \theta)\,(\Delta t_i)^2 \;\le\; 2\,\max_i\{\Delta t_i\}\cdot\sum_i \sigma^4(t_i, X_{t_i}, \theta)\,\Delta t_i \;\to\; 0,
\]

the convergence to zero holding because Σ_i σ⁴(t_i, X_{t_i}, θ)Δt_i converges to the integral
\[
\int_0^T \sigma^4(t, X_t, \theta)\,dt,
\]
which is finite provided, for example, that the function σ(t, x, θ) is bounded.
In general the asymptotic variance of the estimator obtained by solving an equation of the form (7.52) is, under fairly modest regularity assumptions on the estimating function ψ, asymptotic to
\[
\frac{\operatorname{var}(\psi(\theta))}{E^2\!\left[\frac{\partial}{\partial\theta}\psi(\theta)\right]}. \tag{7.53}
\]
It is easy to see that E[∂ψ(θ)/∂θ] is asymptotic to
\[
-2\int \frac{\partial \sigma(t, X_t, \theta)}{\partial\theta}\,\sigma(t, X_t, \theta)\,dt,
\]

which is typically non-zero if σ depends on θ. Then (7.53) approaches 0, since the denominator is asymptotically non-zero. We conclude that the estimator of θ determined by the estimating function (7.52) approaches perfection (i.e. its variance approaches zero) as Δt → 0, and this is true regardless of how long or how short the time interval is over which the process has been observed. Restated, this says that for a continuously observed process, a parameter in the diffusion term (including the volatility parameter) can be perfectly estimated from an arbitrarily short period of observation. Now if you think this is too good to be true, you are right, at least in a practical sense. In practice, volatility is by no means known or estimated precisely, because the diffusion model does not fit on the minute time scale necessary for the above argument to go through. Far from our having nearly perfect knowledge of the volatility parameters, volatility appears to change rapidly on any time scale for which the diffusion model is considered a good fit. In the next section we discuss various estimates of volatility obtained from a discretely observed (geometric) Brownian motion.

7.6 Estimating Volatility

In the last section, we saw that those parameters in the diffusion coefficient of a general diffusion were all too easily estimated. In fact from an arbitrarily short segment of the path we can, at least in theory, obtain an estimator of the volatility which is exact (unbiased and with zero variance), because of the infinite variation of a diffusion process over these arbitrarily short periods of time. Information for estimating the diffusion coefficient accrues much more rapidly than for the drift, and in this respect continuous-time processes are quite different from their discrete analogues. Two diffusion processes with different diffusion coefficients are mutually singular. Intuitively this means that the sample paths generated by these two measures lie in two disjoint subsets of C[0, T],

and so we can theoretically determine the exact diffusion term from a single sample path. Of course, to determine which set a path lies in, we need to estimate or examine the quadratic variation of the process, something which requires a perfectly graphed sample path and an infinitely powerful microscope. The real world is considerably different from this idealized situation, for several reasons. First, we never observe a process in continuous time but only at discrete time points, however close together they may be. Second, diffusion processes fit data on security prices, interest rates and exchange rates only when viewed over time intervals longer than a minute, hour, or day. Short-term behaviour is very different; for example, prices usually evolve through a series of jumps corresponding to trades of varying magnitudes and frequencies. Third, information obtained from the derivatives market can be used to obtain implied volatilities, but these are not the same as the P-measure or historical volatilities in a discrete-time model. In theory they should agree in the continuous-time Black-Scholes model, since the risk-neutral measure has the same diffusion coefficient as does the P-measure, but this is of limited use since volatilities can change rapidly.
The volatility parameter is the single most important parameter for pricing short-term options, and arguably the most important parameter for virtually all financial time series. In Figure 7.4 we plot the price of a call option on a stock currently trading at $10. The strike price of the option is $12 and the time to maturity is 1/2 year. Notice that the option price ranges from close to 0 when σ = 0.1 to around $1.40 when σ = 0.7, indicating that different estimates of volatility will have a substantial impact on the option price. In fact over a large part of the range of this graph the relationship between option price and volatility is nearly linear.
Complicating the estimation of σ is its time-varying behaviour, with periods of persistent high volatility followed by periods of relatively lower volatility. Failure to accommodate the behaviour of the volatility process may result in highly misleading models, substantial pricing errors, and the catastrophic consequences that can accompany them. We have already distinguished between estimation of volatility based on historical data and calibration of the volatility parameter to option price data. The latter is required if we are pricing derivatives, since it chooses a value of the volatility parameter that is most consistent with the observed prices of options and futures. The former is necessary if we are interested in volatility for the purposes of prediction, risk management, or scenario generation. The implied volatility is obtained from the price of one or more heavily traded or benchmark derivatives sold on the open market having the same price process S_t for the underlying asset. We determine the volatility parameter which produces the market price of a given option. For example, suppose an option with strike price K, maturity T, and initial stock value S_0 is traded on the market at a price V. Then, because the Black-Scholes price is increasing in σ, we may solve the equation
\[
BS(S_0, K, r, T, \sigma) = V \tag{7.54}
\]
for the implied volatility parameter σ. Because BS(S_0, K, r, T, σ) is monotone in σ, a solution exists (as long as V is within a reasonable range) and is unique.

Figure 7.4: Price of Call option as a function of volatility, S_0 = 10, K = 12, r = 0.05, T = 0.5 years.
This estimate of volatility differs from the historical volatility obtained by computing the sample variance of the returns log(S_{t+1}/S_t). Since it agrees with the market price of a specific option, it reflects more closely the risk-neutral distribution Q and is therefore used for pricing other derivatives. There are several disadvantages to this method of calibrating the parameter. The calibrated value of σ depends on the stock and the strike price of the option, as well as on the current time t, the time to maturity T − t, and to some extent on other parameters that are harder to measure, such as the liquidity of the option and the degree to which it is used as a benchmark. Nevertheless it is common to transform option prices V to volatilities σ using (7.54) and then, rather than deal with option prices, analyze these implied volatilities. This transformation may be justified if the resulting model is simpler or smoother as a function of σ. However, there are many monotonic transformations of price that could do a better job of simplifying a non-Gaussian model.
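Solving (7.54) requires only the Black-Scholes formula and a one-dimensional root-finder; since the call price is increasing in σ, bisection suffices. A sketch in Python (the function names are ours), checked here against the Nortel option prices discussed later in this section:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s0, k, r, t, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s0 * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def implied_vol(price, s0, k, r, t, lo=1e-4, hi=5.0):
    """Solve BS(S0, K, r, T, sigma) = price, i.e. (7.54), by bisection;
    valid because the call price is increasing in sigma."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, r, t, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# January $40 call on NT: market price 8 5/8, S0 = 44, r = 5.8%, T = 55/253.
iv = implied_vol(8.625, 44.0, 40.0, 0.058, 55 / 253)
```

The recovered value is close to the σ = 0.79 quoted below for this option.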
In this section we will concentrate on the estimation of volatility based on historical data, as opposed to calibration. An estimator based on historical data approximates a parameter of the real-world measure P, whereas the calibrated parameter is a property of the risk-neutral measure Q. To explore the difference, we obtained two years of daily stock price data (Nov 1, 1998 to Nov 1, 2000) for Nortel (NT), listed on the New York Stock Exchange, from the web-site http://finance.yahoo.com, and on the basis of this we wish to estimate the volatility. We used the "adjusted close" prices, since these have been adjusted for stock splits and dividends. The daily returns over this period are graphed in Figure 7.5, and the significant increase in volatility towards the end of the sequence is evident.

Figure 7.5: Nortel Returns, Nov 1, 1998-Nov 1, 2000.

Since the logarithm of daily stock prices is assumed to follow a Brownian motion, we may estimate the daily volatility using the sample variance of the first differences of these logarithms. To obtain the variance over a year, multiply by the number of trading days (around 253) per year. Thus the annual volatility is estimated by sqrt(253*var(diff(log(ntprice)))), which gives a volatility of around 0.635.
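The MATLAB one-liner translates directly; a Python sketch, with a check on simulated geometric Brownian motion data since the original Nortel series is not reproduced here:

```python
import numpy as np

def annualized_vol(prices, trading_days=253):
    """sqrt(253 * var(diff(log(price)))): sample variance of daily
    log-returns scaled up to an annual volatility."""
    returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    return float(np.sqrt(trading_days * returns.var(ddof=1)))

# Check on two years of simulated prices with known sigma = 0.635:
rng = np.random.default_rng(6)
n, dt, sigma = 506, 1 / 253, 0.635
log_p = np.log(60.0) + np.cumsum(
    np.concatenate(([0.0], sigma * np.sqrt(dt) * rng.normal(size=n))))
vol = annualized_vol(np.exp(log_p))
```

Even with the true σ known, two years of daily data leave a sampling standard error of a few percentage points in the estimate, which is worth keeping in mind when comparing 0.635 with the implied figure below.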
How does this historical estimate of volatility compare with the volatility as
determined by option prices?
Table 7.5 is a segment of a table of option prices obtained from the Chicago Board Options Exchange, and provides the current prices (on November 2, 2000) of calls and puts on NT. There was a January $40 call selling for $8 5/8 and a January $40 put for $4 1/8. The price of the stock when these option prices were recorded was $44 (evidently the "good old days"!).

Calls                 Last Sale   Net     Bid     Ask     Volume   Open Interest
01 Jan 40 (NT AH-E)   8 5/8       -7/8    8 3/8   8 7/8   79       10198
01 Jan 45 (NT-AI-E)   6 1/8       -1/4    5 7/8   6 1/4   460      4093
Puts                  Last Sale   Net     Bid     Ask
01 Jan 40 (NT MH-E)   4 1/8       +1/4    3 7/8   4 1/4
01 Jan 45 (NT-MI-E)   6 3/8       +1/4    6 1/8   6 5/8

Table 7.5: Option Prices for Nortel, Nov 2, 2000.

Suppose the current interest rate (in US dollars, since these are US dollar prices) is 5.8%. This is roughly the interest rate on a short-term risk-free deposit such as a treasury bill. Then the implied volatility is determined by finding the value of the parameter σ (it turns out to be around 0.79) so that the Black-Scholes formula gives exactly this value for the option price, i.e. finding a value of σ so that PUT=4.125 and CALL=8.625; for example:

[call,put]=blsprice(44,40,.058,55/253,.79,0)   (returns call = 8.6077, put = 4.1066)

The implied volatility of a roughly at-the-money option is about σ = 0.79, and this roughly agrees with the price of the January 45 options:

[call,put]=blsprice(44,45,.058,55/253,.79,0)   (returns call = 6.2453, put = 6.6815)
Differences between the observed and theoretical option prices of the magnitude observed here are inevitable. Transaction costs, and the fact that some options are more liquid than others, can affect option prices. If we ignore this problem, we can find a pair of values (r, σ) by minimizing the sum of squared pricing errors between the observed and Black-Scholes prices of the put and call options; but because short-term option prices are quite insensitive to r, the estimates of r determined in this way can be unreliable.
We now have two estimates of volatility, the historical volatility (0.635) and the implied volatility (0.79). Which volatility is "correct"? The answer is both. The market conditions for this company changed enough over this period, and subsequently, to effect enormous changes in volatility, and if we used a period of less than 2 years our estimate of historical volatility would be larger (the one-year estimated volatility is 0.78, very close to the implied volatility). Moreover, the implied volatility is a property of the Q-measure, and this is determined not only by the recent history of the stock but also by investor risk preferences and fears. The Q-measure is a distribution for stock prices assigned by the market for options. The P-measure was assigned by investors' past tastes for the stock, sometimes in a substantially different economic climate. It is often the case that "implied volatilities" are greater than the historical ones, although in theory, for a geometric Brownian motion process which allows frictionless continuous-time hedging, the risk-neutral distribution Q and the actual distribution P should have the same volatilities.

7.6.1 Using Highs and Lows to Estimate Volatility

There are many competing estimators of the volatility parameter of a geometric
Brownian motion and to date we have used only one, the sample variance of
stock returns. To set the stage for these estimators, suppose S(t) is a geometric
Brownian motion

d ln(S(t)) = µ dt + σ dWt,  S(0) = S0,

and we observe certain properties of non-overlapping segments of this process,
for example days or weeks. In particular suppose t1 < t2 < · · · < tn are the begin-
nings of certain observation periods of length ∆i, where ti + ∆i ≤ ti+1, and so we
can define, for this period, the open, the close, the high and the low respectively

Oi = S(ti),  Ci = S(ti + ∆i),
Hi = max{S(t); ti ≤ t ≤ ti + ∆i},
Li = min{S(t); ti ≤ t ≤ ti + ∆i}.
7.6. ESTIMATING VOLATILITY 379

The observation periods may or may not be assumed equal in length (for example
daily, in which case ∆i is around 1/252 for all i). We will sometimes assume that
the market does not move outside of these observation periods, in which case
Oi = Ci−1, since in practice the open of many stocks at the beginning of the day
is nearly identical to the close at the end of the previous day.

The simplest estimator of σ is that based on the variance of returns

var(Ri), where Ri = ln(Ci/Oi).

Since Ri is Normal(µ∆i, σ²∆i), we can construct a regression problem towards
estimating the parameters µ, σ:

Ri/√∆i = µ√∆i + εi,  where the εi are i.i.d. N(0, σ²).
By regarding the left side as the response in an ordinary least squares re-
gression, we obtain the standard estimators

µ̂ = (Σ Ri) / (Σ ∆i),  with sums over i = 1, ..., n,    (7.55)

σ̂²_OC = (1/(n − 1)) Σ (Ri − µ̂∆i)²/∆i,    (7.56)

both unbiased estimators of their respective parameters. Regression theory also
provides the variances of these estimators:

var(µ̂) = σ² / Σ ∆i,

var(σ̂²_OC) = 2σ⁴/(n − 1).
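For equally spaced data, (7.55) and (7.56) are only a few lines of code. As an illustration (not part of the text's examples), the following Python sketch checks the estimators on simulated daily Gaussian log-returns with known µ and σ:

```python
import random
from math import sqrt

def estimate_mu_sigma2(returns, deltas):
    """Estimators (7.55) and (7.56) from returns R_i over periods Delta_i."""
    n = len(returns)
    mu_hat = sum(returns) / sum(deltas)
    sigma2_oc = sum((r - mu_hat * d) ** 2 / d
                    for r, d in zip(returns, deltas)) / (n - 1)
    return mu_hat, sigma2_oc

# Simulate daily log-returns R_i ~ N(mu*Delta, sigma^2*Delta)
random.seed(1)
mu, sigma, delta, n = 0.10, 0.30, 1 / 252, 100_000
returns = [random.gauss(mu * delta, sigma * sqrt(delta)) for _ in range(n)]
mu_hat, s2 = estimate_mu_sigma2(returns, [delta] * n)
print(mu_hat, s2)   # s2 should be close to sigma^2 = 0.09
```

Note that var(µ̂) = σ²/Σ∆i is large when the total observation span Σ∆i is short, which is why drift estimates from a few years of data are so unreliable compared with volatility estimates.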
There are more efficient estimators than σ̂²_OC if we include not only an open
and a close but also the high and the low for those periods. In Chapter 5,
we discovered that for a geometric Brownian motion, the random variables
ZHi = ln(Hi/Oi) ln(Hi/Ci) and ZLi = ln(Li/Oi) ln(Li/Ci) are both expo-
nentially distributed with expected value σ²∆i/2, and each of ZLi and ZHi is
independent of the open Oi and the close Ci. This auxiliary information can
be used to obtain an additional unbiased estimator of the parameter σ². For
example consider the estimator

σ̂²_HH = (2/n) Σ ZHi/∆i

which, in view of the exponential distribution of ZHi, is unbiased and has variance

var(σ̂²_HH) = σ⁴/n,
380 ESTIMATION AND CALIBRATION

about half of the variance of the estimator based on returns. Of course we can
build a similar estimator using the lows,

σ̂²_LL = (2/n) Σ ZLi/∆i.

One might guess that an estimator which simply averages σ̂²_HH and σ̂²_LL would
be four times as good as (7.56), but in fact, because of the negative correlation
between the two estimators being averaged, it is better still! The average is the
Rogers and Satchell (1991) estimator

σ̂²_RS = (1/n) Σ (ZHi + ZLi)/∆i    (7.57)

which has variance depending on the correlation coefficient ς ≈ −0.338 between
ZHi and ZLi:

var(σ̂²_RS) = (2/n) var(ZHi/∆i)(1 + ς)
           ≈ (σ⁴/(2n))(1 − 0.338)
           ≈ 0.331 σ⁴/n.
Evidently the Rogers and Satchell estimator is around 6 times as efficient (one
sixth the variance) as the usual estimator of volatility (7.56). In fact it is
independent of (7.56) as well, and so we can combine the two with weights
inversely proportional to the estimator variances to get a best linear unbiased
estimator with weights rounded,

σ̂²_BLU ≈ (1/7) σ̂²_OC + (6/7) σ̂²_RS,

and this estimator has variance

var(σ̂²_BLU) ≈ 0.284 σ⁴/n,

around one seventh of (7.56). Using a single high, low, open, close for a seven-day
period to estimate volatility is roughly equivalent to using daily returns and the
estimator (7.56). Related estimators have been suggested in the literature. See
for example Parkinson (1980). Garman and Klass (1980) suggest the estimator

σ̂²_GK = (1/2)(ln(Hi/Li))² − (2 ln(2) − 1)(ln(Ci/Oi))²    (7.58)

which is similar to σ̂²_BLU.
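As a rough numerical check of these formulas (an illustration of ours, not from the text), the following Python sketch simulates each trading day as a discretized GBM path on a fine grid, forms the open, high, low and close, and compares σ̂²_OC, σ̂²_HH and σ̂²_RS. All three should land near σ²; because a discrete path slightly underestimates the true continuous high and low, the extreme-based estimators come out biased a little downward:

```python
import random
from math import log, sqrt, exp

def simulate_day(s0, mu, sigma, delta, steps=400):
    """One day's open, high, low, close from a discretized GBM log-price path."""
    x = hi = lo = log(s0)
    for _ in range(steps):
        x += mu * (delta / steps) + sigma * sqrt(delta / steps) * random.gauss(0, 1)
        hi, lo = max(hi, x), min(lo, x)
    return s0, exp(hi), exp(lo), exp(x)

random.seed(2)
mu, sigma, delta, n = 0.05, 0.30, 1 / 252, 1000
price, zh, zl, r2 = 1.0, [], [], []
for _ in range(n):
    op, hi, lo, cl = simulate_day(price, mu, sigma, delta)
    zh.append(log(hi / op) * log(hi / cl))    # Z_H for this day
    zl.append(log(lo / op) * log(lo / cl))    # Z_L for this day
    r2.append(log(cl / op) ** 2)
    price = cl
sigma2_oc = sum(r2) / (n * delta)             # (7.56), ignoring the tiny drift term
sigma2_hh = 2 * sum(zh) / (n * delta)         # high-based estimator
sigma2_rs = sum(a + b for a, b in zip(zh, zl)) / (n * delta)  # Rogers-Satchell (7.57)
print(sigma2_oc, sigma2_hh, sigma2_rs)        # all near sigma^2 = 0.09
```

With many repetitions one can also verify empirically that the sampling variance of σ̂²_RS is several times smaller than that of σ̂²_OC, as the variance formulas above predict.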
We could quibble over which of the above estimators to use, based on extra-
ordinarily precise efficiency calculations for a geometric Brownian motion, but
much of this exercise would be wasted. Most indeed improve efficiency over
(7.56), but not quite to the extent predicted by theory, and provided that we use
the extremes, the differences between those estimators which do use them are
relatively small. To demonstrate this we compared them on the Dow Jones In-
dustrial Average for the period January 1999 to May 20, 2004. We computed all
of the estimators for daily data on the open, close, high, and low and then plot-
ted a 21-day moving average of these estimators against time. For example two
of the estimators, σ̂_OC and σ̂_HH, are given in Figure 7.8. The other estimators
are not plotted since they closely resemble one of these two. The curve labelled
"intraday volatility" measures the annualized volatility as determined by σ̂_HH
and that labelled "interday volatility", σ̂_OC. The estimators when evaluated on
the whole data set provide values as follows:

σ̂_OC = 0.18,  σ̂_HH = 0.29,  σ̂_LL = 0.31,  σ̂_RS = 0.30,  σ̂_BLU = 0.29,  σ̂_GK = 0.31,

indicating only minor differences between all of the estimators except σ̂_OC.
Over this period January 1999 to May 20, 2004, the intraday volatility was
consistently greater than the interday volatility, often by 50% or more, and there
is a high correlation among the estimators that use the highs and lows as well as
the open and close. This indicates a profound failure in the geometric Brownian
motion assumption for the Dow Jones Industrial index, and this discrepancy
between two estimators of volatility is much greater than for any other index
I have investigated. Since we have seen substantial differences in the variances
of the estimators based on the geometric Brownian motion model, it is worth
comparing the estimators for their empirical variance. However, since the first,
σ̂_OC, appears to have a different mean, we will compare them after normalizing
by their mean: i.e. we compare values of the coefficient of variation

CV = sqrt(var(σ̂²)) / E(σ̂²)

as estimated from the 21-day moving averages. These values are given below:

CV_OC = 0.83,  CV_HH = 0.53,  CV_LL = 0.59,  CV_RS = 0.54,  CV_BLU = 0.55,  CV_GK = 0.55,

and again, except for the first, they are virtually indistinguishable. What hap-
pened to the gains in efficiency we expected when we went from σ̂_HH to σ̂_BLU?
These gains are, in part at least, due to the negative correlation −0.338 between
ZHi and ZLi, but for this data they are positively correlated, and in fact the
21-day moving averages of ZHi and ZLi have correlation around 0.90. The
only explanation that seems reasonable is that the DJA is quite far from a geo-
metric Brownian motion. If the volatility were stochastic or time-dependent,
as most believe it is, then this can account for the apparently high correlation
between ZHi and ZLi. For example, suppose that the volatility parameter σi
was stochastic (but constant throughout the day) and possibly dependent on i.

Figure 7.6: The construction of an Eggtimer plot

Then since

cov(ZHi, ZLi) = E[cov(ZHi, ZLi | σi)] + cov(E[ZHi | σi], E[ZLi | σi])
             = E[cov(ZHi, ZLi | σi)] + var(σi²∆i/2),

it is possible for the first term to be negative (as predicted by a geometric
Brownian motion model) and yet the second term is sufficiently large that the
covariance itself is large and positive.
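This mechanism is easy to see in simulation. The Python sketch below (our own illustration; the two-point distribution for the daily volatility is an arbitrary choice, not from the text) computes the sample correlation of ZH and ZL first under constant σ, where it should be near ς ≈ −0.34, and then with a randomly drawn daily σi, where the var(σi²∆i/2) term can flip the sign:

```python
import random
from math import sqrt

def day_extremes(sigma, delta=1 / 252, steps=300):
    """ZH = ln(H/O)ln(H/C) and ZL = ln(L/O)ln(L/C) for one driftless GBM day."""
    x = hi = lo = 0.0              # log-price with the open normalized to 0
    for _ in range(steps):
        x += sigma * sqrt(delta / steps) * random.gauss(0, 1)
        hi, lo = max(hi, x), min(lo, x)
    return hi * (hi - x), lo * (lo - x)

def corr(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

random.seed(3)
n = 4000
# Constant volatility: corr(ZH, ZL) should be negative, near -0.34.
const = [day_extremes(0.3) for _ in range(n)]
# Stochastic volatility: an occasional high-volatility day inflates
# var(sigma_i^2 * Delta / 2) enough to push the correlation positive.
stoch = [day_extremes(1.0 if random.random() < 0.1 else 0.1) for _ in range(n)]
c_const = corr(*zip(*const))
c_stoch = corr(*zip(*stoch))
print(c_const, c_stoch)   # negative under constant vol, positive under stochastic vol
```

The simulated sign flip mirrors the covariance decomposition above: the conditional covariance term stays negative, but the between-day variation of E[ZH | σi] = E[ZL | σi] = σi²∆i/2 dominates it.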
Thus we have three estimators of the volatility parameter,

{ln(C/O)}²/∆t,  2 ln(H/O) ln(H/C)/∆t,  2 ln(L/O) ln(L/C)/∆t.

While the first is independent of the other two given O, unfortunately the second
and third are themselves correlated, and in order to weight them optimally
we need some information about their joint distribution. It follows that both
{ln(C/O)}²/∆t and (ZH + ZL)/∆t provide unbiased estimators of the volatility
parameter σ², and indeed the latter is independent of the former.
These estimators are areas illustrated in Figure 7.6. Consider the plot cor-
responding to time t. The vertical scale is logarithmic, so that logs are plotted.
This plot is constructed using an arbitrarily chosen angle θ from the four values
(O, C, H, L) using two lines ℓ1, ℓ2 through the point (t, ½(ln(O) + ln(C))) with
slopes ±tan(θ). Horizontal lines are drawn at the ordinate values ln(H), ln(L),
ln(O), ln(C) and, using the points where ln(O) and ln(C) strike the two lines as
corners, a rectangle is constructed. The area of this rectangle, tan(θ)(ln(C/O))²,
is an unbiased estimator of tan(θ)σ²∆t provided the Brownian motion has no
drift. The second region consists of "wings" generated by the four points at
which the horizontal lines at ln(H), ln(L) strike the lines ℓ1, ℓ2. The total area of
this region (both wings) is tan(θ)(ZL + ZH), which is another unbiased estimator
of tan(θ)σ²∆t, independent of the first, and also independent of whether or not
the underlying Brownian motion has drift. By comparing these areas, we can
detect abnormal changes in the volatility, or changes in the drift of the process
that will increase the observed value of (ln(C/O))² while leaving the second
estimator unchanged. Because each estimator is based only on a single period,
it is useful to provide as well a plot indicating whether there is a persistent
change in either or both of the two estimators of volatility.

Figure 7.7: Eggtimer Plot for the Dow Jones Average, Feb 1-March 27, 2000.
We also show empirically the effectiveness of incorporating the high-low-close
information in a measure of volatility. For example, Figure 7.7 gives the
eggtimer plot for the Dow Jones Industrial Index for the months of February
and March 2000. The vertical scale is logarithmic since the Black-Scholes model
is such that the logarithm of the index is a Brownian motion. A preponderance
of black rectangles shows periods when the drift dominates, whereas where the
grey tails are larger, the volatility is evidenced more by large values of the high
or small values of the low, compared to the daily change.

The 21-day rolling sum of the areas of the regions, either grey or black,
is graphed in Figure 7.8 and provides two measures of volatility. The rolling
sum of the grey areas is called the "intraday volatility" and that of the black
Figure 7.8: 21-day average volatility estimates for the Dow Jones Average, 1999-
May 20, 2004

rectangles, the "interday volatility". In the absence of substantial drift, both
measure the same theoretical quantity, but they differ in this case.

This difference is equally evident from Figure 7.9, the plot of the cumulative
variance for the same period of time. Of course in this plot it is the slope, not
the level, which indicates the variance.

Consistent differences between the intraday and the interday volatility or
variances would be easy to explain if the situation were reversed and the inter-
day volatility were greater, because one could argue that the interday measure
contains a component due to the drift of the process, and over this period there
was a significant drift. A larger intraday volatility is more difficult to explain
unless it is a failure of the Black-Scholes model. In this case, one might expect
a similar behaviour in another market. If we generate a similar plot over the
identical period of time for the NASDAQ index (Figure 7.10) we find that the
comparison is reversed. This, of course, could be explained by the greater drift of
the technology-dependent NASDAQ (relative to its volatility) compared to the
relatively traditional market of the Dow Jones, but other indices we compared
were more like the NASDAQ here than like the DJA.

There is no doubt that this difference reflects a greater intraday range for the
DJA than other indices. In fact if we plot the cumulative value of the range of
the index divided by the close, (H − L)/C, as in Figure 7.11, it confirms that the
daily range as measured by this ratio is consistently smaller for the NASDAQ
Figure 7.9: Cumulative Variances of the Dow Jones Average

Figure 7.10: Cumulative Variances for the NASDAQ index
Figure 7.11: Comparison of the cumulative variance of the Dow Jones and the
NASDAQ, Jan 1999-July 2000

than for the Dow Jones for this period.

Although high, low, open, close data is commonly available for many finan-
cial time series, the quality of the recording is often doubtful. When we used
older data from the Toronto Stock Exchange, there were a number of days in
which the high or low was so far from the open and close as to be explicable only
as a recording error (often the difference was almost exactly $10). When the data
on highs and lows is accurate, there is substantial improvement in efficiency
and additional information available by using it. But there is no guarantee
that published data is correct, particularly old data. A similar observation on
NYSE data is made by Wiggins (1991): "In terms of the CUPV data base itself,
there appear to be a number of cases where the recorded high or low prices are
significantly out of line relative to adjacent closing prices".

7.7 Estimating Hedge Ratios and Correlation Coefficients
The correlation coefficient between two asset prices is important not only be-
cause it indicates a degree of dependence between the two assets but because it
is a required component for many practical investment decisions such as optimal
portfolio selection, risk management, and hedging. There is no perfect hedge
(except apparently in a Japanese garden) and so in practice we are required to
use one asset to hedge another (hopefully highly correlated) asset. These may
include derivatives or similar investments (for example bonds on the same or
related underlying) which react similarly to market conditions.
Suppose we wish to hedge one investment, say in stock 2, using another, stock
1. As before, assume that we have data on the high, low, open and close of both
stocks over non-overlapping time intervals (ti, ti + ∆i), i = 1, 2, ..., n. Denote the
high, low, open and close for stock j on time interval i by (Hij, Lij, Oij, Cij), i =
1, ..., n, j = 1, 2. Again we will denote the returns by

Rij = ln(Cij/Oij).

Then under the geometric Brownian motion assumption, the return vector
(Ri1, Ri2) has a bivariate normal distribution with variances σ1²∆i, σ2²∆i and
correlation coefficient ρ.

If we assume that we are able to rehedge our portfolio at the beginning of
each time interval, and at the beginning of interval i we are long 1 unit worth
of stock 2 and short hi units worth of stock 1, then our total return at the end
of the i'th period is Ri2 − hi Ri1. The optimal hedge ratio is the value of hi
which minimizes the variance of Ri2 − hi Ri1, and this is given by

hi = cov(Ri2, Ri1)/var(Ri1) = ρ σ2/σ1.
Note that hi is a constant independent of time, at least over periods when the
stock volatilities and correlation coefficients remain unchanged. While implied
volatilities σ1, σ2 may be obtained from derivative prices for each of these assets,
the correlation parameter ρ is unknown and, unless there is a traded option such
as a spread option whose value depends specifically on this correlation, ρ needs
to be estimated from historical data. The simplest estimator of the correlation
coefficient is the sample correlation of the returns,

ρ̂_C = ĉor(Ri2, Ri1) = Σi (Ri2 − R̄2)(Ri1 − R̄1) / sqrt( Σi (Ri2 − R̄2)² · Σi (Ri1 − R̄1)² ),    (7.59)

where ĉor denotes the sample correlation coefficient. By a common argument
(see for example Anderson (1958), Theorem 4.2.6) this has asymptotic variance

(1/n)(1 − ρ²)².
Here, as in McLeish (2004), we will consider using historical data for (Hij, Lij,
Oij, Cij), i = 1, ..., n, j = 1, 2, to estimate ρ. In the case of two or more correlated
geometric Brownian motion processes, the joint distribution of highs, lows and
closing values is unknown, and so we will need to revert to a simpler alternative
than maximum likelihood estimation. We have seen that in the Black-Scholes
model, the statistics

ZHi1 = ln(Hi1/Oi1) ln(Hi1/Ci1),
ZHi2 = ln(Hi2/Oi2) ln(Hi2/Ci2),
ZLi1 = ln(Li1/Oi1) ln(Li1/Ci1),
ZLi2 = ln(Li2/Oi2) ln(Li2/Ci2)
Figure 7.12: The Normal approximation (dashed line) to the distribution of
Z^(1/3) where Z is exponential.

all have marginal exponential distributions and each is independent of the open
and close for the i'th interval.

Suppose we transform each of the above exponential random variables with
some function g and assume that we can determine the correlation coefficient
cor(g(ZH1), g(ZH2)) = b(ρ) as a function of ρ. For simplicity assume that we
have subtracted the mean and divided by the standard deviation to provide
a function g such that E{g(ZH1)} = 0, var{g(ZH1)} = 1. There are various
possibilities for the transformation g, the simplest being a standardized power

g(ZH1) = (ZH1^p − E(ZH1^p)) / sqrt(var(ZH1^p)) for some suitable value of p > 0.

A transformation of the gamma distributions in general, and the exponential dis-
tribution in particular, that makes them very nearly normal is the cube root
transformation (p = 1/3) (see Sprott, 2000, Chapter 9, Appendix).
For an exponential random variable ZH there is only a slight difference
between the graph of the probability density function of

(ZH^(1/3) − E(ZH^(1/3))) / sqrt(var(ZH^(1/3)))    (7.60)

and the graph of the standard normal probability density function, this differ-
ence lying in the left tail (see Figure 7.12), the region in which we are least
interested since there the maximum is very close to the open or close. We re-
placed the theoretical mean and variance in (7.60) by the sample mean and
variance, and so the function used ultimately was

g(ZHi) = (ZHi^(1/3) − Z̄H^(1/3)) / sqrt(v̂ar(ZH^(1/3))).

Although it is theoretically possible to obtain the function b(ρ), the exact
expression requires the evaluation of Bessel functions. For simplicity we fit
polynomials of varying degrees. For example the polynomial of degree 9,

b(ρ) ≈ 0.4822ρ⁹ + 0.2102ρ⁸ − 0.9629ρ⁷ − 0.3104ρ⁶ + 0.7006ρ⁵ + 0.1887ρ⁴ − 0.1288ρ³ + 0.1822ρ² + 0.6385ρ + 0.0008,

is accurate to within 1%.
Inverting this approximation to b(ρ) provides a tractable estimator of the
correlation between stocks based only on the correlations between the marginally
exponential statistics ZHi, ZLi. This estimator is

ρ̂_HL = b⁻¹( ½ ( ĉor(ZH1^(1/3), ZH2^(1/3)) + ĉor(ZL1^(1/3), ZL2^(1/3)) ) ).    (7.14)

A similar estimator obtains as well from the cross terms, since cor_ρ(ZH1^(1/3), ZL2^(1/3)) = b(−ρ):

ρ̂₂ = −b⁻¹( ½ ( ĉor(ZH1^(1/3), ZL2^(1/3)) + ĉor(ZL1^(1/3), ZH2^(1/3)) ) ),    (7.15)

where again ĉor denotes the sample correlation, but this estimator is very inef-
ficient and adds little to our knowledge of the correlation. We will not consider
it further. Since all that is required is the inverse of the function b, we can
use a simple polynomial approximation to the inverse without sacrificing much
precision, for example

b⁻¹(c) ≈ 0.1567c⁵ − 0.5202c⁴ + 0.6393c³ − 0.8695c² + 1.5941c.
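The two polynomial approximations can be sanity-checked by a round trip: b⁻¹(b(ρ)) should return ρ to roughly the stated 1% accuracy. A short Python sketch (an illustration of ours) using exactly the coefficients quoted above:

```python
def b(rho):
    """Degree-9 polynomial approximation to the correlation map b(rho)."""
    coeffs = [0.0008, 0.6385, 0.1822, -0.1288, 0.1887,
              0.7006, -0.3104, -0.9629, 0.2102, 0.4822]  # rho^0 .. rho^9
    return sum(ck * rho**k for k, ck in enumerate(coeffs))

def b_inv(c):
    """Quintic polynomial approximation to the inverse of b."""
    return (0.1567 * c**5 - 0.5202 * c**4 + 0.6393 * c**3
            - 0.8695 * c**2 + 1.5941 * c)

for rho in (0.1, 0.5, 0.9):
    print(rho, b_inv(b(rho)))   # round trip recovers rho to within about 1%
```

In an implementation of ρ̂_HL, b_inv would simply be applied to the averaged sample correlation of the cube-root-transformed ZH and ZL statistics.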

The contenders for reasonable estimators are ρ̂_HL, ρ̂_C, or some combination such
as an average, the simplest being a straight average of these two:

ρ̂_AV = ½ ρ̂_C + ½ ρ̂_HL.

It remains to see how these estimators perform in practice, both on real and
simulated data. First we simulated one year's worth of daily observations on
two stocks, simulating the high, low, open, and close for each day. There is
considerable benefit to using both of the estimators ρ̂_C and ρ̂_HL, at least for
simulated data. For example, in Figure 7.13 we graph the two variances var(ρ̂_C)
and var(ρ̂_AV) for various (positive) values of ρ. Notice that the estimator ρ̂_AV,
which combines information from all of the high, low, open, close, has variance
roughly one half that of ρ̂_C, which uses only the values of the open and the
close.
Figure 7.13: Variance of the estimators ρ̂_C and ρ̂_AV as a function of ρ.

We also use these methods to investigate the correlation between two processes.
To this end we downloaded data for both the S&P500 index and Nortel (NT)
over the period June 17, 1998-June 16, 2003. This was a period when Nortel
rose and fell sharply, showing increasing volatility. In Figure 7.14 we plot the
two measures of the correlation, ρ̂_AV and ρ̂_C, between the returns for these two
series in moving windows of 63 days (approximately 3 months) over this period.
Note that the estimator ρ̂_AV, indicated by a dotted line, tends to give lower
correlation than ρ̂_C since it includes a component due to the movements within
days, and these have lower correlation than the movements between days.

7.8 Problems

1. Suppose Y1 and Y2 are independent normal random variables with pos-
sibly different variances σ1², σ2² and expected values E(Yi) = µi, i = 1, 2.
Show that the conditional distribution of Y1 given Y1 + Y2 = S is normal
with mean

µ1 + w(S − µ1 − µ2)

and variance

σ1²(1 − w)

where

w = σ1²/(σ1² + σ2²).
2. Consider data generated from the mixture density

p N(θ1, σ1²) + (1 − p) N(θ2, σ2²)
Figure 7.14: Two Measures of Correlation between Nortel (NT) and the S&P500
Index

where θ1 < θ2 (so that the parameters are identifiable). Write a pro-