the values have to be entered in the corresponding text box.20 The system noise variances
can also be set to a multiple of the observation noise variance (option A). In this case,
the user sets the desired signal to noise ratio in the edit box. An alternative way is to let
the system estimate the optimal values of the parameters from the data. The estimation

[Figure: signal to noise ratio = 1; series plotted: actual beta, estimated beta, confidence bands ±1 SE, over observations 1 to 176.]

Figure 7.9 Actual versus estimated beta. The system noise to observation noise ratio, Q/R, is 1.
This ratio allows the Kalman filter to rapidly adapt to the jump in sensitivity. However, this quick
reaction comes at a cost of increased volatility in the beta estimates and of a very large standard error

Unless the user has good prior knowledge of the values to be entered in these cells (on the "Example"
sheet, range "F2:F5"), we do not recommend using this option.
Time-Varying Factor Sensitivities 231

[Figure: signal to noise ratio = 0.1; series plotted: actual beta, estimated beta, confidence bands ±1 SE, over observations 1 to 176.]


Figure 7.10 Actual versus estimated beta. Q/R is now 0.1. The beta estimates are smoother and
their standard errors smaller. However, the model seems to adapt more slowly to the jump in sensitivity

[Figure: signal to noise ratio = 0.02; series plotted: actual beta, estimated beta, confidence bands ±1 SE, over observations 1 to 176.]


Figure 7.11 Actual versus estimated beta. For Q/R = 0.02, beta estimates are very smooth and
their standard errors are much smaller. This is because variations in co-movements of Xt and Yt
are attributed to fluke rather than to true variations in beta

procedure is described later in the chapter. Figure 7.13 shows the results of the random
walk model for a signal to noise ratio of 1.
While models 1 and 2 use values for the R and Q variances that are constant over
the data set, model 3 (“Intervention analysis”) enables users to apply their own time-
varying R and Q values. This is achieved by specifying the range containing additive
noise variances (i.e. R and Q time series). This enables the user to force the model to
232 Applied Quantitative Methods for Trading and Investment

Figure 7.12 The stochastic parameter regression menu of the tool pack

[Figure: actual and estimated sensitivities for Alpha, X1 and X2 over time.]

Figure 7.13 Stochastic parameter regression example. The underlying sensitivities are assumed
to follow a random walk (model 1) and the signal to noise ratio is set to 1

rapidly adapt to important events (by increasing Q during this period) or, on the contrary,
to make the model insensitive to irrelevant events (by increasing R during this period).
For instance, if the system noise variance Q is increased for variable X2 at time t = 52
(cell “Q54” is set to a large number (e.g. 50)), the model adapts rapidly to the level shift
in β2 and models almost perfectly the actual factor sensitivities. The result of this model
is displayed in Figure 7.14.21

Model 3 should only be used to force the model to take prior knowledge into account. In general, columns
N to Q in the "Example" sheet should be set to zero unless the user wants to modulate the adaptation speed
of the model by increasing the credibility attributed to some observations (by putting a positive number for the
additive Q values) or by decreasing this credibility (by putting a positive number for the additive R values).
The values to be entered are discretionary and depend on the user's perception of the event's importance. The
bigger the value, the more it will alter the model's adaptation speed for the corresponding period.

[Figure: actual and estimated sensitivities for Alpha, X1 and X2 over time.]

Figure 7.14 Example of the stochastic parameter regression using model 3. The underlying
sensitivities are assumed to follow a random walk with system noise added at t = 52 (time of level
shift in b2). The signal to noise ratio is still set to 1. The fit is almost perfect as the model is
forced to adapt to the level shift

The reader should now be familiar with the functioning of the Kalman filter. This
adaptive procedure optimally computes the current state of the model (i.e. the factor
sensitivities and their covariance matrix) given the previous state and some hyper-parameters
such as the system and observation noise variances. Two important questions remain
unanswered: (i) how do we determine the initial state of the model? (ii) how do we best set
the values for the hyper-parameters?
The initial distribution of the state vector (mean s and covariance P0) can be set
in several ways. The simplest is to use prior knowledge. For instance, one can start
with a prior value of 1 for a stock's beta, and a prior value of 0 for the sensitivity of
the return to exchange rates. A popular alternative is to estimate sensitivities (and their
covariance matrix) from a subsample of the data by OLS. However, this may use too
many observations from the data sample, and we have already shown the shortcomings
of OLS estimation in cases where sensitivities are not constant. The model provided in
the tool pack uses a diffuse prior (see Harvey (1989)). A diffuse prior corresponds to an
infinite covariance matrix P0 (i.e. P0^(-1) = 0) so that no credibility is given to the initial
state estimate, letting the Kalman filter estimate the states from the data. In practice,
however, a large initial covariance matrix is used rather than an infinite one. P0 is given
by equation (7.36):

P0 = κI    (7.36)

κ being equal to a large but finite number (e.g. 100 000). In the case of a diffuse prior,
the mean of the initial state distribution, s, is irrelevant as long as it is finite. We use a
value of zero for all state variables. The use of a diffuse prior can be viewed as a way to
let the filter estimate the initial values from the first observations.
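As a sketch of how these recursions look in code, the following Python fragment (our own illustration, not the tool pack's implementation) runs a Kalman filter for a random-walk coefficient regression started from the diffuse prior of equation (7.36); the data and the Q and R values are made up:

```python
import numpy as np

def kalman_update(s, P, x, y, Q, R):
    """One Kalman step for y_t = x_t' s_t + e_t, with s_t = s_{t-1} + eta_t."""
    # Prediction: a random-walk transition leaves the state mean unchanged
    s_pred = s
    P_pred = P + Q
    # Innovation e_t and its variance f_t
    e = y - x @ s_pred
    f = x @ P_pred @ x + R
    # Update
    K = P_pred @ x / f
    s_new = s_pred + K * e
    P_new = P_pred - np.outer(K, x) @ P_pred
    return s_new, P_new, e, f

# Diffuse prior: zero mean, P0 = kappa * I with large but finite kappa
k = 2                        # number of regressors (intercept + one factor)
s = np.zeros(k)
P = 100_000.0 * np.eye(k)    # kappa = 100 000
Q = 1e-4 * np.eye(k)         # system noise variance (hypothetical value)
R = 1e-2                     # observation noise variance (hypothetical value)

rng = np.random.default_rng(0)
true_beta = np.array([1.0, 0.5])
for t in range(200):
    x = np.array([1.0, rng.normal()])
    y = x @ true_beta + 0.1 * rng.normal()
    s, P, e, f = kalman_update(s, P, x, y, Q, R)
```

After a couple of hundred observations the state estimate `s` settles close to the (here constant) true sensitivities, illustrating how the filter recovers initial values from the first observations despite the diffuse start.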
Because a diffuse prior is used, little information needs to be supplied by the user. Once
the time series model for the sensitivities and the values for the hyper-parameters Φ, Q,
R, d and c have been chosen, the adaptive model optimally estimates the sensitivities.
The values for these parameters can be set by the user but they can also be estimated

from the data. The model provided in the tool pack supports an optimisation algorithm
in order to find the "best" parameters.22
Given the hyper-parameters, the Kalman filter derives recursive values for the factor
sensitivities and their covariance. Hence, the model provides predictions of the dependent
variable and residuals can be computed, conditional on the hyper-parameter values. A
suitable maximisation routine can then be used to maximise the likelihood of observing
the data.
Maximum likelihood estimation consists of finding an estimate of the unknown parameter
vector θ that maximises the likelihood of generating the data that were
actually observed. In other words, given the sample of observations Yt, we seek a solution
for θ that maximises the joint probability density function L(Y, θ). As observations are
not independent in a time series model, the joint density function is not the usual product
of the probability density functions corresponding to each observation. Instead, conditional
probability density functions are used to write the joint density function:

L(Y, θ) = ∏_{t=1}^{T} p(Yt | Y1, Y2, . . . , Yt−1)    (7.37)

Since the disturbances and the initial state vector are, by assumption, multivariate normal,
the conditional density functions p(Yt | Y1, Y2, . . . , Yt−1) are also normal. The
means Yt|t−1 and variances ft|t−1 of these normal distributions are given by the Kalman
filter equations (7.26) and (7.28). As the model has Gaussian residuals et, the logarithm
of the likelihood function L(Y, θ), i.e. the log-likelihood function log L(Y, θ), can be
written in terms of the prediction errors et and their variances ft|t−1 that have been defined
in equations (7.27) and (7.28):

log L(θ | Y1, Y2, . . . , YT) = −(T/2) ln(2π) − (1/2) ∑_{t=1}^{T} ln(ft|t−1) − (1/2) ∑_{t=1}^{T} et²/ft|t−1    (7.38)

The maximum likelihood estimate θ̂ of the parameter vector θ is the value of θ that
maximises the log-likelihood, i.e.:

θ̂ = arg max_θ log L(θ | Y1, Y2, . . . , YT)    (7.39)

The Kalman filter algorithm and equation (7.38) are used to compute the log-likelihood
for a given value of θ. A numerical optimiser is employed to find the vector θ̂ of hyper-parameters
that maximises the log-likelihood function.
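The prediction-error decomposition of equation (7.38) can be sketched in code as follows; this is our own illustration on a hypothetical univariate local-level model (one state, variances Q and R), not the tool pack's optimiser:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, y):
    """Prediction-error decomposition (equation 7.38) for a local-level model:
    y_t = s_t + e_t, s_t = s_{t-1} + eta_t, with variances R and Q."""
    Q, R = np.exp(params)          # optimise in logs to keep variances positive
    s, P = 0.0, 1e5                # diffuse-ish prior, as in equation (7.36)
    ll = 0.0
    for t, yt in enumerate(y):
        P_pred = P + Q             # prediction step (random-walk state)
        e = yt - s                 # prediction error e_t
        f = P_pred + R             # its variance f_{t|t-1}
        if t > 0:                  # skip the (diffuse) first observation
            ll += -0.5 * (np.log(2 * np.pi) + np.log(f) + e**2 / f)
        K = P_pred / f             # Kalman gain
        s = s + K * e
        P = (1 - K) * P_pred
    return -ll

# Simulated data with made-up true variances Q = 0.09, R = 0.25
rng = np.random.default_rng(1)
level = np.cumsum(0.3 * rng.normal(size=300))
y = level + 0.5 * rng.normal(size=300)
res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
Q_hat, R_hat = np.exp(res.x)
```

The optimiser minimises the negative log-likelihood, which is equivalent to the maximisation in equation (7.39).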
The log-likelihood function in equation (7.38) is a non-linear function of the hyper-parameters
and, unfortunately, there is no analytical solution θMLE to the equation

∂ log L/∂θ = 0    (7.40)

"Best" according to criteria such as prediction mean square error or a likelihood function.

[Figure 7.15 flowchart: given values of the hyper-parameters θ = (Φ, d, ct, Rt, Qt), run the Kalman filter for t = 1, . . . , T and compute the model's residuals et; compute the log-likelihood log L(θ) conditional on the values of the unknown hyper-parameter vector θ; if log L(θ) is not yet at its maximum, choose new values of θ that increase log L(θ) and repeat.]

Figure 7.15 The maximum likelihood estimation procedure

A numerical optimisation method for finding the maximum likelihood parameters is therefore
employed. This iterative procedure is represented in Figure 7.15.
The critical stage of this procedure is the algorithm used to determine the parameter
vector that maximises the log-likelihood. There are numerous methods for maximising such
a non-linear function. However, they are unequal in terms of convergence speed and
dependence on initial parameter values. The optimiser used by the tool pack is based on
a quasi-Newton method (see for instance Greene (1977, p. 204)). It usually converges
within a few iterations.
By checking the “Estimate parameters” check box, the software optimises the R and
Q parameters and saves them in the range indicated in the text box. The “Estimation
option” button enables the user to select the optimisation criterion (Figure 7.16), mean
squared error or log-likelihood.

Figure 7.16 The optimisation menu for the stochastic parameter regression. System and
observation noise variances can be estimated from the data

[Figure: actual and estimated sensitivities for Alpha, X1 and X2 over time.]


Figure 7.17 This is the result of the stochastic parameter regression using model 3 with
optimised parameters. The underlying sensitivities are assumed to follow a random walk with system
noise added at t = 52 (time of level shift in b2). The R and Q values have been computed by
maximum likelihood estimation

Once the hyper-parameters of the model are estimated, the procedure runs the Kalman
filter one last time to compute the sensitivities. The R and Q values are saved on the
spreadsheet. The result of the optimised model is displayed in Figure 7.17.
Given the optimised hyper-parameters, the stochastic parameter model can be expressed as:

Y(t) = α(t) + β1(t)X1(t) + β2(t)X2(t) + ε(t)
where α(t) = α(t − 1) + η1(t),   Var(η1) = 1.11E-04 (cell "F3")
      β1(t) = β1(t − 1) + η2(t),  Var(η2) = 6.02E-04 (cell "F4")
      β2(t) = β2(t − 1) + η3(t),  Var(η3) = 8.73E-08 (cell "F5")
      Var(ε) = 1.30E-08 (cell "F2")

For instance, for t = 50, we have:

Y = −0.19 + 0.11X1 + 0.50X2 + ε (cells "J52:L52")
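A model of this form is easy to simulate; the sketch below (our own illustration, with hypothetical factor returns) generates random-walk sensitivities using the estimated variances quoted above:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 200
# System noise standard deviations from the estimated variances (cells F3:F5)
sd_eta = np.sqrt([1.11e-04, 6.02e-04, 8.73e-08])
sd_eps = np.sqrt(1.30e-08)            # observation noise (cell F2)

# Random-walk paths for alpha(t), beta1(t), beta2(t)
coeffs = np.cumsum(rng.normal(scale=sd_eta, size=(T, 3)), axis=0)
X = rng.normal(size=(T, 2))           # hypothetical factor returns X1, X2
Y = coeffs[:, 0] + (coeffs[:, 1:] * X).sum(axis=1) \
    + rng.normal(scale=sd_eps, size=T)
```

Plotting `coeffs` against the Kalman filter estimates obtained from `Y` and `X` reproduces the kind of comparison shown in Figures 7.13 and 7.17.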

The reader will find it a useful and rewarding exercise to play with the provided example.
Experimenting with different hyper-parameter values or injecting additional system noise
through the intervention analysis will enable the user to get a good understanding of the
functioning of the Kalman filter. Comparing the estimates from the various models (i.e.
unweighted rolling regression, weighted regression and stochastic parameter regression)
will give the reader a good understanding of the strengths and weaknesses of the various
approaches.

This chapter has described three methods for estimating time-varying factor models. The
third approach, based on Kalman filtering, is clearly superior to rolling regressions or
time-weighted regressions. As the reader will probably have realised when using the
software provided, the computational complexity of the Kalman filter can easily be handed
over to a computer program. As a result, the stochastic parameter regression model can
be used as easily as the less accurate rolling regression approach.
The advantage of adaptive models such as the stochastic parameter regression model
resides in their capability to estimate the dynamics of the underlying factor sensitivities. This
enables one to predict risk exposures ahead rather than merely measure their past average.
Given the many applications of factor modelling and time series regression analysis in
equity management, stochastic parameter regression models should prove a useful addition
to the asset manager's toolbox.

References

Bollerslev, T. (1986), "Generalised Autoregressive Conditional Heteroskedasticity", Journal of
Econometrics, 31, 307–327.
Gouriéroux, C., A. Monfort and G. Gallo (1997), Time Series and Dynamic Models, Cambridge
University Press, Cambridge.
Greene, W. H. (1977), Econometric Analysis, 3rd edition, Prentice Hall, Englewood Cliffs, NJ.
Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge
University Press, Cambridge.
Kalman, R. E. (1960), "A New Approach to Linear Filtering and Prediction Problems", Journal of
Basic Engineering, Transactions ASME, Series D, 82 (1), 35–45.
Pindyck, R. S. and D. L. Rubinfeld (1998), Econometric Models and Economic Forecasts, 4th edition,
McGraw-Hill, New York.
Rosenberg, B. (1973), "Random Coefficients Models: The Analysis of a Cross Section of Time
Series by Stochastically Convergent Parameter Regression", Annals of Social and Economic
Measurement, 2 (4), 399–428.
Schaefer, S., R. Brealey, S. Hodges and H. Thomas (1975), "Alternative Models of Systematic
Risk", in E. Elton and M. Gruber (eds), International Capital Markets: An Inter and Intra Country
Analysis, North-Holland, Amsterdam, pp. 150–161.
Stochastic Volatility Models: A Survey with
Applications to Option Pricing and Value at Risk


This chapter presents an introduction to the current literature on stochastic volatility
models. For these models the volatility depends on some unobserved components or a
latent structure.
Given the time-varying volatility exhibited by most financial data, in the last two
decades there has been a growing interest in time series models of changing variance and
the literature on stochastic volatility models has expanded greatly. Clearly, this chapter
cannot be exhaustive; instead we discuss some of the most important ideas, focusing on
the simplest forms of the techniques and models used in the literature.
The chapter is organised as follows. Section 8.1 considers some motivations for stochastic
volatility models: empirical stylised facts, pricing of contingent assets and risk evaluation.
While Section 8.2 presents models of changing volatility, Section 8.3 focuses on
stochastic volatility models and distinguishes between models with continuous and discrete
volatility, the latter depending on a hidden Markov chain. Section 8.4 is devoted
to the estimation problem, which is still an open question, and a wide range of possibilities
is presented. Sections 8.5 and 8.6 introduce some extensions and multivariate models.
Finally, in Section 8.7 an estimation program is presented and some possible applications
to option pricing and risk evaluation are discussed.
Readers interested in the practical utilisation of stochastic volatility models and in the
applications can skip Section 8.4.3 without hindering comprehension.

In the last two decades there has been a growing interest in time series models of changing
variance, given the time-varying volatility exhibited by most financial data. In fact,
the empirical distributions of financial time series differ substantially from distributions
obtained from sampling independent homoskedastic Gaussian variables. Unconditional
density functions exhibit leptokurtosis and skewness; time series of financial returns show

We gratefully acknowledge help from Michele Gobbo, Massimiliano Caporin and Andrea Giacomelli.

Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Naïm
© 2003 John Wiley & Sons, Ltd ISBN: 0-470-84885-5

evidence of volatility clustering; and squared returns exhibit pronounced serial correlation
whereas little or no serial dependence can be detected in the return process itself.
These empirical regularities suggest that the behaviour of financial time series may
be captured by a model which recognises the time-varying nature of return volatility,
as follows:

yt = µt + σt εt,   εt ∼ IID(0, 1),   t = 1, 2, . . . , T

where yt denotes the return on an asset. A common way of modelling σt is to express it
as a deterministic function of the squares of lagged residuals. Econometric specifications
of this form are known as ARCH models and have achieved widespread popularity in
applied empirical research (see Bollerslev et al. (1992, 1993), Bera and Higgins (1993)).
Alternatively, volatility may be modelled as an unobserved component following some
latent stochastic process, such as an autoregression. The resulting models are called
stochastic volatility (SV) models and have been the focus of considerable attention
in recent years (Taylor, 1994; Ghysels et al., 1996; Shephard, 1996). These models
present two main advantages over ARCH models. The first one is their solid theoretical
background, as they can be interpreted as discretised versions of the stochastic volatility
continuous-time models put forward by modern finance theory (see Hull and White
(1987)). The second is their ability to generalise from univariate to multivariate series
in a more natural way, as far as their estimation and interpretation are concerned. On
the other hand, SV models are more difficult to estimate than ARCH models, due to
the fact that it is not easy to derive their exact likelihood function. For this reason, a
number of econometric methods have been proposed to solve the problem of estimation
of SV models.
The literature on SV models has expanded greatly in the last 10 years, reaching consid-
erable proportions; this chapter cannot therefore be exhaustive. We prefer to discuss some
of the most important ideas, focusing on the simplest forms of the techniques and models
used in the literature, referring the reader elsewhere for generalisations and technicalities.
In the organisation of the structure of the present chapter, we have been inspired by the
paper of Shephard (1996), who gave a very interesting survey on SV models updated
to 1995.
To start, we will consider some motivations for stochastic volatility models: empirical
stylised facts, pricing of contingent assets and risk evaluation.

8.1.1 Empirical stylised facts

To illustrate the models and to develop the examples we will work with three European
stock indexes: the FTSE100, the CAC40 and the MIB30, which are market indexes for
the London, Paris and Milan equity markets. These series run from 4 January 1999 to
12 August 2002, yielding 899 daily observations.
Throughout we will work with the compounded return1 on the series

yt = log(xt/xt−1)

An advantage of using a return series is that it helps in making the time series stationary, a useful statistical
property (see footnote 4).
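As a minimal illustration of this transformation, compounded returns can be computed as follows (the price series is made up):

```python
import numpy as np

def log_returns(x):
    """Continuously compounded returns y_t = log(x_t / x_{t-1})."""
    x = np.asarray(x, dtype=float)
    return np.log(x[1:] / x[:-1])

prices = [100.0, 101.0, 99.5, 100.2]   # made-up index levels
y = log_returns(prices)                # three returns from four prices
```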

[Figure: for each of the three indexes, three panels are shown: the return series, a density estimate with its normal approximation ("Norm. Apx"), and a correlogram of squared returns.]


Figure 8.1 Summaries of the daily returns on three European stock indexes: the FTSE100, the
CAC40 and the MIB30. Summaries are: time series of returns, non-parametric density estimate
and normal approximation, correlogram of squared returns

where xt is the value of the stock index. Figure 8.1 displays some summaries of these three
series. The raw time series of yt suggests that there are periods of volatility clustering:
days of large movements are followed by days with the same characteristics. This is
confirmed by the use of a correlogram on yt², also reported in Figure 8.1, which shows
significant correlations existing at quite extended lag lengths. This suggests that yt² may
follow a process close to an ARMA(1,1), for a simple AR process cannot easily combine
the persistence in shocks with the low correlation. A correlogram of yt shows little activity
and so is not given in this figure.
Figure 8.1 also gives a density estimate of the unconditional distribution of yt together
with the corresponding normal approximation.2 This suggests that yt is leptokurtic. This
is confirmed by Table 8.1, which reports an estimate of the excess of kurtosis with respect
to the normal distribution, which is significantly positive.
Table 8.1 also reports the Jarque–Bera test for normality and the asymmetry coefficients,
evidencing that the distributions are negatively skewed (partially due to the period of
analysis), and for all three the null hypothesis of normality is clearly rejected.
These stylised facts can be summarised as follows: non-significant serial correlation in
the levels of returns; volatility clustering, which implies a significant and positive serial
correlation in the squares yt²; heavy tails and persistence of volatility.

The graphs are produced with the Ox software using some of its basic commands and default options. See
also Section 8.7.1.
Table 8.1 Summary statistics for the daily returns in Figure 8.1. In parentheses: the p-value of
the Jarque–Bera test

                       FTSE100      CAC40        MIB30
Mean                   −0.03634     −0.000219    −0.04313
Standard deviation     0.001463     0.015630     0.001689
Skewness               −0.2574      −0.223998    −0.23383
Excess of kurtosis     1.547015     4.624507     2.268374
Jarque–Bera test       97 (0.00)    106 (0.00)   197 (0.00)
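The skewness, excess kurtosis and Jarque–Bera statistic reported in Table 8.1 can be computed from their moment definitions; the sketch below is our own illustration, exercised on simulated Gaussian data rather than the index returns:

```python
import numpy as np

def jarque_bera(y):
    """Sample skewness, excess kurtosis and the Jarque-Bera statistic
    JB = n/6 * (skew^2 + ex_kurt^2 / 4), chi-squared(2) under normality."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    z = y - y.mean()
    m2 = (z**2).mean()
    skew = (z**3).mean() / m2**1.5
    ex_kurt = (z**4).mean() / m2**2 - 3.0
    jb = n / 6.0 * (skew**2 + ex_kurt**2 / 4.0)
    return skew, ex_kurt, jb

rng = np.random.default_rng(0)
skew, ex_kurt, jb = jarque_bera(rng.normal(size=1000))
# For Gaussian data all three quantities should be close to zero
```

On the actual index returns, the large positive excess kurtosis drives the rejection of normality seen in the table.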

Finally, there is some evidence that stock markets share periods of high volatility. This
suggests that multivariate models will be important.

8.1.2 Pricing contingent assets
Consider an asset C, with expiring date t + τ, which is a function of a generic underlying
security S. Assume now that S can be described by the following geometric
diffusion process:

dS = µS dt + σS dz

so that d log S = (µ − σ²/2) dt + σ dz. Economists term such an asset C as "contingent"
or "derivative". A primary example of a derivative is an option, which gives the owner
the right but not the obligation to trade the underlying security at a given price K, called
the strike price, in the future. European call options are the best known: the owner can
buy the underlying asset at the strike price K only when the call expires, i.e. at date
t + τ. Its value at the expiration date will be:

Ct+τ = max(St+τ − K, 0)

Its purchase value at time t is as yet unknown, but can be determined in different ways.
One of these consists of calculating the discounted expected value of the option at time
t + τ:

exp(−rτ) E_{St+τ|St}[max(St+τ − K, 0)]

where r is the risk-free interest rate. However, this completely ignores the fact that this is
a risky asset and traders expect higher returns than on riskless assets. This is the reason
why the discounted expected value is not considered by the market as a correct method
to evaluate an asset. To avoid this inconvenience it is opportune to introduce a utility
function into the pricing of options, letting the dealers choose the risk–expected gain
combination they prefer.
It turns out that the added complexity of a utility function can be avoided by assuming
continuous and costless trading. This can be shown by creating a particular
portfolio, which by construction is made up of owning θ of the underlying shares and
by borrowing a single contingent asset C. If the investor properly selects θ at each
time, the stochastic component of the process disappears and ensures the portfolio has
a riskless dynamic, making its return a deterministic function of time (see Black and

Scholes (1973)). As time passes, the portfolio will have to be continually adjusted to
maintain risklessness, hence the need for continuous costless trading.
The return of this portfolio must be equal to the riskless interest rate r because the portfolio
itself is risk-free, otherwise traders would have an arbitrage opportunity. This condition
is necessary to obtain the stochastic differential equation followed by the contingent asset:

∂C/∂t + (1/2)σ²S² ∂²C/∂S² + rS ∂C/∂S = rC    with end condition C = max(S − K, 0)

This equation is quite easy to solve and depends neither on the mean parameter µ nor on
the risk preferences of the traders. Whatever the risk preferences may be, the evaluation of
the option does not change. When solving the equation, risk-neutral preferences are used
to simplify calculations. With instantaneous variance σ², the following Black–Scholes
valuation formula is obtained:

Ct^BS(σ²) = St Φ(d) − K e^{−rτ} Φ(d − σ√τ),   d = [log(St/K) + (r + σ²/2)τ]/(σ√τ)

Note that σ² is the only unknown parameter: St and r are observed, while τ and K are
usually given by institutional norms. The price depends strictly on σ², which is more
important than the drift, as is often the case in finance, so that the price of the option can
be considered an indicator of the volatility of the underlying asset.
Empirically, the Black–Scholes formula can be used in two ways: either by estimating
σ (the historical volatility) and then calculating the option price, or by using real prices
to determine a value for σ² (called implied volatility).
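Both uses can be sketched in a few lines; the following is our own illustration of the valuation formula above and of backing out an implied volatility by bisection (all inputs are made up):

```python
import math

def norm_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, tau, sigma):
    """Black-Scholes value of a European call."""
    d = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return S * norm_cdf(d) - K * math.exp(-r * tau) * norm_cdf(d - sigma * math.sqrt(tau))

def implied_vol(price, S, K, r, tau, lo=1e-4, hi=5.0, tol=1e-8):
    """Invert the formula by bisection (the call price is increasing in sigma)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, tau, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

price = bs_call(100.0, 100.0, 0.05, 0.5, 0.2)       # made-up inputs
sigma_implied = implied_vol(price, 100.0, 100.0, 0.05, 0.5)
```

Feeding the model price back into the inversion recovers the volatility used, which is exactly the implied-volatility logic applied to market prices.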
This type of analysis has a considerable shortcoming: the basic assumption that stock
returns follow a geometric diffusion process is a poor one, as indicated in Figure 8.1, and
can affect the valuation formula, reducing the precision of the option pricing. This realisation
has prompted theoretical work into option pricing theory under various changing
volatility regimes. The leading paper in this field is Hull and White (1987), to which we
will return later.

8.1.3 Risk evaluation

VaR (Value at Risk) is the maximum amount that is expected to be lost over some target
period, i.e. the maximum likely loss. It is a statistical risk measure and represents a
percentile of the probability distribution of the variable of interest.
Generally speaking, VaR can be analytically defined as follows. Let xt be a random
variable of interest and Fx(xt) its cumulative distribution function:

a = Prob(xt ≤ xt*) = Fx(xt*) = ∫_{−∞}^{xt*} f(xt) dxt

VaR is the percentile defined by the relation:

VaRx(1 − a) = xt* = Fxt^{−1}(a)
where (1 − a) is the VaR confidence level, for instance 95% or 99%, and Fxt^{−1}(a) is the
inverse of the cumulative distribution function.
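As a sketch of this percentile definition, the following computes a parametric (normal) VaR under the assumption that returns are Gaussian; the series is simulated and the function name is our own:

```python
import numpy as np
from scipy.stats import norm

def parametric_var(returns, confidence=0.99):
    """Parametric (normal) VaR: minus the a-quantile of the fitted return
    distribution, with a = 1 - confidence."""
    mu = np.mean(returns)
    sigma = np.std(returns, ddof=1)
    quantile = norm.ppf(1.0 - confidence, loc=mu, scale=sigma)
    return -quantile                       # VaR reported as a positive loss

rng = np.random.default_rng(0)
rets = rng.normal(0.0, 0.01, size=1000)    # made-up daily returns, sigma = 1%
var99 = parametric_var(rets)               # roughly 2.33 * sigma at 99%
```

Replacing the constant `sigma` with a properly modelled time-varying volatility is precisely the refinement that the rest of the chapter motivates.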
Given its generality, the VaR method can be applied to different types of risk
measurement, such as market risk, credit risk, operational risk and commodity risk (see
Alexander (1996)). Moreover, thanks to its versatility, VaR allows us to obtain an intuitive
risk measure, to define homogeneous risk measures that permit a comparison among
different financial instruments, to determine limiting positions and to construct risk-adjusted
profitability measures.
Let us concentrate on its application to market risk. Market risk means the possibility
that an unexpected variation of market factors (interest rates, exchange rates, stock prices,
etc.) causes an increase or a reduction in the value of a position or in the value of a financial
portfolio. VaR, in this context, is the maximum expected loss of a portfolio of marketable
financial instruments which could be experienced, for a specified time horizon and
a specified confidence level.
We now consider a general portfolio model which allows us to set out all the hypotheses
characterising a risk measurement model like VaR in a systematic manner, paying
particular attention to the role of the volatility.
Let xτ be a random variable which represents the value of a portfolio in a future period
τ. It is defined by the following relation:

xτ = ∑_{i=1}^{N} wi,t Pi,τ

where the random variables Pi,τ represent the future values of the N assets in the portfolio
and wi,t are their weights at time t. If we suppose that the N assets are subject to
K market risk factors χj,τ, the future value of the portfolio can be expressed as a function
of the K stochastic risk factors by the following pricing formula:

xτ = ∑_{i=1}^{N} wi,t Pi,τ(χ1,τ, . . . , χK,τ)

The hypotheses characterising the model therefore concern: the endogenous variable
choice; the pricing formula; the risk factors definition and their distributions; the risk
factors volatility; the risk factors mapping; the confidence level and the choice of the
time horizon.
In the literature, the following approaches are suggested to estimate the VaR (Best,
1998): parametric methods; historical simulation; Monte Carlo simulation; stress testing.
Concerning the parametric methods and the Monte Carlo simulation, it is crucial to properly
describe the volatility dynamics of the risk factors to obtain correct estimates of the
VaR (see, for example, Lehar et al. (2002)).

8.2 Models of changing volatility

Following Cox (1981) and Shephard (1996), models of changing volatility can be usefully
partitioned into observation-driven and parameter-driven models. Both can be
generally expressed using the following parametric framework:

yt |zt ∼ N(µt, σt²)

where µt is often set equal to zero (as we do not intend to focus on that feature of
the model).
In the first class, i.e. in observation-driven models, zt is a function of lagged values of
yt. The autoregressive conditional heteroskedasticity (ARCH) models introduced by Engle
(1982) are the most representative example of observation-driven models. They describe
the variance as a linear function of the squares of past observations:

σt² = α0 + α1 yt−1² + · · · + αp yt−p²

and so the model is defined by the conditional density (one-step-ahead forecast density):

yt |Y t−1 ∼ N(0, σt²)

where Y t−1 is the set of observations up to time t − 1. This allows today's variance to
depend on the variability of recent observations, so that one type of shock alone drives
both the series itself and its volatility.
The use of models described by their one-step-ahead forecast offers remarkable advantages
that are worth highlighting. First, the likelihood expression can be simply
obtained by combining these densities, making estimation and testing easy to handle,
at least in principle. Second, conditional densities imply the use of conditional moments,
which are widely used to specify finance theory, although this is conditional on economic
agents', if not econometricians', information. Finally, the observation-driven models parallel
the autoregressive and moving average ones which are commonly used for models
of changing means.
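An ARCH(p) recursion of this form is straightforward to simulate; below is a minimal sketch for p = 1 (our own illustration, with made-up coefficients):

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, T, seed=0):
    """ARCH(1): sigma_t^2 = alpha0 + alpha1 * y_{t-1}^2, y_t = sigma_t * eps_t."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T)
    sigma2 = np.zeros(T)
    sigma2[0] = alpha0 / (1.0 - alpha1)   # start at the unconditional variance
    y[0] = np.sqrt(sigma2[0]) * rng.normal()
    for t in range(1, T):
        sigma2[t] = alpha0 + alpha1 * y[t - 1]**2
        y[t] = np.sqrt(sigma2[t]) * rng.normal()
    return y, sigma2

y, sigma2 = simulate_arch1(alpha0=0.5, alpha1=0.4, T=5000)
# Unconditional variance is alpha0 / (1 - alpha1)
```

The simulated series shows the single-shock structure described above: a large y_{t-1} mechanically raises today's conditional variance.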
In the second class, i.e. in parameter-driven (or dynamic latent variable, or
state space) models, zt is a function of an unobserved or latent component. The log-normal
stochastic volatility model created by Taylor (1986) is the simplest and best-known
example:

yt |ht ∼ N(0, exp(ht)),   ht = α + βht−1 + ηt,   ηt ∼ NID(0, ση²)

where ht represents the log-volatility, which is unobserved but can be estimated using the
observations. With respect to the previous class, these models are driven by two types of
shock, one of which influences the volatility (i.e. the conditional variance equation). These
models parallel the Gaussian state space models of means dealt with by Kalman (1960).
In spite of this, a shortcoming of parameter-driven volatility models is that they generally
lack analytic one-step-ahead forecast densities yt |Y t−1, unlike the models of the
mean which fit into the Gaussian state space form. Hence either an approximation or a
numerically intensive method is required to deal with these models.
Although SV models are harder to handle statistically than the corresponding
observation-driven models, there are still some good reasons for investigating them. We
will see that their properties are easier to ¬nd, understand, manipulate and generalise to
the multivariate case. They also have simpler analogous continuous time representations,
which is important given that much modern ¬nance employs diffusions. An example of
this is the work by Hull and White (1987) which uses a log-normal SV model, replacing
the discrete time AR(1) for ht with an Ornstein“Uhlenbeck process.
246 Applied Quantitative Methods for Trading and Investment

For these models the volatility depends on some unobserved components or a latent structure.
One interpretation of the latent h_t is that it represents the random and uneven flow of new
information into financial markets, which is very difficult to model directly (Clark, 1973).
The most popular of these parameter-driven stochastic volatility models, from Taylor
(1986), puts

y_t = ε_t exp(h_t/2)
h_t = α + βh_{t−1} + η_t        (8.2)

where ε_t and η_t are two independent Gaussian white noises, with variances 1 and σ_η²,
respectively. Due to the Gaussianity of η_t, this model is called a log-normal SV model.
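A minimal simulator for this log-normal SV model can make the two-shock structure explicit. This is a sketch of our own (names are ours; we start h_t at its stationary mean, which assumes |β| < 1):

```python
import math
import random

def simulate_sv(alpha, beta, sigma_eta, T, seed=0):
    """y_t = eps_t * exp(h_t/2), h_t = alpha + beta*h_{t-1} + eta_t, eta_t ~ N(0, sigma_eta^2).
    eps_t and eta_t are independent Gaussian white noises; requires |beta| < 1."""
    rng = random.Random(seed)
    h = alpha / (1.0 - beta)          # stationary mean of h_t
    ys, hs = [], []
    for _ in range(T):
        h = alpha + beta * h + rng.gauss(0.0, sigma_eta)   # volatility shock eta_t
        ys.append(rng.gauss(0.0, 1.0) * math.exp(h / 2.0)) # observation shock eps_t
        hs.append(h)
    return ys, hs
```

Unlike the ARCH case, two separate draws appear at each date: one moves the latent log-volatility, the other generates the observation.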
Another possible interpretation of h_t is that it characterises the regime in which financial
markets are operating, in which case it can be described by a discrete-valued variable. The
most popular approach to modelling changes in regime is the class of Markov switching
models introduced by Hamilton (1989) in the econometrics literature. In that case the
simplest model is:³

y_t = ε_t exp(h_t/2)
h_t = α + βs_t        (8.3)

where s_t is a two-state first-order Markov chain which can take the values 0, 1 and is
independent of ε_t. The value of the time series s_t, for all t, depends only on the last value
s_{t−1}, i.e. for i, j = 0, 1:

P(s_t = j | s_{t−1} = i, s_{t−2} = i, . . .) = P(s_t = j | s_{t−1} = i) = p_ij

The probabilities p_ij, i, j = 0, 1, are called the transition probabilities of moving from one
state to the other. Obviously, we have:

p00 + p01 = p10 + p11 = 1

and these transition probabilities are collected in the transition matrix P:

P = [ p00       1 − p11 ]
    [ 1 − p00   p11     ]

which fully describes the Markov chain.
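Given p00 and p11, the chain is straightforward to simulate; a small sketch of our own (names ours):

```python
import random

def simulate_markov_chain(p00, p11, T, s0=0, seed=0):
    """Two-state first-order Markov chain: the probability of staying in
    state 0 is p00 and of staying in state 1 is p11."""
    rng = random.Random(seed)
    s, path = s0, []
    for _ in range(T):
        stay = p00 if s == 0 else p11
        s = s if rng.random() < stay else 1 - s
        path.append(s)
    return path
```

With p00 + p11 > 1 the simulated regimes are persistent, which is the empirically relevant case for volatility.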
A two-state Markov chain can easily be represented by a simple AR(1) process as

s_t = (1 − p00) + (−1 + p00 + p11)s_{t−1} + v_t        (8.4)

where v_t = s_t − E(s_t | s_{t−1}, s_{t−2}, . . .). Although v_t can take only a finite set of
values, on average it is zero. The innovation v_t is thus a martingale difference sequence.
Given the

³ The representation y_t = σ_{s_t}ε_t, with σ₀ = exp(α/2) and σ₁ = exp((α + β)/2), is clearly
equivalent. To identify regime 1 as the high-volatility regime, we set β > 0.
Stochastic Volatility Models 247

autoregressive representation of the Markov chain, it is possible to rewrite the volatility
equation of model (8.3) in the following way:

h_t = α + βs_t
    = α + β[(1 − p00) + (−1 + p00 + p11)s_{t−1} + v_t]
    = α(2 − p00 − p11) + β(1 − p00) + (−1 + p00 + p11)h_{t−1} + βv_t

The SV model with discrete volatility therefore has the same structure as model (8.2), but
with a noise that can take only a finite set of values:

y_t = ε_t exp(h_t/2)
h_t = a + bh_{t−1} + ω_t        (8.5)

with a = α(2 − p00 − p11) + β(1 − p00), b = −1 + p00 + p11 and ω_t = βv_t.
We describe the basic properties of both types of model in the following sections.

8.3.1 SV models with continuous volatility

We consider ε_t and η_t to be independent Gaussian white noises. The properties of model (8.2)
are discussed in Taylor (1986, 1994) (see also Shephard (1996)). Broadly speaking, given
the product process nature of the model, these properties are easy to derive, but estimation
is substantially harder than for the corresponding ARCH models.
As η_t is Gaussian, h_t is a standard Gaussian autoregression. It will be stationary
(covariance⁴ and strictly⁵) if |β| < 1, with:

μ_h = E(h_t) = α/(1 − β)
σ_h² = Var(h_t) = σ_η²/(1 − β²)

As ε_t is always stationary, y_t will be stationary if and only if h_t is stationary, y_t being the
product of two stationary processes. Using the properties of the log-normal distribution,
all the moments exist if h_t is stationary, and in particular the kurtosis is:

E(y_t⁴)/(E(y_t²))² = 3 exp(σ_h²) ≥ 3

which shows that the SV model has fatter tails than the corresponding normal distribution;
and all the odd moments are zero.
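These unconditional moments translate directly into code; a helper we add here (names ours; Gaussian η_t and |β| < 1 assumed):

```python
import math

def sv_moments(alpha, beta, sigma_eta2):
    """Stationary mean and variance of h_t, and the implied kurtosis of y_t,
    for the log-normal SV model with |beta| < 1."""
    mu_h = alpha / (1.0 - beta)
    var_h = sigma_eta2 / (1.0 - beta * beta)
    kurtosis = 3.0 * math.exp(var_h)   # E(y^4)/E(y^2)^2 = 3*exp(sigma_h^2) >= 3
    return mu_h, var_h, kurtosis
```

Any positive σ_η² pushes the kurtosis strictly above 3, so the model produces fat tails by construction.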

⁴ A stochastic process y_t is covariance stationary if the degree of covariance amongst its observations depends
only on the time gap between them, i.e. Cov(y_t, y_{t+r}) = γ(r) for all t.
⁵ For some processes there will exist no moments, even in cases where the corresponding unconditional
distributions are perfectly well-behaved. The strict stationarity of y_t is then defined as follows:
F(y_{t+r}, y_{t+r+1}, . . . , y_{t+r+p}) = F(y_t, y_{t+1}, . . . , y_{t+p}) for all p and r.

The dynamic properties of y_t are easy to find. First, as ε_t is iid, y_t is a martingale
difference⁶ and is a white noise⁷ if |β| < 1. As h_t is a Gaussian AR(1):

Cov(y_t², y_{t−r}²) = E(y_t²y_{t−r}²) − (E(y_t²))²
                    = E(exp(h_t + h_{t−r})) − (E(exp(h_t)))²
                    = exp(2μ_h + σ_h²)(exp(σ_h²β^r) − 1)

and so:

ρ_{y²}(r) = Cov(y_t², y_{t−r}²)/Var(y_t²) = (exp(σ_h²β^r) − 1)/(3 exp(σ_h²) − 1) ≈ [(exp(σ_h²) − 1)/(3 exp(σ_h²) − 1)]β^r

Hence, the memory of y_t² is defined by the memory of the latent h_t, in this case an
AR(1). Moreover, note that if β < 0, ρ_{y²}(r) can be negative, unlike in ARCH models.
This is the autocorrelation function of an ARMA(1,1) process; thus the SV model behaves
in a manner similar to the GARCH(1,1) model. Finally, note that there is no need for
non-negativity constraints nor for bounded kurtosis constraints on the coefficients. This
is a great advantage with respect to GARCH models.
Insight into the dynamic properties of the SV model can also be obtained by squaring
and taking logs, getting:

log(y_t²) = h_t + log(ε_t²)
h_t = α + βh_{t−1} + η_t        (8.6)

a linear process, which adds the iid log(ε_t²) to the AR(1) h_t. As a result, log(y_t²) ~
ARMA(1,1). If ε_t is Gaussian, then log(ε_t²) has a mean of −1.27 and variance 4.93, but
its distribution is far from normal, as it is heavily skewed with a long left-hand tail,
caused by taking the logs of very small numbers, an operation which generates outliers.
The autocorrelation function of log(y_t²) is:

ρ_{log(y²)}(r) = β^r/(1 + 4.93/σ_h²)
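Both autocorrelation functions are one-liners; helpers of our own (names ours; r ≥ 1 and |β| < 1 assumed):

```python
import math

def acf_y2(r, beta, var_h):
    """Exact autocorrelation of y_t^2 at lag r; decays roughly like beta^r."""
    return (math.exp(var_h * beta ** r) - 1.0) / (3.0 * math.exp(var_h) - 1.0)

def acf_log_y2(r, beta, var_h):
    """Autocorrelation of log(y_t^2); 4.93 is Var(log eps_t^2) = pi^2/2."""
    return beta ** r / (1.0 + (math.pi ** 2 / 2.0) / var_h)
```

Comparing the two for the same (β, σ_h²) shows how the log transformation damps the autocorrelations when σ_h² is small relative to the 4.93 noise variance.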

8.3.2 SV models with discrete volatility

We consider a two-state Markov chain s_t independent of ε_t, which is a Gaussian white
noise. Assuming stationarity,⁸ the unconditional probabilities of being in regime 0 (P(s_t = 0)
= π₀) or regime 1 (P(s_t = 1) = π₁) are defined as follows:

π₀ = p00π₀ + (1 − p11)π₁
π₁ = (1 − p00)π₀ + p11π₁

⁶ y_t being a martingale difference stipulates that E|y_t| < ∞ and that E(y_t | y_{t−1}, y_{t−2}, . . .) = 0. All martingale
differences have zero means and are uncorrelated over time. If the unconditional variance of the martingale
difference is constant over time, then the series is also a white noise.
⁷ This means E(y_t) = μ, Var(y_t) = σ² and Cov(y_t, y_{t+r}) = 0 for all r ≠ 0. Often μ will be taken to be zero.
These unconditional moment conditions are sometimes strengthened to include y_t being independent, rather
than uncorrelated, over time. This will be called strong white noise, a special case of which is independent and
identically distributed (iid).
⁸ An ergodic Markov chain is a covariance stationary process. For some basic properties of Markov chains, see
Hamilton (1994, chap. 22).

with π₀ + π₁ = 1, or in vector form:

π = Pπ,    1′π = 1

where 1 = (1, 1)′. Thus, they are:

π₀ = (1 − p11)/(2 − p00 − p11)
π₁ = (1 − p00)/(2 − p00 − p11)

From the definition of b in equation (8.5), we can note that when p00 + p11 > 1 the h_t
process is likely to persist in its current state, and it will be positively serially correlated.
Its unconditional moments are:

E(h_t) = α + βE(s_t) = α + βπ₁
Var(h_t) = β²π₁(1 − π₁)
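The ergodic probabilities and the resulting moments of h_t are easy to compute; small helpers of our own (names ours; the chain is assumed ergodic, i.e. p00 + p11 < 2):

```python
def chain_stationary(p00, p11):
    """Ergodic probabilities (pi_0, pi_1) of the two-state chain."""
    denom = 2.0 - p00 - p11
    return (1.0 - p11) / denom, (1.0 - p00) / denom

def discrete_h_moments(alpha, beta, p00, p11):
    """E(h_t) and Var(h_t) for h_t = alpha + beta*s_t under stationarity."""
    _, pi1 = chain_stationary(p00, p11)
    return alpha + beta * pi1, beta * beta * pi1 * (1.0 - pi1)
```

The pair (π₀, π₁) satisfies π = Pπ by construction, which can be verified numerically.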

Under stationarity,⁹ as for the SV model with continuous volatility, all the moments exist,
all the odd moments are zero and the kurtosis is:

E(y_t⁴)/(E(y_t²))² = 3(π₀ + exp(2β)π₁)/(π₀ + exp(β)π₁)² ≥ 3

Moreover, as ε_t is iid, y_t is a martingale difference and its dynamic properties are described
by the covariances of the squares:

Cov(y_t², y_{t−r}²) = E(y_t²y_{t−r}²) − (E(y_t²))²
= E(exp(h_t + h_{t−r})) − (exp(α)π₀ + exp(α + β)π₁)²
= exp(2α)P(s_t = 0, s_{t−r} = 0) + exp(2α + β)P(s_t = 0, s_{t−r} = 1)
  + exp(2α + β)P(s_t = 1, s_{t−r} = 0) + exp(2α + 2β)P(s_t = 1, s_{t−r} = 1)
  − (exp(α)π₀ + exp(α + β)π₁)²

where the unconditional joint probabilities P(s_t, s_{t−r}) can be computed as follows:

P(s_t = i, s_{t−r} = j) = P(s_t = i | s_{t−r} = j)P(s_{t−r} = j) = [P^r]_ij π_j

⁹ For this see Francq and Zakoïan (2001) and Francq et al. (2001).

with: ® (1 ’ p ) + »r (1 ’ p ) (1 ’ p11 ) + »r (1 ’ p11 ) 
11 00
 
2 ’ p00 ’ p11 2 ’ p00 ’ p11
Pr =  
° (1 ’ p ) + »r (1 ’ p ) (1 ’ p ) + » (1 ’ p ) »
00 00 00 11
2 ’ p00 ’ p11 2 ’ p00 ’ p11

and » = ’1 + p00 + p11 .
Finally, it is useful to note that ht is itself a Markov chain which can take the values
± and ± + β with the same transition matrix P.

The difficulties in estimating SV models lie in the latent nature of the volatility. Inference
may be difficult, because the distribution of y_t | Y^{t−1} is specified implicitly rather than
explicitly, and the likelihood function appears as a multivariate integral whose size is
equal to the number of observations multiplied by the size of the latent variables, which
is 1 for the models described here.
As with most non-Gaussian parameter-driven models, there are many different ways of
performing estimation: some involve estimating or approximating the likelihood, others
use method of moments procedures (see Ghysels et al. (1996) and Shephard (1996)).
Let us first of all clearly state the problem of computing the likelihood function for
the general class of parametric dynamic latent variable, or non-linear and/or non-Gaussian
state space, models.

8.4.1 A general filter for non-Gaussian parameter-driven models

Both SV models (with continuous and discrete volatility) fit into the following framework:

y_t = φ_t(h_t, ε_t; θ)        measurement equation
h_t = ϕ_t(h_{t−1}, η_t; θ)    transition equation

where ε_t and η_t are independent white noises, with marginal distributions which may
depend on θ, the vector of parameters. Let H^t and Y^t denote (h₁, h₂, . . . , h_t)′ and
(y₁, y₂, . . . , y_t)′, respectively.
There are serious difficulties in computing the likelihood function; in fact, with T the
number of observations, we have:

f(Y^T, H^T; θ) = ∏_{t=1}^{T} f(y_t | Y^{t−1}, H^t; θ)f(h_t | Y^{t−1}, H^{t−1}; θ)

and the likelihood function is:

L_T(θ) ≡ f(Y^T; θ) = ∫ ∏_{t=1}^{T} f(y_t | Y^{t−1}, H^t; θ)f(h_t | Y^{t−1}, H^{t−1}; θ) dH^T        (8.8)

which is an integral whose size is equal to the number of observations multiplied by the
dimension of the unobserved variable h_t, and is thus practically infeasible to compute directly.

It is however possible to derive a general algorithm which allows the formal computation
of the likelihood function by decomposing the calculation of integral (8.8) into a
sequence of integrals of lower dimension.
Let f(h_{t−1} | Y^{t−1}) be the input of the iteration,¹⁰ t = 1, 2, . . . , T. First of all, we can
decompose the joint conditional density of h_t, h_{t−1} into the product of the transition
density and the input density:

f(h_t, h_{t−1} | Y^{t−1}) = f(h_t | h_{t−1})f(h_{t−1} | Y^{t−1})        (step 1)

By marginalisation we obtain the prediction density of h_t:

f(h_t | Y^{t−1}) = ∫ f(h_t, h_{t−1} | Y^{t−1}) dh_{t−1} = ∫ f(h_t | h_{t−1})f(h_{t−1} | Y^{t−1}) dh_{t−1}        (step 2)

Let us now consider the joint density of y_t, h_t. It can be decomposed into the product of
the measurement density and the prediction density:

f(y_t, h_t | Y^{t−1}) = f(y_t | h_t)f(h_t | Y^{t−1})        (step 3)

and, again, by marginalisation we obtain the one-step-ahead forecast density of y_t:

f(y_t | Y^{t−1}) = ∫ f(y_t, h_t | Y^{t−1}) dh_t = ∫ f(y_t | h_t)f(h_t | Y^{t−1}) dh_t        (step 4)

which is particularly useful, since by combining these densities it is possible
to obtain the likelihood function. Finally, by conditioning we obtain the filtering
density (the output):

f(h_t | Y^t) = f(y_t, h_t | Y^{t−1})/f(y_t | Y^{t−1}) = f(y_t | h_t)f(h_t | Y^{t−1}) / ∫ f(y_t | h_t)f(h_t | Y^{t−1}) dh_t        (step 5)

which ends the iteration.
The previous algorithm provides several important elements. Step 2 gives the
estimate of h_t given all the information available up to t − 1 (the prediction density).
Step 5 provides the estimate of h_t given all the information currently available (the filtering
density). Finally, step 4, by providing the one-step-ahead forecast density, allows us to
compute the likelihood function.
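For intuition, steps 1–5 can be approximated numerically for the continuous SV model by discretising h_t on a fixed grid, replacing the integrals with sums over grid points. This brute-force sketch is entirely ours (not an algorithm from the chapter) and assumes |β| < 1 and σ_η > 0:

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def grid_filter(y, alpha, beta, sigma_eta, n_grid=100):
    """Grid approximation of the five-step filter for y_t = eps_t*exp(h_t/2)."""
    mu_h = alpha / (1.0 - beta)
    sd_h = sigma_eta / math.sqrt(1.0 - beta * beta)
    grid = [mu_h + sd_h * (-5.0 + 10.0 * i / (n_grid - 1)) for i in range(n_grid)]
    # initial input: unconditional density of h_1, normalised to a pmf on the grid
    filt = [normal_pdf(h, mu_h, sd_h) for h in grid]
    s = sum(filt); filt = [f / s for f in filt]
    loglik = 0.0
    for yt in y:
        # steps 1-2: prediction probabilities P(h_t | Y^{t-1}) on the grid
        pred = [sum(normal_pdf(h, alpha + beta * hp, sigma_eta) * fp
                    for hp, fp in zip(grid, filt)) for h in grid]
        s = sum(pred); pred = [p / s for p in pred]
        # steps 3-4: one-step-ahead forecast density f(y_t | Y^{t-1})
        meas = [normal_pdf(yt, 0.0, math.exp(h / 2.0)) for h in grid]
        fy = sum(m * p for m, p in zip(meas, pred))
        loglik += math.log(fy)
        # step 5: filtering probabilities P(h_t | Y^t)
        filt = [m * p / fy for m, p in zip(meas, pred)]
    return loglik, filt
```

This is essentially Kitagawa's numerical integration idea in its crudest form: accurate enough to illustrate the recursion, but far too slow for serious use.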
Unfortunately, only in very special cases is it possible to obtain analytic recursive
algorithms¹¹ from this general filtering algorithm: the Kalman filter in the Gaussian and
linear case, and the Hamilton filter in the Markovian and discrete case.
In the Gaussian and linear case, the initial input f(h₁ | Y⁰) and the measurement and
transition densities are assumed to be Gaussian; at each step of the algorithm Gaussianity
is preserved, so all the outputs are also Gaussian. The normal distribution is

¹⁰ For the first iteration (t = 1) it is possible to consider the unconditional distribution of h_t, f(h₁). For the
sake of simplicity we omit the dependence on the parameter θ.
¹¹ See also Shephard (1994) for another particular case, in which h_t is set to be a random walk and exp(η_t) a
highly contrived scaled beta distribution. This delivers a one-step-ahead prediction distribution which has some
similarities to the ARCH models.

completely described by its first two moments, and then the algorithm can be rewritten by
relating the means and variances of the different densities involved. This is the Kalman filter.
For the switching regime models introduced by Hamilton (1989), which represent the
Markovian and discrete case, the integrals which appear at steps 2 and 4 become a simple
sum over the possible regimes, and then the whole algorithm is analytically tractable.
In all the other cases, it is necessary to consider approximate solutions or simulation-
based methods. Examples of approximations are the extended Kalman filter (Anderson
and Moore, 1979; Harvey, 1989; Fridman and Harris, 1998), the Gaussian sum
filter (Sorenson and Alspach, 1971), numerical integration (Kitagawa, 1987),
Monte Carlo integration (Tanizaki and Mariano, 1994, 1998), and the particle filter (Gordon
et al., 1993; Kitagawa, 1996; Pitt and Shephard, 1999a). The simulation-based solutions
are certainly more time-consuming and demanding in terms of computing, but they are
definitely more general. We will see these methods in greater detail later.
However, for the two models presented, (8.2) and (8.3), the general filter introduced here
is useful for estimation. In fact, for the linearised version (8.6), the Kalman filter allows
quasi-maximum likelihood estimation of the parameters, and the discrete version (8.3)
is a particular case of the switching regime models for which the Hamilton filter gives the
likelihood function.

The Kalman filter for quasi-maximum likelihood (QML) estimation of continuous SV models

We can consider the log-transformation (8.6) of the continuous SV model. As log(ε_t²) is
iid, we obtain a linear state space model.
Let LY^τ = (log(y₁²), log(y₂²), . . . , log(y_τ²))′, ĥ_{t/τ} = E(h_t | LY^τ) = E(h_t | Y^τ) and Q_{t/τ} =
MSE(h_t | LY^τ) = MSE(h_t | Y^τ). The Kalman filter (see, for example,¹² Harvey (1989))
computes these quantities recursively for t = 1, . . . , T:

ĥ_{t/t−1} = α + βĥ_{t−1/t−1}
Q_{t/t−1} = β²Q_{t−1/t−1} + σ_η²

e_{t/t−1} = log(y_t²) − ĥ_{t/t−1}
F_{t/t−1} = Q_{t/t−1} + π²/2

ĥ_{t/t} = ĥ_{t/t−1} + K_t e_{t/t−1}
Q_{t/t} = (1 − K_t)Q_{t/t−1}

where K_t = Q_{t/t−1}/F_{t/t−1} is the Kalman gain. However, as log(ε_t²) is not Gaussian, the
Kalman filter can only provide the best linear unbiased estimator of h_t given Y^t.
Moreover, if (8.6) were a Gaussian state space model, the Kalman filter would provide
the exact likelihood function. In fact, a by-product of the filter are the innovations e_{t/t−1},

¹² See also Carraro and Sartore (1987).

which are the one-step-ahead forecast errors, and their corresponding mean square errors,
F_{t/t−1}. Together they deliver the log-likelihood (ignoring constants):

L_T(θ) = −(1/2)Σ_{t=1}^{T} log F_{t/t−1} − (1/2)Σ_{t=1}^{T} e_{t/t−1}²/F_{t/t−1}

As the state space is linear but not Gaussian, the filter gives a quasi-likelihood function,
which can be used to obtain a consistent estimator of θ and asymptotically normal
inference (see Ruiz (1994)).
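A sketch of these recursions in Python (our names throughout; unlike the recursion above, we subtract the −1.27 mean of log ε_t² from the innovation explicitly, and we initialise at the stationary moments of h_t, which assumes |β| < 1):

```python
import math

def sv_qml_kalman(y, alpha, beta, sigma_eta2):
    """Kalman filter on log(y_t^2) = h_t + log(eps_t^2); returns the quasi-log-
    likelihood (ignoring constants) and the final filtered estimate of h_t.
    Assumes y_t != 0 for all t."""
    c = -1.27                      # E[log eps_t^2] for Gaussian eps_t
    R = math.pi ** 2 / 2.0         # Var[log eps_t^2] = 4.93
    h = alpha / (1.0 - beta)       # stationary mean and variance of h_t
    Q = sigma_eta2 / (1.0 - beta * beta)
    loglik = 0.0
    for yt in y:
        h_pred = alpha + beta * h                  # h_{t/t-1}
        Q_pred = beta * beta * Q + sigma_eta2      # Q_{t/t-1}
        e = math.log(yt * yt) - c - h_pred         # innovation e_{t/t-1}
        F = Q_pred + R                             # F_{t/t-1}
        K = Q_pred / F                             # Kalman gain
        h = h_pred + K * e                         # h_{t/t}
        Q = (1.0 - K) * Q_pred                     # Q_{t/t}
        loglik += -0.5 * (math.log(F) + e * e / F)
    return loglik, h
```

Maximising this quasi-log-likelihood over (α, β, σ_η²) with any scalar optimiser gives the QML estimates discussed above.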
This way of estimating h_t is used by Melino and Turnbull (1990), after estimating θ by
the generalised method of moments (see Section 8.4.3). Harvey et al. (1994) examine
the QML estimator.

The Hamilton filter for maximum likelihood estimation of discrete SV models

The discrete SV model (8.3) is a non-linear and non-Gaussian state space model. In
the two-regime case, the transition equation can be written in a linear form (see (8.5))
and the measurement equation can be linearised by the log transformation, but both
equations are non-Gaussian. However, the joint process (y_t, h_t) is Markovian, and
thus the general filter presented in Section 8.4.1 gives an analytic recursion, since the
integrals become simple sums over the possible values of h_t. The input is the filtered
probability¹³ P(h_{t−1} | Y^{t−1}) and the algorithm gives the prediction probability, the one-
step-ahead forecast density and the subsequent filtered probability:

P(h_t | Y^{t−1}) = Σ_{h_{t−1}} P(h_t | h_{t−1})P(h_{t−1} | Y^{t−1})

f(y_t | Y^{t−1}) = Σ_{h_t} f(y_t | h_t)P(h_t | Y^{t−1})

P(h_t | Y^t) = f(y_t | h_t)P(h_t | Y^{t−1}) / Σ_{h_t} f(y_t | h_t)P(h_t | Y^{t−1})

The combination of the one-step-ahead forecast densities:

L_T(θ) = ∏_{t=1}^{T} f(y_t | Y^{t−1})

provides the likelihood function, the maximisation of which gives the maximum likelihood
estimators of the parameters.
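A minimal implementation of this Hamilton filter for the two-regime model, started from the ergodic probabilities (function and variable names are ours):

```python
import math

def hamilton_filter(y, alpha, beta, p00, p11):
    """Exact filter and log-likelihood for y_t = eps_t*exp(h_t/2),
    with h_t in {alpha, alpha + beta}."""
    def npdf(x, sd):
        z = x / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    sds = [math.exp(alpha / 2.0), math.exp((alpha + beta) / 2.0)]
    P = [[p00, 1.0 - p11], [1.0 - p00, p11]]     # P[i][j] = P(s_t = i | s_{t-1} = j)
    denom = 2.0 - p00 - p11
    filt = [(1.0 - p11) / denom, (1.0 - p00) / denom]   # ergodic initial probabilities
    loglik = 0.0
    for yt in y:
        pred = [sum(P[i][j] * filt[j] for j in range(2)) for i in range(2)]
        meas = [npdf(yt, sd) for sd in sds]
        fy = meas[0] * pred[0] + meas[1] * pred[1]       # f(y_t | Y^{t-1})
        loglik += math.log(fy)
        filt = [m * p / fy for m, p in zip(meas, pred)]  # P(s_t | Y^t)
    return loglik, filt
```

Because the sums run over only two regimes, the exact likelihood costs O(T) operations, in contrast with the intractable integral of the continuous case.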

¹³ The initial probability P(h₀ | Y⁰) can be taken equal to the unconditional (ergodic) probability P(h₀) = π.

8.4.2 A general smoother for non-Gaussian parameter-driven models

We might also want to obtain the estimate of h_t given all the information available, that
is, conditional on Y^T. Such a procedure is called smoothing, and as before it is possible
to derive a formal backward algorithm which delivers the smoothed densities f(h_t | Y^T).
Let f(h_{t+1} | Y^T) be the input of the iteration,¹⁴ t = T − 1, T − 2, . . . , 2, 1. We can
decompose the joint density of h_{t+1}, h_t, conditional on the information set Y^t, into the
product of the transition density and the filtered density (available from the filter):

f(h_{t+1}, h_t | Y^t) = f(h_{t+1} | h_t)f(h_t | Y^t)        (step 1)

By conditioning on the prediction density obtained from the filter, we obtain the following
conditional density:

f(h_t | h_{t+1}, Y^t) = f(h_{t+1}, h_t | Y^t)/f(h_{t+1} | Y^t)        (step 2)

The joint density of h_{t+1}, h_t, conditional on the information set Y^T, is given by the product
of the conditional density f(h_t | h_{t+1}, Y^T) and the input of the algorithm f(h_{t+1} | Y^T). The
information set h_{t+1}, Y^T is included in the information set h_{t+1}, Y^t, ε_{t+1}, η_{t+2}, where
ε_{t+1} = (ε_{t+1}, . . . , ε_T)′ and η_{t+2} = (η_{t+2}, . . . , η_T)′. Given that ε_{t+1}, η_{t+2} is independent of
h_t, h_{t+1}, Y^t, we can conclude that f(h_t | h_{t+1}, Y^T) = f(h_t | h_{t+1}, Y^t) (computed at step 2)
and then:

f(h_{t+1}, h_t | Y^T) = f(h_t | h_{t+1}, Y^T)f(h_{t+1} | Y^T) = f(h_t | h_{t+1}, Y^t)f(h_{t+1} | Y^T)        (step 3)

Finally, by marginalisation we obtain the smoothed density of h_t (the output):

f(h_t | Y^T) = ∫ f(h_{t+1}, h_t | Y^T) dh_{t+1} = ∫ f(h_t | h_{t+1}, Y^t)f(h_{t+1} | Y^T) dh_{t+1}        (step 4)

Again, only in the linear and Gaussian case and in the Markovian and discrete case is
it possible to obtain an analytic backward recursion: the Kalman smoother and the Kim
smoother (Kim, 1994), respectively.

The Kalman smoother for continuous SV models
Let ĥ_{t+1/T} = E(h_{t+1} | LY^T) = E(h_{t+1} | Y^T) and Q_{t+1/T} = MSE(h_{t+1} | LY^T) = MSE
(h_{t+1} | Y^T). The Kalman smoother¹⁵ computes these quantities recursively for t =
T − 1, T − 2, . . . , 2, 1:

ĥ_{t/T} = ĥ_{t/t} + βQ_{t/t}Q_{t+1/t}⁻¹(ĥ_{t+1/T} − ĥ_{t+1/t})
Q_{t/T} = Q_{t/t} + β²Q_{t/t}²Q_{t+1/t}⁻²(Q_{t+1/T} − Q_{t+1/t})

where ĥ_{t/t}, Q_{t/t}, ĥ_{t+1/t}, Q_{t+1/t} are stored from the Kalman filter.

¹⁴ For the first iteration (t = T − 1), the input is simply the final output of the filter, f(h_T | Y^T).
¹⁵ See also de Jong (1989).

For the log transformation of the continuous SV model (8.6), the Kalman smoother
is useful for estimating the unobserved log-volatility; in fact, it provides the best linear
unbiased estimator of h_t given (y₁, y₂, . . . , y_T)′.

The Kim smoother for discrete SV models
The input is the smoothed probability P(h_{t+1} | Y^T) and the recursion is simply:

P(h_t | Y^T) = Σ_{h_{t+1}} P(h_{t+1} | h_t)P(h_t | Y^t)P(h_{t+1} | Y^T)/P(h_{t+1} | Y^t)

where P(h_t | Y^t) and P(h_{t+1} | Y^t) are stored from the Hamilton filter.
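A sketch combining a forward Hamilton filter with this backward recursion (names are ours; the forward pass stores the filtered and prediction probabilities that the recursion needs):

```python
import math

def kim_smoother(y, alpha, beta, p00, p11):
    """Smoothed regime probabilities P(s_t | Y^T) for the two-state discrete SV model."""
    def npdf(x, sd):
        z = x / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    sds = [math.exp(alpha / 2.0), math.exp((alpha + beta) / 2.0)]
    P = [[p00, 1.0 - p11], [1.0 - p00, p11]]     # P[i][j] = P(s_t = i | s_{t-1} = j)
    denom = 2.0 - p00 - p11
    filt = [(1.0 - p11) / denom, (1.0 - p00) / denom]
    filts, preds = [], []
    for yt in y:                                  # forward pass: Hamilton filter
        pred = [sum(P[i][j] * filt[j] for j in range(2)) for i in range(2)]
        meas = [npdf(yt, sd) for sd in sds]
        fy = sum(m * p for m, p in zip(meas, pred))
        filt = [m * p / fy for m, p in zip(meas, pred)]
        preds.append(pred)
        filts.append(filt)
    smooth = [filts[-1]]                          # backward pass: Kim recursion
    for t in range(len(y) - 2, -1, -1):
        nxt = smooth[0]
        sm = [filts[t][j] * sum(P[i][j] * nxt[i] / preds[t + 1][i] for i in range(2))
              for j in range(2)]
        smooth.insert(0, sm)
    return smooth
```

Each smoothed vector is the filtered probability reweighted by how well each regime explains the future of the sample, and it sums to one by construction.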

8.4.3 Other estimation methods for continuous SV models

For the discrete SV model, the Hamilton filter allows us to obtain the maximum likelihood
estimator of the parameters. For the continuous SV model, by contrast, the Kalman
filter provides only an approximation of the likelihood function. Let us review some other
estimation methods useful for the continuous SV model.
As with most non-Gaussian parameter-driven models, there are many different ways
to perform estimation. Some involve estimating the likelihood; others use method of
moments procedures.

Method of moments
The simplest approach is the method of moments, based on matching empirical and
theoretical moments. In the SV case there are many possible moments to use in estimating
the parameters of the model. This is because y_t² behaves like an ARMA(1,1) model, and
moving average models do not admit sufficient statistics of a smaller dimension
than T. This suggests that the use of a finite number of moment restrictions is likely to
lead to a loss of information. Examples include those based on y_t², y_t⁴, y_t²y_{t−r}², although there
are many other possibilities. As a result, we may well want to use more moments than
there are parameters to estimate, implying that they will have to be pooled. A reasonably
sensible way of doing this is via the Generalised Method of Moments (GMM).
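As a toy illustration of moment matching (this is not the GMM estimator itself, and all names are ours), β and σ_h² can be recovered in closed form from the first two sample autocorrelations of log(y_t²), using ρ(r) = β^r/(1 + 4.93/σ_h²):

```python
import math

def sample_acf(x, r):
    """Sample autocorrelation of x at lag r."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    cov = sum((x[t] - m) * (x[t - r] - m) for t in range(r, n)) / n
    return cov / var

def match_moments(y):
    """Toy estimates of (beta, sigma_h^2) from the ACF of log(y_t^2):
    rho(2)/rho(1) = beta, then sigma_h^2 = (pi^2/2)/(beta/rho(1) - 1).
    Assumes y_t != 0 and a well-behaved sample ACF."""
    x = [math.log(v * v) for v in y]
    r1, r2 = sample_acf(x, 1), sample_acf(x, 2)
    beta = r2 / r1
    sigma_h2 = (math.pi ** 2 / 2.0) / (beta / r1 - 1.0)
    return beta, sigma_h2
```

A full GMM estimator would instead stack many such moment restrictions into a vector g_T and minimise the quadratic form g_T′W_T g_T over θ.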
We can consider, for example, the vector g_T of the first r autocovariances of y_t² or of
log(y_t²) as moment constraints. There are more moments than parameters, and the issue
is how to weight all the available information. The GMM approach of Hansen (1982)
suggests minimising the quadratic form g_T′W_T g_T by varying the parameters θ; the
weighting matrix W_T should reflect the relative importance given to matching each of
the chosen moments. Applications of this method to SV models are the seminal work of
Melino and Turnbull (1990) and the extensive study of Andersen and Sørensen (1996).
The main advantage of the GMM approach comes from the fact that it does not require
distributional assumptions. However, this is not useful for the SV model, since it is a fully
specified parametric model. On the contrary, as argued by Shephard (1996), there are a
number of drawbacks to GMM estimation of the SV model. First of all, GMM can only
be used if h_t is stationary; if β is close to one (as we will find for many high-frequency
financial data sets), we can expect GMM to work poorly. Second, parameter estimates

are not invariant to the parameterisation, and the model (8.2) is not fundamentally more
interesting than

y_t = ε_t γ exp(h_t/2)
h_t = βh_{t−1} + η_t

Third, as already observed, the squares y_t² behave like an ARMA(1,1) model; if σ_η² is
small (as we will find in practice), ρ_{y²}(r) will be small but positive for many r. This
implies that for many series the number of moments to be considered will have to be
very high to capture the low correlation in the volatility process. Finally, GMM does not
deliver a (filtered or smoothed) estimate of h_t; consequently, a second form of estimation
will be required.
The GMM and QML approaches are the simplest ways of estimating SV models,
and they are about equally efficient, with the relative performance depending on
the specific parameter values (see Andersen and Sørensen (1997)).

Simulation-based methods
All the other estimation approaches are based on simulation techniques. In the last 10
years there has been a growing interest in simulation-based¹⁶ methods, which propose
several ways of resolving the inference problem for this class of models (see Billio (1999,
2002b)). In fact, it is clear that one can easily recursively simulate (path simulations) from
the system (8.2) for any given value of the parameters θ.
A first approach relies on simulation-based methods which are relatively simple to
implement, but which are less efficient than the maximum likelihood approach: see, for
example, the simulated method of moments (Duffie and Singleton, 1993), the indirect
inference method (Gouriéroux et al., 1993) or the efficient method of moments (Gallant
and Tauchen, 1996; Gallant et al., 1997). A second approach considers the problem
of the computation (or approximation) of the likelihood, and then of the maximum
likelihood estimator, through importance sampling methods (Danielsson and Richard,
1993; Danielsson, 1994; Durbin and Koopman, 1997). In a Bayesian framework, a third
approach considers Markov Chain Monte Carlo (MCMC) techniques based on the data
augmentation principle, which yields samples out of the joint posterior distribution of the
latent volatility and all model parameters, and allows the parameter estimates and the latent
volatility dynamics to be obtained (Jacquier et al., 1994; Kim et al., 1998; Chib et al.,
2002). Finally, a fourth approach utilises MCMC methods to compute (or approximate) the
maximum likelihood estimator (see the simulated expectation maximisation of Shephard
(1993), Geyer (1994, 1996) and Billio et al. (1998)).
In practice, the choice between these different simulation-based approaches depends on
several criteria, such as efficiency and computing time. Unfortunately, in general there is a
trade-off between these criteria. Methods like simulated maximum likelihood and the
simulated likelihood ratio have several advantages in the estimation of SV models. Since
they are likelihood methods, the classical theory of maximum likelihood carries over to
the simulated case and standard likelihood ratio tests can be constructed. MCMC-based
approaches are certainly more time-consuming, but they also allow estimation of the latent
volatility dynamics by simulating from the smoothing/posterior distribution of h_t.

¹⁶ Simulation techniques make use of sequences of pseudo-random numbers which are generated by a computer.

Let us briefly introduce some of these methods and their application to SV models.

Indirect inference approach

The so-called indirect inference methodology was introduced in the literature
by Smith (1993), Gouriéroux et al. (1993) and Gallant and Tauchen (1996) for simulation-
based inference on generally intractable structural models through an auxiliary model,
conceived as easier to handle. This methodology allows the use of somewhat misspecified
auxiliary models, since the simulation process in the well-specified structural model, and
the calibration of the simulated paths against the observed one through the same auxiliary
model, provide an automatic misspecification bias correction. There are several ways
of implementing this idea.¹⁷
The original approach is the indirect inference method of Gouriéroux et al. (1993).
Consider an auxiliary model f_a(y_t | Y^{t−1}; π) for the observed data (for example,¹⁸ the
general linear state space model obtained by the log transformation (8.6)). Let π̂_T =
Π_T(Y^T) denote the QML estimator of π based on f_a, as a function Π_T(·) of the observed
data set Y^T. The indirect inference estimator of the structural parameters θ is given by:

θ̂_II = arg min_θ [π̂_T − π̃_NT(θ)]′ W_T [π̂_T − π̃_NT(θ)]

where W_T is a weighting matrix and π̃_NT(θ) is the estimator of π obtained on a path
Ỹ^NT simulated for the given value of θ (i.e. it approximates, for large N, the binding
function π_NT(θ) = lim Π_NT(Ỹ^NT)). This approach may be very computationally demanding,
as one needs to evaluate the binding function π̃_NT(θ) for each value of θ appearing in the
numerical optimisation algorithm.

