can also be set to a multiple of the observation noise variance (option A). In this case, the user sets the desired signal to noise ratio in the edit box. An alternative way is to let the system estimate the optimal values of the parameters from the data. The estimation

[Figure 7.9 chart: signal to noise ratio = 1; actual beta versus estimated beta with ±1 SE confidence bands; sensitivity plotted against time.]

Figure 7.9 Actual versus estimated beta. The system noise to observation noise ratio, Q/R, is 1. This ratio allows the Kalman filter to rapidly adapt to the jump in sensitivity. However, this quick reaction comes at a cost of increased volatility in the beta estimates and of a very large standard error

20 Unless the user has good prior knowledge of the values to be entered in these cells (on the "Example" sheet, range "F2:F5"), we do not recommend using this option.

Time-Varying Factor Sensitivities 231

[Figure 7.10 chart: signal to noise ratio = 0.1; actual beta versus estimated beta with ±1 SE confidence bands; sensitivity plotted against time.]

Figure 7.10 Actual versus estimated beta. Q/R is now 0.1. The beta estimates are smoother and their standard errors smaller. However, the model seems to adapt more slowly to the sensitivity jump

[Figure 7.11 chart: signal to noise ratio = 0.02; actual beta versus estimated beta with ±1 SE confidence bands; sensitivity plotted against time.]

Figure 7.11 Actual versus estimated beta. For Q/R = 0.02, beta estimates are very smooth and their standard errors are much smaller. This is because variations in co-movements of Xt and Yt are attributed to fluke rather than to true variations in beta

procedure is described later in the chapter. Figure 7.13 shows the results of the random walk model for a signal to noise ratio of 1.
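The trade-off illustrated in Figures 7.9–7.11 can be made concrete with a minimal scalar Kalman filter for a random-walk beta. This is a sketch on simulated data, not code from the tool pack: the jump location, noise scales and variable names are illustrative assumptions.

```python
import numpy as np

def kalman_beta(y, x, Q, R, beta0=0.0, P0=1e5):
    """Scalar Kalman filter for y_t = beta_t * x_t + eps_t, where the
    sensitivity follows a random walk: beta_t = beta_{t-1} + eta_t."""
    beta = np.empty(len(y))
    se = np.empty(len(y))
    b, p = beta0, P0                    # diffuse-style initial state
    for t in range(len(y)):
        p = p + Q                       # predict: random-walk state
        f = x[t] ** 2 * p + R           # prediction-error variance f_{t|t-1}
        k = p * x[t] / f                # Kalman gain
        b = b + k * (y[t] - x[t] * b)   # update with the prediction error
        p = (1.0 - k * x[t]) * p
        beta[t], se[t] = b, np.sqrt(p)
    return beta, se

# Illustrative data with a jump in beta, mimicking the figures
rng = np.random.default_rng(0)
T = 200
true_beta = np.where(np.arange(T) < 100, 0.5, 1.5)
x = rng.normal(size=T)
y = true_beta * x + rng.normal(scale=0.1, size=T)

# The three Q/R ratios of Figures 7.9-7.11
estimates = {snr: kalman_beta(y, x, Q=snr * 0.01, R=0.01)[0]
             for snr in (1.0, 0.1, 0.02)}
```

Running this shows the pattern described in the text: a large Q/R tracks the jump quickly but yields noisy estimates, while a small Q/R gives smooth estimates that lag the jump.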

While models 1 and 2 use values for the R and Q variances that are constant over the data set, model 3 ("Intervention analysis") enables users to apply their own time-varying R and Q values. This is achieved by specifying the range containing additive noise variances (i.e. R and Q time series). This enables the user to force the model to

232 Applied Quantitative Methods for Trading and Investment

Figure 7.12 The stochastic parameter regression menu of the tool pack

[Figure 7.13 chart: actual sensitivities and estimates for Alpha, X1 and X2; sensitivity (−0.50 to 1.00) plotted against time.]

Figure 7.13 Stochastic parameter regression example. The underlying sensitivities are assumed to follow a random walk (model 1) and the signal to noise ratio is set to 1

rapidly adapt to important events (by increasing Q during this period) or, on the contrary, to make the model insensitive to irrelevant events (by increasing R during this period). For instance, if the system noise variance Q is increased for variable X2 at time t = 52 (cell "Q54" is set to a large number (e.g. 50)), the model adapts rapidly to the level shift in β2 and models almost perfectly the actual factor sensitivities. The result of this model is displayed in Figure 7.14.21

21 Model 3 should only be used to force the model to take prior knowledge into account. In general columns N to Q in the "Example" sheet should be set to zero unless the user wants to modulate the adaptation speed of the model by increasing the credibility attributed to some observations (by putting a positive number for the additive Q values) or by decreasing this credibility (by putting a positive number for the additive R value). The values to be entered are discretionary and depend on the user's perception of the event's importance. The bigger the value the more it will alter the model's adaptation speed for the corresponding period.
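The intervention mechanism can be sketched as a Kalman filter with a time-varying system noise Q[t]. Again this is an illustrative scalar sketch, not the tool pack's implementation; the shift date and noise values mirror the example above.

```python
import numpy as np

def kalman_beta_tv(y, x, Q, R, beta0=0.0, P0=1e5):
    """Scalar Kalman filter with a time-varying system noise series Q[t]
    (intervention analysis: inflate Q[t] where a shift is expected)."""
    beta = np.empty(len(y))
    b, p = beta0, P0
    for t in range(len(y)):
        p = p + Q[t]                    # additive system noise at time t
        f = x[t] ** 2 * p + R
        k = p * x[t] / f
        b = b + k * (y[t] - x[t] * b)
        p = (1.0 - k * x[t]) * p
        beta[t] = b
    return beta

rng = np.random.default_rng(1)
T = 150
true_beta = np.where(np.arange(T) < 52, 0.5, 1.0)   # level shift at t = 52
x = rng.normal(size=T)
y = true_beta * x + rng.normal(scale=0.1, size=T)

Q = np.full(T, 1e-6)   # nearly constant sensitivity ...
Q[52] = 50.0           # ... except where prior knowledge places a shift
est = kalman_beta_tv(y, x, Q, R=0.01)
```

Because the gain becomes close to one at t = 52, the filter absorbs the level shift almost immediately, as in Figure 7.14.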


[Figure 7.14 chart: actual sensitivities and estimates for Alpha, X1 and X2; sensitivity (−0.50 to 1.00) plotted against time.]

Figure 7.14 Example of the stochastic parameter regression using model 3. The underlying sensitivities are assumed to follow a random walk with system noise added at t = 52 (time of level shift in β2). The signal to noise ratio is still set to 1. The fit is almost perfect as the model is forced to adapt to the level shift

The reader should now be familiar with the functioning of the Kalman filter. This adaptive procedure optimally computes the current state of the model (i.e. the factor sensitivities and their covariance matrix) given the previous state and some hyper-parameters such as system and observation noise variances. Two important questions remain unanswered: (i) how do we determine the initial state of the model?, (ii) how do we best set the values for the hyper-parameters?

The initial distribution of the state vector (mean s and covariance P0) can be set in several ways. The simplest is to use prior knowledge. For instance, one can start with a prior value of 1 for a stock's beta, and a prior value of 0 for the sensitivity of the return to exchange rates. A popular alternative is to estimate sensitivities (and their covariance matrix) from a subsample of the data by OLS. However, this may use too many observations from the data sample and we have already shown the shortcomings of OLS estimation in cases where sensitivities are not constant. The model provided in the tool pack uses a diffuse prior (see Harvey (1989)). A diffuse prior corresponds to an infinite covariance matrix P0 (i.e. P0^(−1) = 0) so that no credibility is given to the initial state estimate, letting the Kalman filter estimate the states from the data. Practically, however, a large initial covariance matrix is used rather than an infinite one. P0 is given by equation (7.36):

P0 = κI    (7.36)

κ being equal to a large but finite number (e.g. 100 000). In the case of a diffuse prior, the mean of the initial state distribution, s, is irrelevant as long as it is finite. We use a value of zero for all state variables. The use of a diffuse prior can be viewed as a way to let the filter estimate the initial values from the first observations.
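A diffuse prior of this kind is trivial to set up in code. The following sketch (illustrative helper name, not part of the tool pack) builds the zero initial mean and the large-but-finite P0 = κI of equation (7.36):

```python
import numpy as np

def init_diffuse(n_states, kappa=100_000.0):
    """Diffuse prior for the Kalman filter: zero initial state mean and a
    large-but-finite initial covariance P0 = kappa * I (equation 7.36)."""
    s0 = np.zeros(n_states)           # initial mean: irrelevant, any finite value
    P0 = kappa * np.eye(n_states)     # 'infinite' covariance, approximated
    return s0, P0

s0, P0 = init_diffuse(3)   # e.g. states alpha, beta1, beta2
```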

Because a diffuse prior is used, little information needs to be supplied by the user. Once the time series model for the sensitivities and the values for the hyper-parameters Φ, Q, R, d and c have been chosen, the adaptive model optimally estimates the sensitivities. The values for these parameters can be set by the user but they can also be estimated


from the data. The model provided in the tool pack supports an optimisation algorithm in order to find the "best" parameters.22

Given the hyper-parameters, the Kalman filter derives recursive values for the factor sensitivities and their covariance. Hence, the model provides predictions of the dependent variable and residuals can be computed, conditional on the hyper-parameter values. A suitable maximisation routine can then be used to maximise the likelihood of observing the data.

Maximum likelihood estimation consists of finding an estimate of the unknown parameter vector θ such that it maximises the likelihood of generating the data that were actually observed. In other words, given the sample of observations Yt, finding a solution for θ that maximises the joint probability density function L(Y, θ). As observations are not independent in a time series model, the joint density function is not the usual product of the probability density functions corresponding to each observation. Instead, conditional probability density functions are used to write the joint density function:

L(Y, θ) = ∏_{t=1}^{T} p(Yt | Y1, Y2, . . . , Yt−1)    (7.37)

The disturbances and the initial state vector following a multivariate normal distribution by assumption, the conditional density functions p(Yt | Y1, Y2, . . . , Yt−1) are also normal. The means Ŷt|t−1 and variances ft|t−1 of these normal distributions are given by the Kalman filter equations (7.26) and (7.28). The model having Gaussian residuals et, the logarithm of the likelihood function L(Y, θ) – i.e. the log-likelihood function log L(Y, θ) – can be written in terms of the prediction errors et and their variances ft|t−1 that have been defined in equations (7.27) and (7.28):

log L(θ | Y1, Y2, . . . , YT) = −(T/2) ln(2π) − (1/2) ∑_{t=1}^{T} ln(ft|t−1) − (1/2) ∑_{t=1}^{T} et² / ft|t−1    (7.38)

The maximum likelihood estimate θ̂ of the parameter vector θ is the value of θ that maximises the log-likelihood, i.e.:

θ̂ = arg max_θ log L(θ | Y1, Y2, . . . , YT)    (7.39)

The Kalman filter algorithm and equation (7.38) are used to compute the log-likelihood for a given value of θ. A numerical optimiser is employed to find the vector θ of hyper-parameters that maximises the log-likelihood function.

The log-likelihood function in equation (7.38) is a non-linear function of the hyper-parameters and, unfortunately, there is no analytical solution θMLE to the equation

∂ log L / ∂θ = 0    (7.40)

22 "Best" according to criteria such as prediction mean square error or a likelihood function.


Figure 7.15 The maximum likelihood estimation procedure:
1. Given values of the hyper-parameters θ = (Φ, d, ct, Rt, Qt), run the Kalman filter for t = 1, . . . , T and compute the model's residuals et.
2. Compute the log-likelihood function log L(θ) conditional on the values of the unknown hyper-parameter vector θ.
3. If log L(θ) is not yet at its maximum, choose new values of θ that increase log L(θ) and return to step 1; otherwise stop.

A numerical optimisation method for finding the maximum likelihood parameters is therefore employed. This iterative procedure is represented in Figure 7.15.

The critical stage of this procedure is the algorithm used to determine the parameter vector that maximises the log-likelihood. There are numerous methods for maximising such a non-linear function. However, they are unequal in terms of convergence speed and dependence on initial parameter values. The optimiser used by the tool pack is based on a quasi-Newton method (see for instance Greene (1977, p. 204)). It usually converges within a few iterations.
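The estimation loop of Figure 7.15 can be sketched for the scalar random-walk-beta model as follows. This is an illustrative reimplementation on simulated data, not the tool pack's routine: SciPy's L-BFGS quasi-Newton optimiser stands in for the tool pack's optimiser, and the negative log-likelihood of equation (7.38) is minimised over log-variances so the search is unconstrained.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(log_qr, y, x):
    """Negative log-likelihood (equation 7.38) via the prediction-error
    decomposition for y_t = beta_t * x_t + eps_t, beta_t a random walk."""
    Q, R = np.exp(log_qr)
    b, p = 0.0, 1e5                      # diffuse-style prior
    ll = 0.0
    for t in range(len(y)):
        p += Q
        f = x[t] ** 2 * p + R            # f_{t|t-1}
        e = y[t] - x[t] * b              # prediction error e_t
        ll += -0.5 * (np.log(2 * np.pi) + np.log(f) + e * e / f)
        k = p * x[t] / f
        b += k * e
        p *= (1.0 - k * x[t])
    return -ll

# Simulated data: random-walk beta with Q = 0.05^2, R = 0.1^2
rng = np.random.default_rng(2)
T = 300
beta = 1.0 + np.cumsum(rng.normal(scale=0.05, size=T))
x = rng.normal(size=T)
y = beta * x + rng.normal(scale=0.1, size=T)

# Quasi-Newton search over (log Q, log R)
res = minimize(neg_loglik, x0=np.log([0.01, 0.01]), args=(y, x),
               method="L-BFGS-B")
Q_hat, R_hat = np.exp(res.x)
```

The recovered Q_hat and R_hat are close to the simulation values, up to sampling error.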

By checking the "Estimate parameters" check box, the software optimises the R and Q parameters and saves them in the range indicated in the text box. The "Estimation option" button enables the user to select the optimisation criterion (Figure 7.16), mean squared error or log-likelihood.

Figure 7.16 The optimisation menu for the stochastic parameter regression. System and observation noise variances can be estimated from the data


[Figure 7.17 chart: actual sensitivities and estimates for Alpha, X1 and X2; sensitivity (−0.50 to 1.00) plotted against time.]

Figure 7.17 This is the result of the stochastic parameter regression using model 3 with optimised parameters. The underlying sensitivities are assumed to follow a random walk with system noise added at t = 52 (time of level shift in β2). The R and Q values have been computed by maximum likelihood estimation

Once the hyper-parameters of the model are estimated, the procedure runs the Kalman filter one last time to compute the sensitivities. The R and Q values are saved on the spreadsheet. The result of the optimised model is displayed in Figure 7.17.

Given the optimised hyper-parameters, the stochastic parameter model can be expressed as:

Y(t) = α(t) + β1(t)X1(t) + β2(t)X2(t) + ε(t)

where α(t) = α(t − 1) + η1(t), Var(η1) = 1.11E-04 (cell "F3")
β1(t) = β1(t − 1) + η2(t), Var(η2) = 6.02E-04 (cell "F4")
β2(t) = β2(t − 1) + η3(t), Var(η3) = 8.73E-08 (cell "F5")
Var(ε) = 1.30E-08 (cell "F2")

For instance, for t = 50, we have:

Y = −0.19 + 0.11X1 + 0.50X2 + ε    (cells J52:L52)
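The fitted system above can be simulated directly. This sketch uses the estimated cell variances quoted above; the simulated regressors and seed are illustrative, so the paths will not match the spreadsheet example.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 176
# Random-walk coefficients with the estimated system-noise variances
alpha = np.cumsum(rng.normal(scale=np.sqrt(1.11e-4), size=T))
beta1 = np.cumsum(rng.normal(scale=np.sqrt(6.02e-4), size=T))
beta2 = np.cumsum(rng.normal(scale=np.sqrt(8.73e-8), size=T))
X1, X2 = rng.normal(size=T), rng.normal(size=T)
eps = rng.normal(scale=np.sqrt(1.30e-8), size=T)
Y = alpha + beta1 * X1 + beta2 * X2 + eps
```

Note how tiny Var(η3) makes β2 behave as almost constant, while β1 wanders visibly.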

The reader will find it a useful and rewarding exercise to play with the provided example. Experimenting with different hyper-parameter values or injecting additional system noise through the intervention analysis will enable the user to get a good understanding of the functioning of the Kalman filter. Comparing the estimates from the various models (i.e. unweighted rolling regression, weighted regression and stochastic parameter regression) will give the reader a good understanding of the strengths and weaknesses of the various approaches.

7.6 CONCLUSION

This chapter has described three methods for estimating time-varying factor models. The third approach, based on Kalman filtering, is clearly superior to rolling regressions or time-weighted regressions. As the reader will probably have realised when using the software provided, the computational complexity of the Kalman filter can easily be handed over to a computer program. As a result, the stochastic parameter regression model can be used as easily as the less accurate rolling regression approach.

The advantage of adaptive models such as the stochastic parameter regression model resides in their capability to estimate the dynamics of the underlying factor sensitivities. This enables one to predict future risk exposures rather than merely measure their past average. Given the many applications of factor modelling and time series regression analysis in equity management, stochastic parameter regression models should prove a useful addition to the asset manager's toolbox.

REFERENCES

Bollerslev, T. (1986), "Generalised Autoregressive Conditional Heteroskedasticity", Journal of Econometrics, 31, 307–327.
Gouriéroux, C., A. Monfort and G. Gallo (1997), Time Series and Dynamic Models, Cambridge University Press, Cambridge.
Greene, W. H. (1977), Econometric Analysis, 3rd edition, Prentice Hall, Englewood Cliffs, NJ.
Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge.
Kalman, R. E. (1960), "A New Approach to Linear Filtering and Prediction Problems", Journal of Basic Engineering, Transactions ASMA, Series D, 82 (1), 35–45.
Pindyck, R. S. and D. L. Rubinfeld (1998), Econometric Models and Economic Forecasts, 4th edition, McGraw-Hill, New York.
Rosenberg, B. (1973), "Random Coefficients Models: The Analysis of a Cross Section of Time Series by Stochastically Convergent Parameter Regression", Annals of Social and Economic Measurement, 2 (4), 399–428.
Schaefer, S., R. Brealey, S. Hodges and H. Thomas (1975), "Alternative Models of Systematic Risk", in E. Elton and M. Gruber (eds), International Capital Markets: An Inter and Intra Country Analysis, North-Holland, Amsterdam, pp. 150–161.

8
Stochastic Volatility Models: A Survey with Applications to Option Pricing and Value at Risk*
MONICA BILLIO AND DOMENICO SARTORE

ABSTRACT

This chapter presents an introduction to the current literature on stochastic volatility models. For these models the volatility depends on some unobserved components or a latent structure.

Given the time-varying volatility exhibited by most financial data, in the last two decades there has been a growing interest in time series models of changing variance and the literature on stochastic volatility models has expanded greatly. Clearly, this chapter cannot be exhaustive; however, we discuss some of the most important ideas, focusing on the simplest forms of the techniques and models used in the literature.

The chapter is organised as follows. Section 8.1 considers some motivations for stochastic volatility models: empirical stylised facts, pricing of contingent assets and risk evaluation. While Section 8.2 presents models of changing volatility, Section 8.3 focuses on stochastic volatility models and distinguishes between models with continuous and discrete volatility, the latter depending on a hidden Markov chain. Section 8.4 is devoted to the estimation problem, which is still an open question, so a wide range of possibilities is presented. Sections 8.5 and 8.6 introduce some extensions and multivariate models. Finally, in Section 8.7 an estimation program is presented and some possible applications to option pricing and risk evaluation are discussed.

Readers interested in the practical utilisation of stochastic volatility models and in the applications can skip Section 8.4.3 without hindering comprehension.

8.1 INTRODUCTION

In the last two decades there has been a growing interest in time series models of changing variance, given the time-varying volatility exhibited by most financial data. In fact, the empirical distributions of financial time series differ substantially from distributions obtained from sampling independent homoskedastic Gaussian variables. Unconditional density functions exhibit leptokurtosis and skewness; time series of financial returns show

* We gratefully acknowledge help from Michele Gobbo, Massimiliano Caporin and Andrea Giacomelli.
Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Naïm
© 2003 John Wiley & Sons, Ltd ISBN: 0-470-84885-5


evidence of volatility clustering; and squared returns exhibit pronounced serial correlation whereas little or no serial dependence can be detected in the return process itself.

These empirical regularities suggest that the behaviour of financial time series may be captured by a model which recognises the time-varying nature of return volatility, as follows:

yt = µt + σt εt,    εt ∼ IID(0, 1),    t = 1, 2, . . . , T

where yt denotes the return on an asset. A common way of modelling σt is to express it as a deterministic function of the squares of lagged residuals. Econometric specifications of this form are known as ARCH models and have achieved widespread popularity in applied empirical research (see Bollerslev et al. (1992, 1993), Bera and Higgins (1993)).

Alternatively, volatility may be modelled as an unobserved component following some latent stochastic process, such as an autoregression. The resulting models are called stochastic volatility (SV) models and have been the focus of considerable attention in recent years (Taylor, 1994; Ghysels et al., 1996; Shephard, 1996). These models present two main advantages over ARCH models. The first one is their solid theoretical background, as they can be interpreted as discretised versions of the stochastic volatility continuous-time models put forward by modern finance theory (see Hull and White (1987)). The second is their ability to generalise from univariate to multivariate series in a more natural way, as far as their estimation and interpretation are concerned. On the other hand, SV models are more difficult to estimate than ARCH models, due to the fact that it is not easy to derive their exact likelihood function. For this reason, a number of econometric methods have been proposed to solve the problem of estimation of SV models.

The literature on SV models has expanded greatly in the last 10 years, reaching considerable proportions; this chapter cannot therefore be exhaustive. We prefer to discuss some of the most important ideas, focusing on the simplest forms of the techniques and models used in the literature, referring the reader elsewhere for generalisations and technicalities. In the organisation of the structure of the present chapter, we have been inspired by the paper of Shephard (1996), who gave a very interesting survey on SV models updated to 1995.

To start, we will consider some motivations for stochastic volatility models: empirical stylised facts, pricing of contingent assets and risk evaluation.

8.1.1 Empirical stylised facts

To illustrate the models and to develop the examples we will work with three European stock indexes: the FTSE100, the CAC40 and the MIB30, which are market indexes for the London, Paris and Milan equity markets. These series run from 4 January 1999 to 12 August 2002, yielding 899 daily observations.

Throughout we will work with the compounded return1 on the series

yt = log(xt / xt−1)

1 An advantage of using a return series is that it helps in making the time series stationary, a useful statistical property (see footnote 4).
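Computing the compounded return series is a one-liner; a minimal sketch (illustrative function name and toy prices, not the chapter's data):

```python
import numpy as np

def log_returns(prices):
    """Compounded (log) returns y_t = log(x_t / x_{t-1})."""
    x = np.asarray(prices, dtype=float)
    return np.log(x[1:] / x[:-1])

# n prices yield n - 1 daily returns (900 prices -> 899 returns for the indexes)
r = log_returns([100.0, 101.0, 99.5])
```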

Stochastic Volatility Models 241

[Figure 8.1 charts, one column per index (FTSE100, CAC40, MIB30): time series of returns, non-parametric density estimates with the normal approximation ("Norm. Apx"), and correlograms (ACF) of squared returns.]

Figure 8.1 Summaries of the daily returns on three European stock indexes: the FTSE100, the CAC40 and the MIB30. Summaries are: time series of returns, non-parametric density estimate and normal approximation, correlogram of squared returns

where xt is the value of the stock index. Figure 8.1 displays some summaries of these three series. The raw time series of yt suggests that there are periods of volatility clustering: days of large movements are followed by days with the same characteristics. This is confirmed by the use of a correlogram on yt², also reported in Figure 8.1, which shows significant correlations existing at quite extended lag lengths. This suggests that yt² may follow a process close to an ARMA(1,1), for a simple AR process cannot easily combine the persistence in shocks with the low correlation. A correlogram of yt shows little activity and so is not given in this figure.

Figure 8.1 also gives a density estimate of the unconditional distribution of yt together with the corresponding normal approximation.2 This suggests that yt is leptokurtic. This is confirmed by Table 8.1, which reports an estimate of the excess of kurtosis with respect to the normal distribution, which is significantly positive.

Table 8.1 also reports the Jarque–Bera test for normality and the asymmetry coefficients, evidencing that the distributions are negatively skewed, partially due to the period of analysis, and for all three the null hypothesis of normality is clearly rejected.

These stylised facts can be summarised as follows: non-significant serial correlation in the levels of returns; volatility clustering, which implies a significant and positive serial correlation in the squares yt²; heavy tails; and persistence of volatility.

2 The graphs are produced with the Ox software using some of its basic commands and default options. See also Section 8.7.1.
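The moment-based statistics behind Table 8.1 can be sketched as follows (illustrative helper on simulated data; the chapter's own figures are produced in Ox, not Python):

```python
import numpy as np

def summary_stats(y):
    """Skewness, excess kurtosis and the Jarque-Bera normality statistic."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    z = y - y.mean()
    s2 = (z ** 2).mean()
    skew = (z ** 3).mean() / s2 ** 1.5
    ex_kurt = (z ** 4).mean() / s2 ** 2 - 3.0
    # Jarque-Bera statistic, ~ chi-squared(2) under normality
    jb = n / 6.0 * (skew ** 2 + ex_kurt ** 2 / 4.0)
    return skew, ex_kurt, jb

# Heavy-tailed returns (Student-t) produce positive excess kurtosis and a
# large Jarque-Bera statistic, as for the three indexes
rng = np.random.default_rng(4)
skew, ex_kurt, jb = summary_stats(rng.standard_t(df=6, size=5000))
```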


Table 8.1 Summary statistics for the daily returns in Figure 8.1. In parentheses: the p-value of the Jarque–Bera test

                        FTSE100      CAC40        MIB30
Mean                    −0.03634     −0.000219    −0.04313
Standard deviation      0.001463     0.015630     0.001689
Asymmetry               −0.2574      −0.223998    −0.23383
Excess of kurtosis      1.547015     4.624507     2.268374
Jarque–Bera test        97 (0.00)    106 (0.00)   197 (0.00)

Finally, there is some evidence that stock markets share periods of high volatility. This suggests that multivariate models will be important.

8.1.2 Pricing contingent assets

Consider an asset C, with expiring date t + τ, which is a function of a generic underlying security S. Assume now that S can be described by the following geometric diffusion process:

dS = µS dt + σS dz

so that d log S = (µ − σ²/2) dt + σ dz. Economists term such an asset C as "contingent"

or "derivative". A primary example of a derivative is an option, which gives the owner the ability but not the obligation to trade the underlying security at a given price K, called the strike price, in the future. European call options are the best known: the owner can buy the underlying asset at the strike price K only when the call expires, i.e. at date t + τ. Its value at the expiration date will be:

Ct+τ = max(St+τ − K, 0)

Its purchase value at time t is as yet unknown, but can be determined in different ways. One of these consists of calculating the discounted expected value of the option at time t + τ:

exp(’r„ )ESt+„ |St [max(St+„ ’ K, 0)]

where r is the risk-free interest rate. However, this completely ignores the fact that this is a risky asset and traders expect higher returns than on riskless assets. This is the reason why the discounted expected value is not considered by the market as a correct method to evaluate an asset. To avoid this inconvenience it is opportune to introduce a utility function into the pricing of options, letting the dealers choose the risk–expected gain combination they prefer.

It turns out that the added complexity of a utility function can be avoided by assuming continuous and costless trading. This can be shown by creating a particular portfolio, which by construction is made up of owning θ of the underlying shares and by borrowing a single contingent asset C. If the investor properly selects θ at each time, the stochastic component of the process disappears and ensures the portfolio has a riskless dynamic, making its return a deterministic function of time (see Black and


Scholes (1973)). As time passes, the portfolio will have to be continually adjusted to maintain risklessness, hence the need for continuous costless trading.

The return of this portfolio must be equal to the riskless interest rate r because the portfolio itself is risk-free, otherwise traders would have an arbitrage opportunity. This condition is necessary to obtain the stochastic differential equation followed by the contingent asset:

∂C/∂t + (1/2) σ²S² ∂²C/∂S² + rS ∂C/∂S = rC,    with end condition C = max(S − K, 0)

This equation is quite easy to solve and does not depend on the mean parameter µ nor on the risk preferences of the traders. Whatever the risk preferences may be, the evaluation of the option does not change. When solving the equation, risk-neutral preferences are used to simplify calculations. With instantaneous variance σ², the following Black–Scholes valuation formula is obtained:

CtBS(σ²) = St Φ(d) − K e^(−rτ) Φ(d − σ√τ),    where d = [log(St/K) + (r + σ²/2)τ] / (σ√τ)

and Φ(·) denotes the standard normal cumulative distribution function.

Note that σ² is the only unknown parameter: St and r are observed, while τ and K are usually given by institutional norms. The price depends strictly on σ², which is more important than the drift, as is often the case in finance, so that the price of the option can be considered an indicator of the volatility of the underlying asset.

Empirically, the Black–Scholes formula can be used in two ways: either by estimating σ² (the historical volatility) and then calculating the option price, or by using real prices to determine a value for σ² (called implied volatility).

This type of analysis has a considerable shortcoming: the basic assumption that stock returns follow a geometric diffusion process is a poor one, as indicated in Figure 8.1, and can affect the valuation formula, reducing the precision of the option pricing. This realisation has prompted theoretical work into option pricing theory under various changing volatility regimes. The leading paper in this field is Hull and White (1987), to which we will return later.
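The Black–Scholes call formula above translates directly into code. A minimal sketch using only the standard library (the parameter values in the example call are illustrative):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, r, sigma2, tau):
    """Black-Scholes value of a European call, C_t^BS(sigma^2):
    only the variance sigma^2 is unknown; S, K, r and tau are given."""
    sigma = sqrt(sigma2)
    d = (log(S / K) + (r + sigma2 / 2.0) * tau) / (sigma * sqrt(tau))
    return S * norm_cdf(d) - K * exp(-r * tau) * norm_cdf(d - sigma * sqrt(tau))

# At-the-money one-year call, 20% volatility (sigma^2 = 0.04)
price = bs_call(S=100.0, K=100.0, r=0.05, sigma2=0.04, tau=1.0)
```

Inverting this function numerically over sigma2 against an observed market price yields the implied volatility mentioned in the text.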

8.1.3 Risk evaluation

VaR (Value at Risk) is the maximum amount that is expected to be lost over some target period, i.e. the maximum likely loss. It is a statistical risk measure and represents a percentile of the probability distribution of the variable of interest.

Generally speaking, VaR can be analytically defined as follows. Let xt be a random variable of interest and Fx(xt) its cumulative distribution function:

a = Prob(xt ≤ x̄t) = Fx(x̄t) = ∫_{−∞}^{x̄t} f(xt) dxt

VaR is the percentile defined by the relation:

VaRx(1 − a) = xt* = F_{xt}^(−1)(a)


where (1 − a) is the VaR confidence level, for instance 95% or 99%, and F_{xt}^(−1)(a) is the inverse of the cumulative distribution function.

Given its generality, the VaR method can be applied to different types of risk measurement, such as market risk, credit risk, operational risk and commodity risk (see Alexander (1996)). Moreover, given its versatility, VaR allows us to obtain an intuitive risk measure, to define homogeneous risk measures that permit a comparison among different financial instruments, to determine limiting positions and to construct risk-adjusted profitability measures.

Let us concentrate on its application to market risk. Market risk means the possibility that an unexpected variation of market factors (interest rates, exchange rates, stock prices, etc.) causes an increase or a reduction in the value of a position or in the value of a financial portfolio. VaR, in this context, is the maximum expected loss of a marketable financial instruments portfolio which could be experienced, for a specified time horizon and a specified confidence level.

We now consider a general portfolio model which allows us to set out all the hypotheses characterising a risk measurement model like VaR in a systematic manner, paying particular attention to the role of the volatility.

Let xτ be a random variable which represents the value of a portfolio in a future period τ. It is defined by the following relation:

xτ = ∑_{i=1}^{N} wi,t Pi,τ

where the random variables Pi,τ represent the future values of the N assets in the portfolio and wi,t are their weights at time t. If we suppose that the N assets will be subject to K market risk factors χj,τ, the future value of the portfolio can be expressed as a function of the K stochastic risk factors by the following pricing formula:

xτ = ∑_{i=1}^{N} wi,t Pi,τ(χ1,τ, . . . , χK,τ)

The hypotheses characterising the model therefore concern: the choice of endogenous variable; the pricing formula; the definition of the risk factors and their distributions; the risk factors' volatility; the risk factor mapping; the confidence level; and the choice of the time horizon.

In the literature, the following approaches are suggested to estimate the VaR (Best, 1998): parametric methods; historical simulation; Monte Carlo simulation; stress testing. Concerning the parametric methods and Monte Carlo simulation, it is crucial to properly describe the volatility dynamics of the risk factors to obtain correct estimates of the VaR (see, for example, Lehar et al. (2002)).
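Two of the approaches just listed can be sketched on a toy return series (illustrative helper names and simulated normal returns, not the chapter's data): historical simulation takes the empirical (1 − a) percentile, while the parametric (normal) method uses the estimated mean and volatility.

```python
import numpy as np
from scipy.stats import norm

def var_historical(returns, conf=0.99):
    """Historical-simulation VaR: the (1 - conf) empirical percentile of
    the return distribution, reported as a positive loss."""
    return -np.quantile(returns, 1.0 - conf)

def var_parametric(returns, conf=0.99):
    """Parametric (normal) VaR from the estimated mean and volatility."""
    mu, sigma = np.mean(returns), np.std(returns)
    return -(mu + sigma * norm.ppf(1.0 - conf))

rng = np.random.default_rng(5)
r = rng.normal(loc=0.0, scale=0.01, size=10_000)   # toy daily returns
v_hist = var_historical(r)
v_par = var_parametric(r)
```

For genuinely normal returns the two estimates agree; with the heavy-tailed, volatility-clustered returns of Figure 8.1 they diverge, which is why modelling the volatility dynamics matters.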

8.2 MODELS OF CHANGING VOLATILITY

Following Cox (1981) and Shephard (1996), models of changing volatility can be usefully partitioned into observation-driven and parameter-driven models. Both can be generally expressed using the following parametric framework:

yt | zt ∼ N(µt, σt²)


where µt is often set equal to zero (as we do not intend to focus on that feature of the model).

In the first class, i.e. in observation-driven models, zt is a function of lagged values of yt. The autoregressive conditional heteroskedasticity (ARCH) models introduced by Engle (1982) are the most representative example of observation-driven models. They describe the variance as a linear function of the squares of past observations:

σt² = α0 + α1 y²t−1 + · · · + αp y²t−p

and so the model is defined by the conditional density (one-step-ahead forecast density):

yt | Y^(t−1) ∼ N(0, σt²)

where Y^(t−1) is the set of observations up to time t − 1. This allows today's variance to depend on the variability of recent observations, and then one type of shock alone drives both the series itself and its volatility.
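The ARCH(p) variance recursion can be sketched as follows (illustrative parameter values; the single shock drives both the return and, through its square, tomorrow's variance):

```python
import numpy as np

def arch_variance(y, alpha0, alphas):
    """Conditional variance of an ARCH(p) model:
    sigma_t^2 = alpha0 + alpha1*y_{t-1}^2 + ... + alphap*y_{t-p}^2."""
    p = len(alphas)
    sig2 = np.full(len(y), alpha0, dtype=float)
    for t in range(p, len(y)):
        sig2[t] = alpha0 + sum(a * y[t - i - 1] ** 2
                               for i, a in enumerate(alphas))
    return sig2

# Simulate an ARCH(1): volatility clustering from a single type of shock
rng = np.random.default_rng(6)
T, a0, a1 = 2000, 0.1, 0.5
y = np.zeros(T)
for t in range(1, T):
    y[t] = np.sqrt(a0 + a1 * y[t - 1] ** 2) * rng.normal()
```

The simulated series reproduces the stylised facts of Section 8.1.1: the unconditional variance is a0/(1 − a1) and the squared returns are positively autocorrelated.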

The use of models described by their one-step-ahead forecast offers remarkable advantages that are worth highlighting. First, the likelihood expression can be simply obtained by combining these densities, making estimation and testing easy to handle, at least in principle. Second, conditional densities imply the use of conditional moments, which are used widely to specify finance theory, although this is conditional on economic agents', if not econometricians', information. Finally, the observation-driven models parallel the autoregressive and moving average ones which are commonly used for models of changing means.

In the second class, i.e. in parameter-driven (or dynamic latent variable or state space) models, zt is a function of an unobserved or latent component. The log-normal stochastic volatility model created by Taylor (1986) is the simplest and best-known example:

yt |ht ∼ N(0, exp(ht ))        ht = α + βht−1 + ηt ,   ηt ∼ NID(0, ση²)        (8.1)

where ht represents the log-volatility, which is unobserved but can be estimated using the

observations. With respect to the previous class, these models are driven by two types of

shock, one of which influences the volatility (i.e. the conditional variance equation). These

models parallel the Gaussian state space models of means dealt with by Kalman (1960).

In spite of this, a shortcoming of parameter-driven volatility models is that they generally lack analytic one-step-ahead forecast densities yt |Y t−1 , unlike the models of the mean which fit into the Gaussian state space form. Hence either an approximation or a

numerically intensive method is required to deal with these models.

Although SV models are harder to handle statistically than the corresponding

observation-driven models, there are still some good reasons for investigating them. We

will see that their properties are easier to find, understand, manipulate and generalise to

the multivariate case. They also have simpler analogous continuous time representations,

which is important given that much modern finance employs diffusions. An example of

this is the work by Hull and White (1987) which uses a log-normal SV model, replacing

the discrete time AR(1) for ht with an Ornstein–Uhlenbeck process.

246 Applied Quantitative Methods for Trading and Investment

8.3 STOCHASTIC VOLATILITY MODELS

For these models the volatility depends on some unobserved components or a latent structure. One interpretation of the latent ht is that it represents the random and uneven flow of new information into financial markets, which is very difficult to model directly (Clark, 1973).

The most popular of these parameter-driven stochastic volatility models, from Taylor

(1986), puts

yt = µt exp(ht /2)
ht = α + βht−1 + ηt        (8.2)

where µt and ηt are two independent Gaussian white noises, with variances 1 and ση², respectively. Due to the Gaussianity of ηt , this model is called a log-normal SV model.
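The model (8.2) is straightforward to simulate recursively. The following sketch uses illustrative parameter values (α = −0.7, β = 0.95, ση = 0.3 are arbitrary choices, not the chapter's):

```python
import numpy as np

# Hypothetical sketch of model (8.2): y_t = mu_t * exp(h_t / 2),
# h_t = alpha + beta * h_{t-1} + eta_t. Parameter values are arbitrary.
rng = np.random.default_rng(1)
alpha, beta, sigma_eta, T = -0.7, 0.95, 0.3, 20000

h = np.empty(T)
h[0] = alpha / (1 - beta)                      # stationary mean as start value
for t in range(1, T):
    h[t] = alpha + beta * h[t - 1] + sigma_eta * rng.standard_normal()
y = rng.standard_normal(T) * np.exp(h / 2)     # measurement noise has variance 1

# stationary moments of the log-volatility implied by the AR(1)
mu_h = alpha / (1 - beta)
var_h = sigma_eta**2 / (1 - beta**2)
```

With these assumed values, the simulated log-volatility should have mean close to α/(1 − β) and variance close to ση²/(1 − β²).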

Another possible interpretation for ht is to characterise the regime in which financial

markets are operating and then it could be described by a discrete valued variable. The

most popular approach to modelling changes in regime is the class of Markov switching

models introduced by Hamilton (1989) in the econometrics literature. In that case the

simplest model is:3

yt = µt exp(ht /2)
ht = α + βst        (8.3)

where st is a two-state first-order Markov chain which can take values 0, 1 and is independent of µt . The value of the time series st , for all t, depends only on the last value st−1 , i.e. for i, j = 0, 1:

P (st = j |st−1 = i, st−2 = k, . . .) = P (st = j |st−1 = i) = pij

The probabilities (pij )i,j =0,1 are called transition probabilities of moving from one state

to the other. Obviously, we get that:

p00 + p01 = p10 + p11 = 1

and these transition probabilities are collected in the transition matrix P:

P = [ p00        1 − p11 ]
    [ 1 − p00    p11     ]

which fully describes the Markov chain.
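A two-state chain with this transition structure can be simulated and checked numerically; the sketch below uses hypothetical values p00 = 0.95, p11 = 0.90 and the column convention of the matrix P above:

```python
import numpy as np

# Hypothetical sketch: simulate the two-state chain and recover p00, p11
# from empirical transition frequencies. P uses the column convention above:
# P[i, j] = P(s_t = i | s_{t-1} = j).
rng = np.random.default_rng(2)
p00, p11, T = 0.95, 0.90, 200_000

P = np.array([[p00, 1 - p11],
              [1 - p00, p11]])

s = np.zeros(T, dtype=int)
for t in range(1, T):
    stay = p00 if s[t - 1] == 0 else p11
    s[t] = s[t - 1] if rng.random() < stay else 1 - s[t - 1]

p00_hat = np.mean(s[1:][s[:-1] == 0] == 0)     # empirical P(s_t = 0 | s_{t-1} = 0)
p11_hat = np.mean(s[1:][s[:-1] == 1] == 1)     # empirical P(s_t = 1 | s_{t-1} = 1)
```

Each column of P sums to one, and the empirical transition frequencies should recover the assumed p00 and p11.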

A two-state Markov chain can easily be represented by a simple AR(1) process as

follows:

st = (1 − p00 ) + (−1 + p00 + p11 )st−1 + vt        (8.4)

where vt = st − E(st |st−1 , st−2 , . . .). Although vt can take only a finite set of values, on

average vt is zero. The innovation vt is thus a martingale difference sequence. Given the

3 The representation yt = σst µt , with σ0 = exp(α/2) and σ1 = exp((α + β)/2), is clearly equivalent. To identify regime 1 as the high volatility regime, we set β > 0.


autoregressive representation of the Markov chain, it is possible to rewrite the volatility

equation of model (8.3) in the following way:

ht = α + βst
   = α + β[(1 − p00 ) + (−1 + p00 + p11 )st−1 + vt ]
   = α(2 − p00 − p11 ) + β(1 − p00 ) + (−1 + p00 + p11 )ht−1 + βvt

The SV model with discrete volatility has therefore the same structure as model (8.2) but

with a noise that can take only a finite set of values:

yt = µt exp(ht /2)
ht = a + bht−1 + ωt        (8.5)

We describe the basic properties of both types of model in the following sections.

8.3.1 SV models with continuous volatility

We consider µt and ηt independent, Gaussian white noises. The properties of model (8.2)

are discussed in Taylor (1986, 1994) (see also Shephard (1996)). Broadly speaking, given

the product process nature of the model, these properties are easy to derive, but estimation

is substantially harder than for the corresponding ARCH models.

As ηt is Gaussian, ht is a standard Gaussian autoregression. It will be stationary (covariance4 and strictly5 ) if |β| < 1, with:

µh = E(ht ) = α/(1 − β)
σh² = Var(ht ) = ση²/(1 − β²)

As µt is always stationary, yt will be stationary if and only if ht is stationary, yt being the

product of two stationary processes. Using the properties of the log-normal distribution,

all the moments exist if ht is stationary and in particular the kurtosis is:

E(yt⁴)/(E(yt²))² = 3 exp(σh²) ≥ 3

which shows that the SV model has fatter tails than the corresponding normal distribution

and all the odd moments are zero.
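These stationary moments can be checked by simulation; the sketch below (arbitrary, hypothetical parameter values) compares the sample kurtosis with the theoretical value 3 exp(σh²):

```python
import numpy as np

# Hypothetical sketch: compare the sample kurtosis of a simulated log-normal SV
# series with the theoretical value 3 * exp(sigma_h^2). Parameters are arbitrary.
rng = np.random.default_rng(3)
alpha, beta, sigma_eta, T = -0.5, 0.9, 0.4, 400_000

mu_h = alpha / (1 - beta)
sigma_h2 = sigma_eta**2 / (1 - beta**2)
kurt_theory = 3 * np.exp(sigma_h2)             # >= 3: fatter tails than the normal

h = np.empty(T)
h[0] = mu_h
eta = sigma_eta * rng.standard_normal(T)
for t in range(1, T):
    h[t] = alpha + beta * h[t - 1] + eta[t]
y = rng.standard_normal(T) * np.exp(h / 2)

kurt_sim = np.mean(y**4) / np.mean(y**2) ** 2
```

Because the fourth moment of y is heavy-tailed, the sample kurtosis converges slowly; a long sample is needed for the comparison to be informative.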

4 A stochastic process yt is covariance stationary if the degree of covariance amongst its observations depends only on the time gap between them, i.e. Cov(yt , yt+r ) = γ (r) for all t.
5 For some processes there will exist no moments, even in cases where the corresponding unconditional distributions are perfectly well-behaved. The strict stationarity of yt is then defined as follows: F (yt+r , yt+r+1 , . . . , yt+r+p ) = F (yt , yt+1 , . . . , yt+p ) for all p and r.


The dynamic properties of yt are easy to find. First, as µt is iid, yt is a martingale

difference6 and is a white noise7 if |β| < 1. As ht is a Gaussian AR(1):

Cov(yt², yt−r²) = E(yt² yt−r²) − (E(yt²))²
               = E(exp(ht + ht−r )) − (E(exp(ht )))²
               = exp(2µh + σh²)(exp(σh² β^r ) − 1)

and so:

ρyt² (r) = Cov(yt², yt−r²)/Var(yt²) = (exp(σh² β^r ) − 1)/(3 exp(σh²) − 1) ≈ [(exp(σh²) − 1)/(3 exp(σh²) − 1)] β^r

Hence, the memory of yt² is defined by the memory of the latent ht , in this case an AR(1). Moreover, note that if β < 0, ρyt² (r) can be negative, unlike in ARCH models. This is the autocorrelation function of an ARMA(1,1) process, thus the SV model behaves in a manner similar to the GARCH(1,1) model. Finally, note that there is no need for non-negativity constraints or for bounded kurtosis constraints on the coefficients. This is a great advantage with respect to GARCH models.

Insights on the dynamic properties of the SV model can also be obtained by squaring

and taking logs, getting:

log(yt²) = ht + log(µt²)
ht = α + βht−1 + ηt        (8.6)

a linear process, which adds the iid log(µt²) to the AR(1) ht . As a result log(yt²) ∼ ARMA(1,1). If µt is Gaussian, then log(µt²) has a mean of −1.27 and variance 4.93, but

its distribution is far from being normal, as it is heavily skewed with a long left-hand tail,

caused by taking the logs of very small numbers, an operation which generates outliers.

The autocorrelation function for log(yt²) is:

ρlog(yt²) (r) = β^r /(1 + 4.93/σh²)
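The constants −1.27 and 4.93 are the mean −(log 2 + γ), with γ Euler's constant, and the variance π²/2 of log(µt²) for Gaussian µt ; a quick Monte Carlo sketch confirms both:

```python
import numpy as np

# Monte Carlo sketch: for Gaussian mu_t, log(mu_t^2) is a log chi-square(1)
# variable with mean -(log 2 + Euler gamma) ~ -1.27 and variance pi^2/2 ~ 4.93.
rng = np.random.default_rng(4)
mu = rng.standard_normal(2_000_000)
log_mu2 = np.log(mu**2)

mean_theory = -(np.log(2.0) + np.euler_gamma)  # ~ -1.2704
var_theory = np.pi**2 / 2                      # ~ 4.9348
```

The long left tail mentioned in the text comes from taking logs of values of µt very close to zero.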

8.3.2 SV models with discrete volatility

We consider a two-state Markov chain st independent of µt , which is a Gaussian white

noise.

Assuming stationarity,8 the unconditional probabilities of being in regime 0 (P (st = 0) = π0 ) or 1 (P (st = 1) = π1 ) are defined as follows:

π0 = p00 π0 + (1 − p11 )π1
π1 = (1 − p00 )π0 + p11 π1

6 yt being a martingale difference stipulates that E|yt | < ∞ and that E(yt |yt−1 , yt−2 , . . .) = 0. All martingale differences have zero means and are uncorrelated over time. If the unconditional variance of the martingale difference is constant over time, then the series is also a white noise.
7 This means E(yt ) = µ, Var(yt ) = σ² and Cov(yt , yt+r ) = 0 for all r ≠ 0. Often µ will be taken to be zero. These unconditional moment conditions are sometimes strengthened to include yt being independent, rather than uncorrelated, over time. This is called strong white noise, a special case of which is independent and identically distributed (iid).
8 An ergodic Markov chain is a covariance stationary process. For some basic properties of Markov chains, see Hamilton (1994, chap. 22).


with π0 + π1 = 1, or in vector form:

π = Pπ        1′π = 1

where 1 = (1, 1)′. Thus, they are:

π0 = (1 − p11 )/(2 − p00 − p11 )
π1 = (1 − p00 )/(2 − p00 − p11 )
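A small sanity check (with hypothetical p00, p11) that these closed forms solve π = Pπ with π0 + π1 = 1:

```python
import numpy as np

# Sketch (hypothetical p00, p11): the closed-form stationary probabilities
# solve pi = P pi with pi0 + pi1 = 1 (column-stochastic convention).
p00, p11 = 0.95, 0.90
P = np.array([[p00, 1 - p11],
              [1 - p00, p11]])

pi0 = (1 - p11) / (2 - p00 - p11)
pi1 = (1 - p00) / (2 - p00 - p11)
pi = np.array([pi0, pi1])
```

With these values, π is the eigenvector of P associated with the unit eigenvalue.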

From the definition of b in equation (8.5), we can note that when p00 + p11 > 1 the ht

process is likely to persist in its current state and it would be positively serially correlated.

Its unconditional moments are:

E(ht ) = α + βE(st ) = α + βπ1
Var(ht ) = β²π1 (1 − π1 )

Under stationarity,9 as for the SV model with continuous volatility, all the moments exist,

all the odd moments are zero and the kurtosis is:

E(yt⁴)/(E(yt²))² = 3 (π0 + exp(2β)π1 )/(π0 + exp(β)π1 )² ≥ 3

Moreover, as µt is iid, yt is a martingale difference and its dynamic properties are described

by the covariances of squares:

Cov(yt², yt−r²) = E(yt² yt−r²) − (E(yt²))²
               = E(exp(ht + ht−r )) − (exp(α)π0 + exp(α + β)π1 )²
               = exp(2α)P (st = 0, st−r = 0) + exp(2α + β)P (st = 0, st−r = 1)
                 + exp(2α + β)P (st = 1, st−r = 0) + exp(2α + 2β)P (st = 1, st−r = 1)
                 − (exp(α)π0 + exp(α + β)π1 )²

where the vector of unconditional joint probabilities P (st , st−r ) can be computed as follows:

P (st , st−r ) = P (st |st−r )P (st−r ) = P^r π

9 For this see Francq and Zakoïan (2001) and Francq et al. (2001).


with:

P^r = (1/(2 − p00 − p11 )) [ (1 − p11 ) + λ^r (1 − p00 )    (1 − p11 ) − λ^r (1 − p11 ) ]
                           [ (1 − p00 ) − λ^r (1 − p00 )    (1 − p00 ) + λ^r (1 − p11 ) ]

and λ = −1 + p00 + p11 .

Finally, it is useful to note that ht is itself a Markov chain which can take the values

α and α + β with the same transition matrix P.
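The closed form of P^r above can be verified against a direct matrix power; the sketch below uses hypothetical values of p00, p11 and r:

```python
import numpy as np

# Sketch (hypothetical p00, p11, r): check the closed form of P^r
# against a direct matrix power, with lambda = -1 + p00 + p11.
p00, p11, r = 0.95, 0.90, 5
lam = -1 + p00 + p11
d = 2 - p00 - p11

P = np.array([[p00, 1 - p11],
              [1 - p00, p11]])

Pr_closed = np.array([
    [(1 - p11) + lam**r * (1 - p00), (1 - p11) - lam**r * (1 - p11)],
    [(1 - p00) - lam**r * (1 - p00), (1 - p00) + lam**r * (1 - p11)],
]) / d
```

Because |λ| < 1 for a persistent but ergodic chain, P^r converges to the matrix whose columns are both equal to π as r grows.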

8.4 ESTIMATION

The difficulties in estimating SV models lie in the latent nature of the volatility. Inference may be difficult, because the distribution of yt |Y t−1 is specified implicitly rather than explicitly and the likelihood function appears as a multivariate integral the size of which

is equal to the number of observations multiplied by the size of the latent variables, which

is 1 for the described models.

As with most non-Gaussian parameter-driven models, there are many different ways of

performing estimation: some involve estimating or approximating the likelihood, others

use the method of moments procedures (see Ghysels et al. (1996) and Shephard (1996)).

Let us first of all clearly state the problem of computing the likelihood function for

the general class of parametric dynamic latent variable or non-linear and/or non-Gaussian

state space models.

8.4.1 A general filter for non-Gaussian parameter-driven models

Both SV models (with continuous and discrete volatility) fit in the following framework:

yt = φt (ht , µt ; θ )        measurement equation
ht = ϕt (ht−1 , ηt ; θ )      transition equation        (8.7)

where µt and ηt are independent white noises, with marginal distributions which may depend on θ , the vector of parameters. Let H t and Y t denote (h1 , h2 , . . . , ht )′ and (y1 , y2 , . . . , yt )′, respectively.

There are serious difficulties in computing the likelihood function; in fact, with T the number of observations, we have:

f (Y T , H T ; θ ) = ∏_{t=1}^{T} f (yt |Y t−1 , H t ; θ )f (ht |Y t−1 , H t−1 ; θ )

and the likelihood function is:

LT (θ ) ≡ f (Y T ; θ ) = ∫ ∏_{t=1}^{T} f (yt |Y t−1 , H t ; θ )f (ht |Y t−1 , H t−1 ; θ ) ∏_{t=1}^{T} dht        (8.8)

which is an integral whose size is equal to the number of observations multiplied by the

dimension of the unobserved variable ht , and thus it is practically unfeasible.


It is however possible to derive a general algorithm which allows the formal computation of the likelihood function by decomposing the calculation of integral (8.8) into a sequence of integrals of lower dimension.

Let f (ht−1 |Y t−1 ) be the input of the iteration,10 t = 1, 2, . . . , T . First of all, we can decompose the joint conditional density of ht , ht−1 into the product of the transition density and the input density:

Step 1:   f (ht , ht−1 |Y t−1 ) = f (ht |ht−1 )f (ht−1 |Y t−1 )

By marginalisation we obtain the prediction density of ht :

Step 2:   f (ht |Y t−1 ) = ∫ f (ht , ht−1 |Y t−1 ) dht−1 = ∫ f (ht |ht−1 )f (ht−1 |Y t−1 ) dht−1

Let us now consider the joint density of yt , ht . It can be decomposed into the product of the measurement density and the prediction density:

Step 3:   f (yt , ht |Y t−1 ) = f (yt |ht )f (ht |Y t−1 )

and, again, by marginalisation we obtain the one-step-ahead forecast density of yt :

Step 4:   f (yt |Y t−1 ) = ∫ f (yt , ht |Y t−1 ) dht = ∫ f (yt |ht )f (ht |Y t−1 ) dht

which is particularly useful, since by the combination of these densities it is possible to obtain the likelihood function. Finally, by conditioning we obtain the filtering density (output):

Step 5:   f (ht |Y t ) = f (yt , ht |Y t−1 )/f (yt |Y t−1 ) = f (yt |ht )f (ht |Y t−1 ) / ∫ f (yt |ht )f (ht |Y t−1 ) dht

which ends the iteration.
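For exposition only (this is not one of the chapter's algorithms), steps 1-5 can be made concrete by brute force: discretising ht on a grid turns the integrals in steps 2 and 4 into sums. The sketch below runs such a grid filter on data simulated from the log-normal SV model, with arbitrary parameter values:

```python
import numpy as np

# Exposition-only sketch: a brute-force version of steps 1-5 for the log-normal
# SV model, discretising h_t on a grid so the integrals become sums.
# All parameter values are arbitrary.
rng = np.random.default_rng(5)
alpha, beta, sigma_eta, T = -0.5, 0.9, 0.4, 200

h = np.empty(T)
h[0] = alpha / (1 - beta)
for t in range(1, T):
    h[t] = alpha + beta * h[t - 1] + sigma_eta * rng.standard_normal()
y = rng.standard_normal(T) * np.exp(h / 2)

grid = np.linspace(-9.0, -1.0, 400)            # support covering the stationary range
dh = grid[1] - grid[0]
# transition density f(grid[i] | grid[j]) evaluated on the grid
trans = np.exp(-0.5 * ((grid[:, None] - alpha - beta * grid[None, :]) / sigma_eta) ** 2)
trans /= sigma_eta * np.sqrt(2 * np.pi)

# initial input: unconditional density of h_1
filt = np.exp(-0.5 * (grid - alpha / (1 - beta)) ** 2 / (sigma_eta**2 / (1 - beta**2)))
filt /= filt.sum() * dh

loglik = 0.0
for t in range(T):
    pred = trans @ filt * dh                   # steps 1-2: prediction density
    meas = np.exp(-0.5 * y[t] ** 2 / np.exp(grid)) / np.sqrt(2 * np.pi * np.exp(grid))
    joint = meas * pred                        # step 3: joint density of (y_t, h_t)
    f_y = joint.sum() * dh                     # step 4: one-step-ahead forecast density
    loglik += np.log(f_y)
    filt = joint / f_y                         # step 5: filtering density
```

Accumulating log f(yt |Y t−1) over t yields the log-likelihood, exactly as the combination of step 4 densities suggests; the cost of the grid grows quickly with the dimension of ht, which is why analytic or simulation-based recursions are preferred in practice.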

The previous algorithm allows us to obtain several important elements. Step 2 gives

the estimation of ht given all the information available until t − 1 (prediction density). Step 5 provides the estimation of ht given all the information currently available (filtering

density). Finally step 4, by providing the one-step-ahead forecast density, allows us to

compute the likelihood function.

Unfortunately, only in very special cases is it possible to obtain analytic recursive

algorithms11 from this general filtering algorithm: the Kalman filter in the Gaussian and linear case and the Hamilton filter in the Markovian and discrete case.

In the Gaussian and linear cases, the initial input f (h1 |Y 0 ) and the measurement and

transition densities are assumed to be Gaussian; since Gaussianity is preserved at each step of the algorithm, all the outputs are also Gaussian. The normal distribution is

10 For the first iteration (t = 1) it is possible to consider the unconditional distribution of ht , f (h1 ). For the sake of simplicity we omit the dependence on the parameter θ .

11 See also Shephard (1994) for another particular case in which ht is set to be a random walk and exp(ηt ) a highly contrived scaled beta distribution. This delivers a one-step-ahead prediction distribution which has some similarities to the ARCH models.


completely described by its first two moments, so the algorithm can be rewritten by relating the means and variances of the different densities involved. This is the Kalman filter.

For the switching regime models introduced by Hamilton (1989), which represent the

Markovian and discrete case, the integrals which appear at steps 2 and 4 become a simple

sum over the possible regimes, and then the whole algorithm is analytically tractable.

In all the other cases, it is necessary to consider approximate solutions or simulation-based methods. Examples of approximations are the extended Kalman filter (Anderson and Moore, 1979; Harvey, 1989; Fridman and Harris, 1998), the Gaussian sum filter (Sorenson and Alspach, 1971), numerical integration (Kitagawa, 1987), Monte Carlo integration (Tanizaki and Mariano, 1994, 1998), or the particle filter (Gordon et al., 1993; Kitagawa, 1996; Pitt and Shephard, 1999a). The simulation-based solutions are certainly more time-consuming and demanding in terms of computing, but they are definitely more general. We will see these methods in greater detail later.

However, for the two presented models (8.2) and (8.3), the general filter introduced here is useful for estimation. In fact, for the linearised version (8.6), the Kalman filter allows a quasi-maximum likelihood estimation of the parameters and the discrete version (8.3) is a particular case of switching regime models for which the Hamilton filter gives the

likelihood function.

8.4.1.1 The Kalman filter for quasi-maximum likelihood (QML) estimation of continuous SV models

We can consider the log-transformation (8.6) of the continuous SV model. As log(µt²) is iid, we obtain a linear state space model.

Let LY τ = (log(y1²), log(y2²), . . . , log(yτ²))′, ĥt/τ = E(ht |LY τ ) = E(ht |Y τ ) and Qt/τ = MSE(ht |LY τ ) = MSE(ht |Y τ ). The Kalman filter (see, for example,12 Harvey (1989)) computes these quantities recursively for t = 1, . . . , T :

ĥt/t−1 = α + β ĥt−1/t−1
Qt/t−1 = β² Qt−1/t−1 + ση²
et/t−1 = log(yt²) − ĥt/t−1
Ft/t−1 = Qt/t−1 + π²/2
ĥt/t = ĥt/t−1 + Kt et/t−1
Qt/t = (1 − Kt )Qt/t−1

where Kt = Qt/t−1 /Ft/t−1 is the Kalman gain. However, as log(µt²) is not Gaussian, the Kalman filter can still be used to provide the best linear unbiased estimator of ht given Y t . Moreover, if (8.6) were a Gaussian state space model, the Kalman filter would provide the exact likelihood function. In fact, a by-product of the filter is the sequence of innovations et/t−1 ,

12 See also Carraro and Sartore (1987).


which are the one-step-ahead forecast errors and their corresponding mean square errors,

Ft/t−1 . Together they deliver the likelihood (ignoring constants):

LT (θ ) = −(1/2) Σ_{t=1}^{T} log Ft/t−1 − (1/2) Σ_{t=1}^{T} e²t/t−1 /Ft/t−1

As the state space model is linear but not Gaussian, the filter gives a quasi-likelihood function which can be used to obtain a consistent estimator θ̂ and asymptotically normal inference (see Ruiz (1994)).

This way of estimating ht is used by Melino and Turnbull (1990), after estimating θ by

the generalised method of moments (see Section 8.4.3.1). Harvey et al. (1994) examine

the QML estimator.
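The recursions above translate almost line by line into code. The following sketch runs the QML filter on simulated data (all parameter values are illustrative); the observation log(yt²) is shifted by 1.27 so that its noise has mean zero:

```python
import numpy as np

# Sketch of the QML recursions on simulated data (all values illustrative).
# The observation is x_t = log(y_t^2) + 1.27, whose noise has mean zero
# and variance pi^2/2.
rng = np.random.default_rng(6)
alpha, beta, sigma_eta2, T = -0.5, 0.9, 0.16, 3000

h = np.empty(T)
h[0] = alpha / (1 - beta)
for t in range(1, T):
    h[t] = alpha + beta * h[t - 1] + np.sqrt(sigma_eta2) * rng.standard_normal()
y = rng.standard_normal(T) * np.exp(h / 2)
x = np.log(y**2) + 1.27

h_f = alpha / (1 - beta)                       # h_{t/t}, started at the stationary mean
Q_f = sigma_eta2 / (1 - beta**2)               # Q_{t/t}, started at the stationary variance
loglik = 0.0
h_filt = np.empty(T)
for t in range(T):
    h_p = alpha + beta * h_f                   # h_{t/t-1}
    Q_p = beta**2 * Q_f + sigma_eta2           # Q_{t/t-1}
    e = x[t] - h_p                             # innovation e_{t/t-1}
    F = Q_p + np.pi**2 / 2                     # F_{t/t-1}
    K = Q_p / F                                # Kalman gain
    h_f = h_p + K * e                          # h_{t/t}
    Q_f = (1 - K) * Q_p                        # Q_{t/t}
    h_filt[t] = h_f
    loglik += -0.5 * np.log(F) - 0.5 * e**2 / F
```

Maximising the accumulated quasi-log-likelihood over (α, β, ση²) would give the QML estimates; here the parameters are simply held at their true simulated values.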

8.4.1.2 The Hamilton filter for maximum likelihood estimation of discrete SV models

The discrete SV model (8.3) is a non-linear and non-Gaussian state space model. In

the two-regimes case, the transition equation can be written in a linear form (see (8.5))

and the measurement equation can be linearised by the log transformation, but both

the equations are non-Gaussian. However, the joint process (yt , ht ) is Markovian and

thus the general filter presented in Section 8.4.1 gives an analytic recursion, since the integrals become simple sums over the possible values of ht . The input is the filtered probability13 P (ht−1 |Y t−1 ) and the algorithm gives the prediction probability, the one-step-ahead forecast density and the subsequent filtered probability:

P (ht |Y t−1 ) = Σ_{ht−1} P (ht |ht−1 )P (ht−1 |Y t−1 )

f (yt |Y t−1 ) = Σ_{ht} f (yt |ht )P (ht |Y t−1 )

P (ht |Y t ) = f (yt |ht )P (ht |Y t−1 ) / f (yt |Y t−1 ) = f (yt |ht )P (ht |Y t−1 ) / Σ_{ht} f (yt |ht )P (ht |Y t−1 )

The combination of the one-step-ahead forecast densities:

LT (θ ) = ∏_{t=1}^{T} f (yt |Y t−1 )

provides the likelihood function, the maximisation of which gives the maximum likelihood

estimators of the parameters.

13 The initial probability P (h0 |Y 0 ) can be taken equal to the unconditional (ergodic) probability P (h0 ) = π .
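A compact sketch of the Hamilton filter for the two-regime model (8.3), on simulated data with hypothetical parameter values:

```python
import numpy as np

# Sketch of the Hamilton filter for the two-regime model (8.3) on simulated data.
# h_t takes the values alpha (regime 0) and alpha + beta (regime 1).
# All parameter values are hypothetical.
rng = np.random.default_rng(7)
alpha, beta, p00, p11, T = -2.0, 1.5, 0.97, 0.95, 1500

P = np.array([[p00, 1 - p11],
              [1 - p00, p11]])               # P[i, j] = P(s_t = i | s_{t-1} = j)
pi = np.array([1 - p11, 1 - p00]) / (2 - p00 - p11)

s = np.zeros(T, dtype=int)                   # simulate the regimes...
for t in range(1, T):
    stay = p00 if s[t - 1] == 0 else p11
    s[t] = s[t - 1] if rng.random() < stay else 1 - s[t - 1]
y = rng.standard_normal(T) * np.exp((alpha + beta * s) / 2)   # ...and the data

var = np.exp(np.array([alpha, alpha + beta]))   # Var(y_t | regime)
filt = pi.copy()                                # initial probability (footnote 13)
loglik = 0.0
filt_prob1 = np.empty(T)
for t in range(T):
    pred = P @ filt                             # prediction probability
    meas = np.exp(-0.5 * y[t] ** 2 / var) / np.sqrt(2 * np.pi * var)
    f_y = meas @ pred                           # one-step-ahead forecast density
    loglik += np.log(f_y)
    filt = meas * pred / f_y                    # filtered probability
    filt_prob1[t] = filt[1]
```

The accumulated log of the one-step-ahead densities is the exact log-likelihood, which can then be maximised numerically over (α, β, p00, p11).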


8.4.2 A general smoother for non-Gaussian parameter-driven models

We might also want to obtain the estimation of ht given all the information available, that

is conditional on Y T . Such a procedure is called smoothing and as before it is possible

to derive a formal backward algorithm which delivers the smoothed densities f (ht |Y T ).

Let f (ht+1 |Y T ) be the input of the iteration,14 t = T − 1, T − 2, . . . , 2, 1. We can decompose the joint density of ht+1 , ht , conditional on the information set Y t , into the product of the transition density and the filtered density (available from the filter):

Step 1:   f (ht+1 , ht |Y t ) = f (ht+1 |ht )f (ht |Y t )

By conditioning on the prediction density obtained from the filter, we obtain the following conditional density:

Step 2:   f (ht |ht+1 , Y t ) = f (ht+1 , ht |Y t )/f (ht+1 |Y t )

The joint density of ht+1 , ht , conditional on the information set Y T , is given by the product of the conditional density f (ht |ht+1 , Y T ) and the input of the algorithm f (ht+1 |Y T ). The information set ht+1 , Y T is included in the information set ht+1 , Y t , µ^T_{t+1} , η^T_{t+2} , where µ^T_{t+1} = (µt+1 , . . . , µT )′ and η^T_{t+2} = (ηt+2 , . . . , ηT )′. Given that µ^T_{t+1} , η^T_{t+2} is independent of ht , ht+1 , Y t , we can conclude that f (ht |ht+1 , Y T ) = f (ht |ht+1 , Y t ) (computed at step 2) and then:

Step 3:   f (ht+1 , ht |Y T ) = f (ht |ht+1 , Y T )f (ht+1 |Y T ) = f (ht |ht+1 , Y t )f (ht+1 |Y T )

Finally, by marginalisation we obtain the smoothed density of ht (output):

Step 4:   f (ht |Y T ) = ∫ f (ht+1 , ht |Y T ) dht+1 = ∫ f (ht |ht+1 , Y t )f (ht+1 |Y T ) dht+1

Again, only in the linear and Gaussian case, and in the Markovian and discrete case is

it possible to obtain an analytic backward recursion: the Kalman smoother and the Kim

smoother (Kim, 1994).

8.4.2.1 The Kalman smoother for continuous SV models

Let ĥt+1/T = E(ht+1 |LY T ) = E(ht+1 |Y T ) and Qt+1/T = MSE(ht+1 |LY T ) = MSE(ht+1 |Y T ). The Kalman smoother15 computes these quantities recursively for t = T − 1, T − 2, . . . , 2, 1:

ĥt/T = ĥt/t + βQt/t Q⁻¹t+1/t (ĥt+1/T − ĥt+1/t )
Qt/T = Qt/t + β²Q²t/t Q⁻²t+1/t (Qt+1/T − Qt+1/t )

where ĥt/t , Qt/t , ĥt+1/t , Qt+1/t are stored from the Kalman filter.

14 For the first iteration (t = T − 1), the input is simply the final output of the filter f (hT |Y T ).
15 See also de Jong (1989).


For the log transformation of the continuous SV model (8.6), the Kalman smoother is useful for estimating the unobserved log-volatility: it provides the best linear unbiased estimator of ht given (y1 , y2 , . . . , yT )′.
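A sketch combining the forward (filtering) pass with the backward smoothing recursions above, on simulated data with illustrative parameter values:

```python
import numpy as np

# Sketch of the Kalman smoother run after the QML filter on the linearised
# model (8.6). Everything here is simulated with hypothetical parameter values.
rng = np.random.default_rng(8)
alpha, beta, sigma_eta2, T = -0.5, 0.9, 0.16, 2000

h = np.empty(T)
h[0] = alpha / (1 - beta)
for t in range(1, T):
    h[t] = alpha + beta * h[t - 1] + np.sqrt(sigma_eta2) * rng.standard_normal()
x = np.log(rng.standard_normal(T) ** 2 * np.exp(h)) + 1.27   # log(y_t^2), centred

# forward pass: store h_{t/t-1}, Q_{t/t-1}, h_{t/t}, Q_{t/t}
h_p = np.empty(T); Q_p = np.empty(T); h_f = np.empty(T); Q_f = np.empty(T)
hf, Qf = alpha / (1 - beta), sigma_eta2 / (1 - beta**2)
for t in range(T):
    h_p[t] = alpha + beta * hf
    Q_p[t] = beta**2 * Qf + sigma_eta2
    K = Q_p[t] / (Q_p[t] + np.pi**2 / 2)
    hf = h_p[t] + K * (x[t] - h_p[t])
    Qf = (1 - K) * Q_p[t]
    h_f[t], Q_f[t] = hf, Qf

# backward pass: h_{t/T} and Q_{t/T}
h_s = h_f.copy(); Q_s = Q_f.copy()
for t in range(T - 2, -1, -1):
    J = beta * Q_f[t] / Q_p[t + 1]             # beta * Q_{t/t} / Q_{t+1/t}
    h_s[t] = h_f[t] + J * (h_s[t + 1] - h_p[t + 1])
    Q_s[t] = Q_f[t] + J**2 * (Q_s[t + 1] - Q_p[t + 1])
```

Because the smoother conditions on the full sample, its mean square errors never exceed those of the filter.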

8.4.2.2 The Kim smoother for discrete SV models

The input is the smoothed probability P (ht+1 |Y T ) and the recursion is simply:

P (ht |Y T ) = Σ_{ht+1} P (ht+1 |ht )P (ht |Y t )P (ht+1 |Y T ) / P (ht+1 |Y t )

where P (ht |Y t ) and P (ht+1 |Y t ) are stored from the Hamilton filter.

8.4.3 Other estimation methods for continuous SV models

For the discrete SV model the Hamilton filter allows us to obtain the maximum likelihood estimator of the parameters. In contrast, for the continuous SV models, the Kalman filter provides only an approximation of the likelihood function. Let us review some other

possible estimation methods useful for the continuous SV model.

As with most non-Gaussian parameter-driven models, there are many different ways

to perform estimation. Some involve estimating the likelihood; others use method of

moments procedures.

8.4.3.1 Method of moments

The simplest approach is the method of moments, based on matching empirical and

theoretical moments. In the SV case there are many possible moments to use in estimating

the parameters of the model. This is because yt² behaves like an ARMA(1,1) model and moving average models do not admit sufficient statistics of a smaller dimension than T . This suggests that the use of a finite number of moment restrictions is likely to lead to a loss of information. Examples include those based on yt², yt⁴, yt² yt−r², although there are many other possibilities. As a result, we may well want to use more moments than

there are parameters to estimate, implying that they will have to be pooled. A reasonably

sensible way of doing this is via the Generalised Method of Moments (GMM).

We can consider, for example, the vector gT of the first r autocovariances of yt² or of log(yt²) as moment constraints. There are more moments than parameters and the issue is how to weight all the available information. The GMM approach of Hansen (1992) suggests minimising the quadratic form g′T WT gT by varying the parameters θ , where the weighting matrix WT should reflect the relative importance given to matching each of the chosen moments. Applications of this method to SV models are the seminal work of

Melino and Turnbull (1990) and the extensive study of Andersen and Sørensen (1996).

The main advantage of the GMM approach comes from the fact that it does not require

distributional assumptions. However, this is not useful for the SV model since it is a fully

specified parametric model. Moreover, as argued by Shephard (1996), there are a number of drawbacks to the GMM estimation of the SV model. First of all, GMM can only be used if ht is stationary; if β is close to one (as we will find for many high-frequency financial data sets), we can expect GMM to work poorly. Second, parameter estimates


are not invariant to the parameterisation and the model (8.2) is not fundamentally more

interesting than

yt = µt γ exp(ht /2)
ht = βht−1 + ηt

Third, as already observed, the squares yt² behave like an ARMA(1,1) model; if ση² is small (as we will find in practice), ρyt² (r) will be small but positive for many r. This implies that for many series the number of moments to be considered will have to be very high to capture the low correlation in the volatility process. Finally, GMM does not deliver an estimate (filtered or smoothed) of ht ; consequently a second form of estimation

will be required.

The GMM and QML approaches are the simplest way of estimating the SV models

and they are about equally efficient, with the relative performance being dependent on the specific parameter values (see Andersen and Sørensen (1997)).

8.4.3.2 Simulation-based methods

All the other estimation approaches are based on simulation techniques. In the last 10

years there has been a growing interest in simulation-based16 methods which propose

several ways of resolving the inference problem for this class of models (see Billio (1999,

2002b)). In fact, it is clear that one can easily recursively simulate (path simulations) from

the system (8.2) for any given value of parameters, θ .

A first approach relies on simulation-based methods which are relatively simple to implement, but which are less efficient than the maximum likelihood approach: see, for example, the simulated method of moments (Duffie and Singleton, 1993), the indirect inference method (Gouriéroux et al., 1993) or the efficient method of moments (Gallant and Tauchen, 1996; Gallant et al., 1997). A second approach considers the problem

of the computation (or of the approximation) of the likelihood and then of the maximum likelihood estimator through importance sampling methods (Danielsson and Richard,

1993; Danielsson, 1994; Durbin and Koopman, 1997). In a Bayesian framework, a third

approach considers Markov Chain Monte Carlo (MCMC) techniques based on the data

augmentation principle, which yields samples out of the joint posterior distribution of the

latent volatility and all model parameters, and allows the parameter estimates and the latent

volatility dynamics to be obtained (Jacquier et al., 1994; Kim et al., 1998; Chib et al.,

2002). Finally, a fourth approach utilises MCMC methods to compute (or approximate) the

maximum likelihood estimator (see the simulated expectation maximisation (Shephard, 1993; Geyer, 1994, 1996; Billio et al., 1998)).

In practice, the choice between these different simulation-based approaches depends on

several criteria, such as efficiency and computing time. Unfortunately, in general there is a

trade-off between these criteria. Methods like the simulated maximum likelihood and the

simulated likelihood ratio have several advantages in the estimation of SV models. Since

they are likelihood methods, the classical theory of maximum likelihood carries over to

the simulated case and standard likelihood ratio tests can be constructed. MCMC-based

approaches are certainly more time-consuming, but also allow estimation of the latent

volatility dynamics by simulating from the smoothing/posterior distribution of ht .

16 Simulation techniques make use of sequences of pseudo-random numbers which are generated by a computer procedure.


Let us briefly introduce some of these methods and their application to SV models.

8.4.3.2.1 Indirect inference approach

The so-called indirect inference methodology was recently introduced in the literature by Smith (1993), Gouriéroux et al. (1993) and Gallant and Tauchen (1996) for simulation-based inference on generally intractable structural models through an auxiliary model, conceived as easier to handle. This methodology allows the use of somewhat misspecified auxiliary models, since the simulation process in the well-specified structural model, and the calibration of the simulated paths against the observed one through the same auxiliary model, will provide an automatic misspecification bias correction. There are several ways of implementing this idea.17

The original approach is the indirect inference method of Gouriéroux et al. (1993). Consider an auxiliary model fa (yt |Y t−1 ; π ) for the observed data (for example18 the general linear state space model obtained by the log transformation (8.6)). Let π̂T denote the QML estimator of π based on fa , viewed as a function of the observed data set Y T . The indirect inference estimator of the structural parameters θ is given by:

θ̂II = arg min_θ [π̂T − π̃NT (θ )]′ WT [π̂T − π̃NT (θ )]

where WT is a weighting matrix and π̃NT (θ ) is the estimator of π obtained on a simulated path Ỹ NT for a given value of θ (for large N it approximates the binding function πNT (θ ), the limit of these simulated estimators as N → ∞). This approach may be very computationally demanding, as one needs to evaluate the binding function π̃NT (θ ) for each value of θ appearing in the numerical optimisation algorithm.