and the transactions data consist of $\{\Delta t_i, N_i, D_i, S_i\}$ for the $i$th price change. The PCD model is concerned with the joint analysis of $(\Delta t_i, N_i, D_i, S_i)$.

Remark: Focusing on transactions associated with a price change can reduce

the sample size dramatically. For example, consider the intraday data of IBM stock

from November 1, 1990 to January 31, 1991. There were 60,265 intraday trades, but

only 19,022 of them resulted in a price change. In addition, there is no diurnal pattern

in time durations between price changes.

To illustrate the relationship between the price movements of all transactions and those of transactions associated with a price change, we consider the intraday tradings of IBM stock on November 21, 1990. There were 726 transactions on that day during the normal trading hours, but only 195 trades resulted in a price change. Figure 5.14 shows the time plot of the price series for both cases. As expected, the price series are the same.

The PCD model decomposes the joint distribution of $(\Delta t_i, N_i, D_i, S_i)$ given $F_{i-1}$ as

$$f(\Delta t_i, N_i, D_i, S_i \mid F_{i-1}) = f(S_i \mid D_i, N_i, \Delta t_i, F_{i-1})\, f(D_i \mid N_i, \Delta t_i, F_{i-1})\, f(N_i \mid \Delta t_i, F_{i-1})\, f(\Delta t_i \mid F_{i-1}). \tag{5.47}$$

This partition enables us to specify suitable econometric models for the conditional distributions and, hence, to simplify the modeling task. There are many ways to specify models for the conditional distributions. A proper specification might depend on the asset under study. Here we employ the specifications used by McCulloch and Tsay (2000), who use generalized linear models for the discrete-valued variables and a time series model for the continuous variable $\ln(\Delta t_i)$.

For the time duration between price changes, we use the model

$$\ln(\Delta t_i) = \beta_0 + \beta_1 \ln(\Delta t_{i-1}) + \beta_2 S_{i-1} + \sigma \epsilon_i, \tag{5.48}$$

where $\sigma$ is a positive number and $\{\epsilon_i\}$ is a sequence of iid $N(0, 1)$ random variables. This is a multiple linear regression model with lagged variables. Other explanatory variables can be added if necessary. The log transformation is used to ensure the positiveness of time duration.
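The recursion in Eq. (5.48) can be sketched in a few lines of Python. This is only an illustration: the parameter values are hypothetical, and the lagged sizes $S_{i-1}$ are taken as a given sequence rather than generated by the size model.

```python
import math
import random


def simulate_durations(beta0, beta1, beta2, sigma, sizes, dt0=1.0, seed=0):
    """Simulate ln(dt_i) = beta0 + beta1*ln(dt_{i-1}) + beta2*S_{i-1} + sigma*eps_i.

    `sizes` holds the lagged sizes S_{i-1}; treating them as given is an
    assumption made only to keep the sketch short.
    """
    rng = random.Random(seed)
    log_dt = math.log(dt0)
    durations = []
    for s_prev in sizes:
        eps = rng.gauss(0.0, 1.0)            # iid N(0, 1) innovation
        log_dt = beta0 + beta1 * log_dt + beta2 * s_prev + sigma * eps
        durations.append(math.exp(log_dt))   # exponentiating keeps dt_i > 0
    return durations


# Hypothetical parameter values, chosen only for illustration.
durations = simulate_durations(4.0, 0.1, -0.05, 1.0, sizes=[1] * 500)
print(len(durations), min(durations) > 0.0)
```

Because the model is written for the logarithm, every simulated duration is positive regardless of the innovation drawn.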


THE PCD MODEL

Figure 5.14. Time plots of the intraday transaction prices of IBM stock on November 21, 1990: (a) all transactions, and (b) transactions that resulted in a price change. Both panels plot price against time in seconds.

The conditional model for $N_i$ is further partitioned into two parts because empirical data suggest a concentration of $N_i$ at 0. The first part of the model for $N_i$ is the logit model

$$p(N_i = 0 \mid \Delta t_i, F_{i-1}) = \mathrm{logit}[\alpha_0 + \alpha_1 \ln(\Delta t_i)], \tag{5.49}$$

where $\mathrm{logit}(x) = \exp(x)/[1 + \exp(x)]$, whereas the second part of the model is

$$N_i \mid (N_i > 0, \Delta t_i, F_{i-1}) \sim 1 + g(\lambda_i), \qquad \lambda_i = \frac{\exp[\gamma_0 + \gamma_1 \ln(\Delta t_i)]}{1 + \exp[\gamma_0 + \gamma_1 \ln(\Delta t_i)]}, \tag{5.50}$$

where $\sim$ means "is distributed as," and $g(\lambda)$ denotes a geometric distribution with parameter $\lambda$, which is in the interval (0, 1).
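A draw from the two-part model in Eqs. (5.49) and (5.50) might look as follows. The sketch assumes that $g(\lambda)$ has probability mass $\lambda(1-\lambda)^m$ on $m = 0, 1, 2, \ldots$; the function names and the coefficient values are hypothetical.

```python
import math
import random


def logistic(x):
    """exp(x) / (1 + exp(x)), the function the text calls logit(x)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_n(dt, a0, a1, g0, g1, rng):
    """Draw N_i given the duration dt, following Eqs. (5.49)-(5.50).

    Assumption: g(lam) has pmf lam * (1 - lam)**m for m = 0, 1, 2, ...
    """
    if rng.random() < logistic(a0 + a1 * math.log(dt)):
        return 0                               # point mass at zero, Eq. (5.49)
    lam = logistic(g0 + g1 * math.log(dt))     # lambda_i in Eq. (5.50)
    m = 0
    while rng.random() >= lam:                 # count failures before the first success
        m += 1
    return 1 + m


# Hypothetical coefficients, for illustration only.
rng = random.Random(1)
draws = [sample_n(60.0, -1.0, -0.5, 0.0, -0.5, rng) for _ in range(2000)]
zero_share = draws.count(0) / len(draws)
print(zero_share, max(draws))
```

Splitting the draw into "zero or not" and "how large if positive" is what lets the model put extra mass at $N_i = 0$.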

The model for direction $D_i$ is

$$D_i \mid (N_i, \Delta t_i, F_{i-1}) = \mathrm{sign}(\mu_i + \sigma_i \epsilon), \tag{5.51}$$

where $\epsilon$ is a $N(0, 1)$ random variable, and

210 HIGH-FREQUENCY DATA

$$\mu_i = \omega_0 + \omega_1 D_{i-1} + \omega_2 \ln(\Delta t_i)$$

$$\ln(\sigma_i) = \beta \left| \sum_{j=1}^{4} D_{i-j} \right| = \beta\, | D_{i-1} + D_{i-2} + D_{i-3} + D_{i-4} |.$$

In other words, $D_i$ is governed by the sign of a normal random variable with mean $\mu_i$ and variance $\sigma_i^2$. A special characteristic of the prior model is the function for $\ln(\sigma_i)$. For intraday transactions, a key feature is the price reversal between consecutive price changes. This feature is modeled by the dependence of $D_i$ on $D_{i-1}$ in the mean equation with a negative $\omega_1$ parameter. However, the price movement occasionally exhibits a local trend. The previous variance equation allows for such a local trend by increasing the uncertainty in the direction of price movement when the past data showed evidence of a local trend. For a normal distribution with a fixed mean, increasing its variance pushes the probabilities of a positive draw and of a negative draw toward one half. This in turn increases the chance for a sequence of all positive or all negative draws. Such a sequence produces a local trend in price movement.
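The effect of the variance equation can be checked numerically, since $P(D_i = 1) = P(\mu_i + \sigma_i \epsilon > 0) = \Phi(\mu_i / \sigma_i)$. The sketch below uses hypothetical values of $\mu_i$ and $\beta$ and compares an alternating history with a run of four upward moves.

```python
import math


def prob_up(mu, sigma):
    """P(D_i = +1) = P(mu + sigma*eps > 0) = Phi(mu / sigma) for eps ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))


def sigma_from_history(beta, last_four):
    """sigma_i = exp(beta * |D_{i-1} + D_{i-2} + D_{i-3} + D_{i-4}|)."""
    return math.exp(beta * abs(sum(last_four)))


# Hypothetical values: omega_1 < 0 and the previous move was up, so mu_i < 0.
mu = -0.8
beta = 0.25
for history in ([1, -1, 1, -1], [1, 1, 1, 1]):
    sigma = sigma_from_history(beta, history)
    print(history, round(sigma, 3), round(prob_up(mu, sigma), 3))
```

With the alternating history the sum of past directions is zero, so $\sigma_i = 1$ and reversal dominates; after four moves in the same direction $\sigma_i$ is larger and the probability of continuing upward moves closer to one half.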

To allow for different dynamics between positive and negative price movements, we use different models for the size of a price change. Specifically, we have

$$S_i \mid (D_i = -1, N_i, \Delta t_i, F_{i-1}) \sim p(\lambda_{d,i}) + 1, \quad \text{with} \tag{5.52}$$

$$\ln(\lambda_{d,i}) = \eta_{d,0} + \eta_{d,1} N_i + \eta_{d,2} \ln(\Delta t_i) + \eta_{d,3} S_{i-1}$$

$$S_i \mid (D_i = 1, N_i, \Delta t_i, F_{i-1}) \sim p(\lambda_{u,i}) + 1, \quad \text{with} \tag{5.53}$$

$$\ln(\lambda_{u,i}) = \eta_{u,0} + \eta_{u,1} N_i + \eta_{u,2} \ln(\Delta t_i) + \eta_{u,3} S_{i-1},$$

where $p(\lambda)$ denotes a Poisson distribution with parameter $\lambda$, and 1 is added to the size because the minimum size is 1 tick when there is a price change.
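The shifted Poisson form of the size model can be made concrete with a small sketch. The coefficient vector below is hypothetical; the code only evaluates the log-linear rate and the implied probabilities $P(S_i = k)$ for $k = 1, 2, \ldots$

```python
import math


def size_rate(eta, n, dt, s_prev):
    """lambda = exp(eta0 + eta1*N_i + eta2*ln(dt_i) + eta3*S_{i-1})."""
    return math.exp(eta[0] + eta[1] * n + eta[2] * math.log(dt) + eta[3] * s_prev)


def size_pmf(k, lam):
    """P(S_i = k) when S_i - 1 ~ Poisson(lam), for k = 1, 2, ..."""
    if k < 1:
        return 0.0
    return math.exp(-lam) * lam ** (k - 1) / math.factorial(k - 1)


# Hypothetical coefficients, for illustration only.
eta_hypothetical = (-2.0, -0.5, 0.3, 0.5)
lam = size_rate(eta_hypothetical, n=2, dt=30.0, s_prev=1)
probs = [size_pmf(k, lam) for k in range(1, 40)]
mean_size = sum(k * p for k, p in zip(range(1, 40), probs))
print(round(lam, 4), round(probs[0], 4), round(mean_size, 4))
```

The expected size is $1 + \lambda$, so a small $\lambda$ concentrates the distribution on one-tick changes.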

The specified models in Eqs. (5.48)-(5.53) can be estimated jointly by either the maximum likelihood method or the Markov Chain Monte Carlo methods. Based on Eq. (5.47), the models consist of six conditional models that can be estimated separately.

Example 5.5. Consider the intraday transactions of IBM stock on November 21, 1990. There are 194 price changes within the normal trading hours. Figure 5.15 shows the histograms of $\ln(\Delta t_i)$, $N_i$, $D_i$, and $S_i$. The data for $D_i$ are about equally distributed between "upward" and "downward" movements. Only a few transactions resulted in a price change of more than 1 tick; as a matter of fact, there were seven changes with two ticks and one change with three ticks. Using Markov Chain Monte Carlo (MCMC) methods (see Chapter 10), we obtained the following models for the data. The reported estimates and their standard deviations are the posterior means and standard deviations of MCMC draws with 9500 iterations. The model for the time duration between price changes is

$$\ln(\Delta t_i) = 4.023 + 0.032 \ln(\Delta t_{i-1}) - 0.025 S_{i-1} + 1.403\, \epsilon_i,$$

Figure 5.15. Histograms of intraday transactions data for IBM stock on November 21, 1990: (a) log durations between price changes, (b) direction of price movement, (c) size of price change measured in ticks, and (d) number of trades without a price change.

where standard deviations of the coefficients are 0.415, 0.073, 0.384, and 0.073, respectively. The fitted model indicates that there was no dynamic dependence in the time duration. For the $N_i$ variable, we have

$$\Pr(N_i > 0 \mid \Delta t_i, F_{i-1}) = \mathrm{logit}[-0.637 + 1.740 \ln(\Delta t_i)],$$

where standard deviations of the estimates are 0.238 and 0.248, respectively (the fitted equation is stated for the event $N_i > 0$, the complement of the event $N_i = 0$ in Eq. (5.49)). Thus, as expected, the number of trades with no price change in the time interval $(t_{i-1}, t_i)$ depends positively on the length of the interval. The magnitude of $N_i$ when it is positive is

$$N_i \mid (N_i > 0, \Delta t_i, F_{i-1}) \sim 1 + g(\lambda_i), \qquad \lambda_i = \frac{\exp[0.178 - 0.910 \ln(\Delta t_i)]}{1 + \exp[0.178 - 0.910 \ln(\Delta t_i)]},$$

where standard deviations of the estimates are 0.246 and 0.138, respectively. The negative and significant coefficient of $\ln(\Delta t_i)$ means that $N_i$ is positively related to the length of the duration $\Delta t_i$ because a large $\ln(\Delta t_i)$ implies a small $\lambda_i$, which in turn implies higher probabilities for larger $N_i$; see the geometric distribution in Eq. (5.27).
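To see what the fitted $N_i$ equations imply, one can evaluate them at a short and a long duration. The sketch below plugs in the estimates reported above; it assumes that $g(\lambda)$ has probability mass $\lambda(1-\lambda)^m$ on $m = 0, 1, \ldots$, so that its mean is $(1-\lambda)/\lambda$, and the function name is hypothetical.

```python
import math


def logistic(x):
    """exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def fitted_n_summary(dt):
    """Evaluate the fitted N_i equations of Example 5.5 at duration dt."""
    p_positive = logistic(-0.637 + 1.740 * math.log(dt))   # Pr(N_i > 0 | dt)
    lam = logistic(0.178 - 0.910 * math.log(dt))            # lambda_i
    # Assumption: g(lam) has pmf lam*(1-lam)**m, m = 0, 1, ..., with mean (1-lam)/lam.
    mean_if_positive = 1.0 + (1.0 - lam) / lam
    return p_positive, lam, mean_if_positive


for dt in (10.0, 100.0):
    p_pos, lam, mean_pos = fitted_n_summary(dt)
    print(dt, round(p_pos, 4), round(lam, 4), round(mean_pos, 2))
```

Both the probability of observing at least one trade without a price change and the expected count given that it is positive rise with the duration, which matches the interpretation given in the text.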


The fitted model for $D_i$ is

$$\mu_i = 0.049 - 0.840 D_{i-1} - 0.004 \ln(\Delta t_i)$$

$$\ln(\sigma_i) = 0.244\, | D_{i-1} + D_{i-2} + D_{i-3} + D_{i-4} |,$$

where standard deviations of the parameters in the mean equation are 0.129, 0.132, and 0.082, respectively, whereas that for the parameter in the variance equation is 0.182. The price reversal is clearly shown by the highly significant negative coefficient of $D_{i-1}$. The marginally significant parameter in the variance equation is exactly as expected. Finally, the fitted models for the size of a price change are

$$\ln(\lambda_{d,i}) = 1.024 - 0.327 N_i + 0.412 \ln(\Delta t_i) - 4.474 S_{i-1}$$

$$\ln(\lambda_{u,i}) = -3.683 - 1.542 N_i + 0.419 \ln(\Delta t_i) + 0.921 S_{i-1},$$

where standard deviations of the parameters for the "down size" are 3.350, 0.319, 0.599, and 3.188, respectively, whereas those for the "up size" are 1.734, 0.976, 0.453, and 1.459. The interesting estimates of the prior two equations are the negative estimates of the coefficient of $N_i$. A large $N_i$ means there were more transactions in the time interval $(t_{i-1}, t_i)$ with no price change. This can be taken as evidence of no new information available in the time interval $(t_{i-1}, t_i)$. Consequently, the size for the price change at $t_i$ should be small. A small $\lambda_{u,i}$ or $\lambda_{d,i}$ for a Poisson distribution gives precisely that.

In summary, although a sample of 194 observations in a given day may not contain sufficient information about the trading dynamics of IBM stock, the fitted models appear to provide some sensible results. McCulloch and Tsay (2000) extend the PCD model to a hierarchical framework to handle all the data of the 63 trading days between November 1, 1990 and January 31, 1991. Many of the parameter estimates become significant in this extended sample, which has more than 19,000 observations. For example, the overall estimate of the coefficient of $\ln(\Delta t_{i-1})$ in the model for time duration ranges from 0.04 to 0.1, which is small, but significant.

Finally, using transactions data to test microstructure theory often requires a careful specification of the variables used. It also requires a deep understanding of the way in which the market operates and the data are collected. However, the ideas behind the econometric models discussed in this chapter are useful and widely applicable in the analysis of high-frequency data.

APPENDIX A. REVIEW OF SOME PROBABILITY DISTRIBUTIONS

Exponential distribution

A random variable $X$ has an exponential distribution with parameter $\beta > 0$ if its probability density function (pdf) is given by

$$f(x \mid \beta) = \begin{cases} \dfrac{1}{\beta} e^{-x/\beta} & \text{if } x \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

Denoting such a distribution by $X \sim \exp(\beta)$, we have $E(X) = \beta$ and $\mathrm{Var}(X) = \beta^2$. The cumulative distribution function (CDF) of $X$ is

$$F(x \mid \beta) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - e^{-x/\beta} & \text{if } x \ge 0. \end{cases}$$

When $\beta = 1$, $X$ is said to have a standard exponential distribution.

Gamma function

For $\kappa > 0$, the gamma function $\Gamma(\kappa)$ is defined by

$$\Gamma(\kappa) = \int_0^{\infty} x^{\kappa - 1} e^{-x}\, dx.$$

The most important properties of the gamma function are:

1. For any $\kappa > 1$, $\Gamma(\kappa) = (\kappa - 1)\Gamma(\kappa - 1)$.

2. For any positive integer $m$, $\Gamma(m) = (m - 1)!$.

3. $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.

The integration

$$\Gamma(y \mid \kappa) = \int_0^{y} x^{\kappa - 1} e^{-x}\, dx, \qquad y > 0$$

is an incomplete gamma function. Its values have been tabulated in the literature. Computer programs are now available to evaluate the incomplete gamma function.

Gamma distribution

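A random variable $X$ has a Gamma distribution with parameters $\kappa$ and $\beta$ ($\kappa > 0$, $\beta > 0$) if its pdf is given by

$$f(x \mid \kappa, \beta) = \begin{cases} \dfrac{1}{\beta^{\kappa}\, \Gamma(\kappa)}\, x^{\kappa - 1} e^{-x/\beta} & \text{if } x \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

By changing variable $y = x/\beta$, one can easily obtain the moments of $X$:

$$E(X^m) = \int_0^{\infty} x^m f(x \mid \kappa, \beta)\, dx = \frac{1}{\beta^{\kappa}\, \Gamma(\kappa)} \int_0^{\infty} x^{\kappa + m - 1} e^{-x/\beta}\, dx = \frac{\beta^m}{\Gamma(\kappa)} \int_0^{\infty} y^{\kappa + m - 1} e^{-y}\, dy = \frac{\beta^m\, \Gamma(\kappa + m)}{\Gamma(\kappa)}.$$

In particular, the mean and variance of $X$ are $E(X) = \kappa\beta$ and $\mathrm{Var}(X) = \kappa\beta^2$. When $\beta = 1$, the distribution is called a standard Gamma distribution with parameter $\kappa$. We use the notation $G \sim \mathrm{Gamma}(\kappa)$ to denote that $G$ follows a standard Gamma distribution with parameter $\kappa$. The moments of $G$ are

$$E(G^m) = \frac{\Gamma(\kappa + m)}{\Gamma(\kappa)}, \qquad m > 0. \tag{5.54}$$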
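The moment formula is easy to verify numerically. The following sketch, with hypothetical parameter values, evaluates $E(X^m) = \beta^m \Gamma(\kappa + m)/\Gamma(\kappa)$ through the log-gamma function and recovers the stated mean and variance.

```python
import math


def gamma_moment(m, kappa, beta=1.0):
    """E(X^m) = beta^m * Gamma(kappa + m) / Gamma(kappa) for X ~ Gamma(kappa, beta)."""
    # Work with log-gamma to avoid overflow for large arguments.
    return beta ** m * math.exp(math.lgamma(kappa + m) - math.lgamma(kappa))


kappa, beta = 2.5, 1.5            # hypothetical parameter values
mean = gamma_moment(1, kappa, beta)
variance = gamma_moment(2, kappa, beta) - mean ** 2
print(mean, variance)
```

Setting `beta=1.0` gives the moments of the standard Gamma variable $G$ in Eq. (5.54), and $m$ does not have to be an integer.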

Weibull distribution

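A random variable $X$ has a Weibull distribution with parameters $\alpha$ and $\beta$ ($\alpha > 0$, $\beta > 0$) if its pdf is given by

$$f(x \mid \alpha, \beta) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}}\, x^{\alpha - 1} e^{-(x/\beta)^{\alpha}} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0, \end{cases}$$

where $\beta$ and $\alpha$ are the scale and shape parameters of the distribution. The mean and variance of $X$ are

$$E(X) = \beta\, \Gamma\!\left(1 + \frac{1}{\alpha}\right), \qquad \mathrm{Var}(X) = \beta^2 \left\{ \Gamma\!\left(1 + \frac{2}{\alpha}\right) - \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^2 \right\}$$

and the CDF of $X$ is

$$F(x \mid \alpha, \beta) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - e^{-(x/\beta)^{\alpha}} & \text{if } x \ge 0. \end{cases}$$

When $\alpha = 1$, the Weibull distribution reduces to an exponential distribution.

Define $Y = X/[\beta\, \Gamma(1 + \frac{1}{\alpha})]$. We have $E(Y) = 1$ and the pdf of $Y$ is

$$f(y \mid \alpha) = \begin{cases} \alpha \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^{\alpha} y^{\alpha - 1} \exp\left\{ -\left[\Gamma\!\left(1 + \frac{1}{\alpha}\right) y\right]^{\alpha} \right\} & \text{if } y \ge 0 \\ 0 & \text{otherwise,} \end{cases} \tag{5.55}$$

where the scale parameter $\beta$ disappears due to standardization. The CDF of the standardized Weibull distribution is

$$F(y \mid \alpha) = \begin{cases} 0 & \text{if } y < 0 \\ 1 - \exp\left\{ -\left[\Gamma\!\left(1 + \frac{1}{\alpha}\right) y\right]^{\alpha} \right\} & \text{if } y > 0, \end{cases}$$

and we have $E(Y) = 1$ and $\mathrm{Var}(Y) = \Gamma(1 + \frac{2}{\alpha})/[\Gamma(1 + \frac{1}{\alpha})]^2 - 1$. For a duration model with Weibull innovations, the prior pdf is used in the maximum likelihood estimation.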
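As a quick check of the standardization, the density in Eq. (5.55) can be coded directly and its mean integrated numerically. The sketch below uses a plain midpoint rule and a hypothetical shape parameter; the helper names are not from the text.

```python
import math


def std_weibull_pdf(y, alpha):
    """Density of the unit-mean Weibull in Eq. (5.55)."""
    if y <= 0.0:
        return 0.0
    c = math.gamma(1.0 + 1.0 / alpha)
    return alpha * c ** alpha * y ** (alpha - 1.0) * math.exp(-((c * y) ** alpha))


def std_weibull_cdf(y, alpha):
    """CDF of the unit-mean Weibull."""
    if y <= 0.0:
        return 0.0
    c = math.gamma(1.0 + 1.0 / alpha)
    return 1.0 - math.exp(-((c * y) ** alpha))


def midpoint_integral(func, lower, upper, steps=100_000):
    """Plain midpoint rule; crude but dependency-free."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))


alpha = 1.5   # hypothetical shape parameter
mean = midpoint_integral(lambda y: y * std_weibull_pdf(y, alpha), 0.0, 20.0)
print(round(mean, 4), round(std_weibull_cdf(20.0, alpha), 6))
```

The integral is truncated at 20, where the remaining tail mass is negligible for this shape value; a heavier-tailed choice of $\alpha$ would need a wider range.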


Generalized Gamma distribution

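A random variable $X$ has a generalized Gamma distribution with parameters $\alpha$, $\beta$, $\kappa$ ($\alpha > 0$, $\beta > 0$, and $\kappa > 0$) if its pdf is given by

$$f(x \mid \alpha, \beta, \kappa) = \begin{cases} \dfrac{\alpha x^{\kappa\alpha - 1}}{\beta^{\kappa\alpha}\, \Gamma(\kappa)} \exp\left[ -\left(\dfrac{x}{\beta}\right)^{\alpha} \right] & \text{if } x \ge 0 \\ 0 & \text{otherwise,} \end{cases}$$

where $\beta$ is a scale parameter, and $\alpha$ and $\kappa$ are shape parameters. This distribution can be written as

$$G = \left(\frac{X}{\beta}\right)^{\alpha},$$

where $G$ is a standard Gamma random variable with parameter $\kappa$. The pdf of $X$ can be obtained from that of $G$ by the technique of changing variables. Similarly, the moments of $X$ can be obtained from that of $G$ in Eq. (5.54) by

$$E(X^m) = E[(\beta G^{1/\alpha})^m] = \beta^m E(G^{m/\alpha}) = \frac{\beta^m\, \Gamma(\kappa + \frac{m}{\alpha})}{\Gamma(\kappa)}.$$

When $\kappa = 1$, the generalized Gamma distribution reduces to that of a Weibull distribution. Thus, the exponential and Weibull distributions are special cases of the generalized Gamma distribution.

The expectation of a generalized Gamma distribution is $E(X) = \beta\, \Gamma(\kappa + \frac{1}{\alpha})/\Gamma(\kappa)$. In duration models, we need a distribution with unit expectation. Therefore, defining a random variable $Y = \lambda X/\beta$, where $\lambda = \Gamma(\kappa)/\Gamma(\kappa + \frac{1}{\alpha})$, we have $E(Y) = 1$ and the pdf of $Y$ is

$$f(y \mid \alpha, \kappa) = \begin{cases} \dfrac{\alpha y^{\kappa\alpha - 1}}{\lambda^{\kappa\alpha}\, \Gamma(\kappa)} \exp\left[ -\left(\dfrac{y}{\lambda}\right)^{\alpha} \right] & \text{if } y > 0 \\ 0 & \text{otherwise,} \end{cases} \tag{5.56}$$

where again the scale parameter $\beta$ disappears and $\lambda = \Gamma(\kappa)/\Gamma(\kappa + \frac{1}{\alpha})$.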
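The nesting of the Weibull case inside the generalized Gamma family can be confirmed with a short sketch of Eq. (5.56). The function names are hypothetical; the density is evaluated on the log scale for numerical stability.

```python
import math


def std_gen_gamma_pdf(y, alpha, kappa):
    """Unit-mean generalized Gamma density, Eq. (5.56)."""
    if y <= 0.0:
        return 0.0
    # lambda = Gamma(kappa) / Gamma(kappa + 1/alpha)
    lam = math.exp(math.lgamma(kappa) - math.lgamma(kappa + 1.0 / alpha))
    log_pdf = (math.log(alpha) + (kappa * alpha - 1.0) * math.log(y)
               - kappa * alpha * math.log(lam) - math.lgamma(kappa)
               - (y / lam) ** alpha)
    return math.exp(log_pdf)


def std_weibull_pdf(y, alpha):
    """Unit-mean Weibull density, Eq. (5.55), used here only for comparison."""
    c = math.gamma(1.0 + 1.0 / alpha)
    return alpha * c ** alpha * y ** (alpha - 1.0) * math.exp(-((c * y) ** alpha))


# With kappa = 1 the two densities should coincide.
print(std_gen_gamma_pdf(0.8, 1.3, 1.0), std_weibull_pdf(0.8, 1.3))
```

Setting both $\kappa = 1$ and $\alpha = 1$ gives the standard exponential density $e^{-y}$.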

APPENDIX B. HAZARD FUNCTION

A useful concept in modeling duration is the hazard function implied by a distribution function. For a random variable $X$, the survival function is defined as

$$S(x) \equiv P(X > x) = 1 - P(X \le x) = 1 - \mathrm{CDF}(x), \qquad x > 0,$$

which gives the probability that a subject, which follows the distribution of $X$, survives at the time $x$. The hazard function (or intensity function) of $X$ is then defined by

$$h(x) = \frac{f(x)}{S(x)} \tag{5.57}$$

where $f(\cdot)$ and $S(\cdot)$ are the pdf and survival function of $X$, respectively.

Example 5.6. For the Weibull distribution with parameters $\alpha$ and $\beta$, the survival function and hazard function are:

$$S(x \mid \alpha, \beta) = \exp\left[ -\left(\frac{x}{\beta}\right)^{\alpha} \right], \qquad h(x \mid \alpha, \beta) = \frac{\alpha}{\beta^{\alpha}}\, x^{\alpha - 1}, \qquad x > 0.$$

In particular, when $\alpha = 1$, we have $h(x \mid \beta) = 1/\beta$. Therefore, for an exponential distribution, the hazard function is constant. For a Weibull distribution, the hazard is a monotone function. If $\alpha > 1$, then the hazard function is monotonically increasing. If $\alpha < 1$, the hazard function is monotonically decreasing. For the generalized Gamma distribution, the survival function and, hence, the hazard function involve the incomplete Gamma function. Yet the hazard function may exhibit various patterns, including U shape or inverted U shape. Thus, the generalized Gamma distribution provides a flexible approach to modeling the duration of stock transactions.

For the standardized Weibull distribution, the survival and hazard functions are

$$S(y \mid \alpha) = \exp\left\{ -\left[\Gamma\!\left(1 + \frac{1}{\alpha}\right) y\right]^{\alpha} \right\}, \qquad h(y \mid \alpha) = \alpha \left[\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right]^{\alpha} y^{\alpha - 1}, \qquad y > 0.$$
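The monotonicity of the Weibull hazard can be illustrated by computing $h(x) = f(x)/S(x)$ from the definition in Eq. (5.57). The shape and scale values below are hypothetical.

```python
import math


def weibull_survival(x, alpha, beta):
    """S(x) = exp[-(x/beta)^alpha]."""
    return math.exp(-((x / beta) ** alpha))


def weibull_pdf(x, alpha, beta):
    """f(x) = (alpha / beta^alpha) * x^(alpha-1) * exp[-(x/beta)^alpha], x > 0."""
    return alpha / beta ** alpha * x ** (alpha - 1.0) * math.exp(-((x / beta) ** alpha))


def weibull_hazard(x, alpha, beta):
    """h(x) = f(x) / S(x), which simplifies to (alpha / beta^alpha) * x^(alpha - 1)."""
    return weibull_pdf(x, alpha, beta) / weibull_survival(x, alpha, beta)


grid = [0.5, 1.0, 2.0, 4.0]
for alpha in (0.7, 1.0, 1.5):          # hypothetical shape values
    print(alpha, [round(weibull_hazard(x, alpha, 2.0), 4) for x in grid])
```

The printed rows fall, stay flat, and rise along the grid for shape values below, equal to, and above 1, respectively.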

APPENDIX C. SOME RATS PROGRAMS FOR DURATION MODELS

The data used are adjusted time durations of intraday transactions of IBM stock from November 1 to November 9, 1990. The file name is "ibm1to5.dat" and it has 3534 observations.

A. Program for Estimating a WACD(1, 1) Model

* Read the adjusted durations (x) and initialize the conditional mean psi
all 0 3534:1
open data ibm1to5.dat
data(org=obs) / x r1
set psi = 1.0
nonlin a0 a1 b1 al
* Conditional mean recursion and log-likelihood of the standardized Weibull
frml gvar = a0+a1*x(t-1)+b1*psi(t-1)
frml gma = %LNGAMMA(1.0+1.0/al)
frml gln =al*gma(t)+log(al)-log(x(t)) $
 +al*log(x(t)/(psi(t)=gvar(t)))-(exp(gma(t))*x(t)/psi(t))**al
smpl 2 3534
* Starting values and estimation
compute a0 = 0.2, a1 = 0.1, b1 = 0.1, al = 0.8
maximize(method=bhhh,recursive,iterations=150) gln
* Standardized residuals and Q-statistics for model checking
set fv = gvar(t)
set resid = x(t)/fv(t)
set residsq = resid(t)*resid(t)
cor(qstats,number=20,span=10) resid
cor(qstats,number=20,span=10) residsq
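For readers without access to RATS, the log-likelihood that the listing above maximizes can be sketched in Python. This is a rough translation of the `gvar` and `gln` formulas only, not of the optimizer; the function name and the toy data are hypothetical.

```python
import math


def wacd_loglik(x, a0, a1, b1, alpha, psi0=1.0):
    """Log-likelihood of a WACD(1, 1) model, conditioning on the first observation.

    psi_t = a0 + a1 * x_{t-1} + b1 * psi_{t-1}, and x_t / psi_t follows the
    unit-mean Weibull density of Eq. (5.55). psi0 = 1.0 mirrors `set psi = 1.0`.
    """
    log_c = math.lgamma(1.0 + 1.0 / alpha)     # ln Gamma(1 + 1/alpha), `gma` in the listing
    psi = psi0
    total = 0.0
    for t in range(1, len(x)):
        psi = a0 + a1 * x[t - 1] + b1 * psi    # recursive update of the conditional mean
        ratio = x[t] / psi
        total += (alpha * log_c + math.log(alpha) - math.log(x[t])
                  + alpha * math.log(ratio)
                  - (math.exp(log_c) * ratio) ** alpha)
    return total


x = [1.2, 0.4, 2.5, 0.9, 1.7]                  # toy durations, for illustration only
print(wacd_loglik(x, 0.2, 0.1, 0.1, 0.8))
```

With `alpha = 1.0` the expression collapses to the exponential ACD log-likelihood, the sum of $-\ln\psi_t - x_t/\psi_t$, which gives a convenient sanity check before handing the function to a numerical optimizer.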

B. Program for Estimating a GACD(1, 1) Model

all 0 3534:1
open data ibm1to5.dat
data(org=obs) / x r1
set psi = 1.0
nonlin a0 a1 b1 al ka
* Conditional mean recursion
frml cv = a0+a1*x(t-1)+b1*psi(t-1)
* lam = Gamma(ka)/Gamma(ka+1/al) standardizes the innovation to unit mean
frml gma = %LNGAMMA(ka)
frml lam = exp(gma(t))/exp(%LNGAMMA(ka+(1.0/al)))
frml xlam = x(t)/(lam(t)*(psi(t)=cv(t)))
frml gln =-gma(t)+log(al/x(t))+ka*al*log(xlam(t))-(xlam(t))**al
smpl 2 3534
compute a0 = 0.238, a1 = 0.075, b1 = 0.857, al = 0.5, ka = 4.0
nlpar(criterion=value,cvcrit=0.00001)
maximize(method=bhhh,recursive,iterations=150) gln
set fv = cv(t)
set resid = x(t)/fv(t)
set residsq = resid(t)*resid(t)
cor(qstats,number=20,span=10) resid
cor(qstats,number=20,span=10) residsq

C. A program for estimating a TAR-WACD(1, 1) model. The threshold 3.79 is prespecified.

all 0 3534:1
open data ibm1to5.dat
data(org=obs) / x rt
set psi = 1.0
nonlin a1 a2 al b0 b2 bl
* u(t) is the regime indicator: 1 if x(t-1) > 3.79 and 0 if x(t-1) < 3.79
frml u = ((x(t-1)-3.79)/abs(x(t-1)-3.79)+1.0)/2.0
frml cp1 = a1*x(t-1)+a2*psi(t-1)
frml gma1 = %LNGAMMA(1.0+1.0/al)
frml cp2 = b0+b2*psi(t-1)
frml gma2 = %LNGAMMA(1.0+1.0/bl)
frml cp = cp1(t)*(1-u(t))+cp2(t)*u(t)
frml gln1 =al*gma1(t)+log(al)-log(x(t)) $
 +al*log(x(t)/(psi(t)=cp(t)))-(exp(gma1(t))*x(t)/psi(t))**al
frml gln2 =bl*gma2(t)+log(bl)-log(x(t)) $
 +bl*log(x(t)/(psi(t)=cp(t)))-(exp(gma2(t))*x(t)/psi(t))**bl
frml gln = gln1(t)*(1-u(t))+gln2(t)*u(t)
smpl 2 3534
compute a1 = 0.2, a2 = 0.85, al = 0.9
compute b0 = 1.8, b2 = 0.5, bl = 0.8
maximize(method=bhhh,recursive,iterations=150) gln
set fv = cp(t)
set resid = x(t)/fv(t)
set residsq = resid(t)*resid(t)
cor(qstats,number=20,span=10) resid
cor(qstats,number=20,span=10) residsq

EXERCISES

1. Let $r_t$ be the log return of an asset at time $t$. Assume that $\{r_t\}$ is a Gaussian white noise series with mean 0.05 and variance 1.5. Suppose that the probability of a trade at each time point is 40% and is independent of $r_t$. Denote the observed return by $r_t^o$. Is $r_t^o$ serially correlated? If yes, calculate the first three lags of autocorrelations of $r_t^o$.

2. Let $P_t$ be the observed market price of an asset, which is related to the fundamental value of the asset $P_t^*$ via Eq. (5.9). Assume that $\Delta P_t^* = P_t^* - P_{t-1}^*$ forms a Gaussian white noise series with mean zero and variance 1.0. Suppose that the bid-ask spread is two ticks. What is the lag-1 autocorrelation of the price change series $\Delta P_t = P_t - P_{t-1}$ when the tick size is \$1/8? What is the lag-1 autocorrelation of the price change when the tick size is \$1/16?

3. The file "ibm-d2-dur.dat" contains the adjusted durations between trades of IBM stock on November 2, 1990. The file has three columns consisting of day, time of trade measured in seconds from midnight, and adjusted durations.

(a) Build an EACD model for the adjusted duration and check the fitted model.

(b) Build a WACD model for the adjusted duration and check the fitted model.

(c) Build a GACD model for the adjusted duration and check the fitted model.

(d) Compare the prior three duration models.

4. The file "mmm9912-dtp.dat" contains the transactions data of the stock of 3M Company in December 1999. There are three columns: day of the month, time of transaction in seconds from midnight, and transaction price. Transactions that occurred after 4:00 pm Eastern time are excluded.

(a) Is there a diurnal pattern in 3M stock trading? You may construct a time series $n_t$, which denotes the number of trades in each 5-minute time interval, to answer this question.

(b) Use the price series to confirm the existence of bid-ask bounce in intraday trading of 3M stock.

(c) Tabulate the frequencies of price change in multiples of tick size \$1/16. You may combine changes with 5 ticks or more into a category and those with $-5$ ticks or beyond into another category.

5. Consider again the transactions data of 3M stock in December 1999.

(a) Use the data to construct an intraday 5-minute log return series. Use the simple average of all transaction prices within a 5-minute interval as the stock price for the interval. Is the series serially correlated? You may use Ljung-Box statistics to test the hypothesis with the first 10 lags of sample autocorrelation function.

(b) There are seventy-seven 5-minute returns in a normal trading day. Some researchers suggest that the sum of squares of the intraday 5-minute returns can be used as a measure of daily volatility. Apply this approach and calculate the daily volatility of the log return of 3M stock in December 1999. Discuss the validity of such a procedure to estimate daily volatility.

6. The file "mmm9912-adur.dat" contains an adjusted intraday trading duration of 3M stock in December 1999. There are thirty-nine 10-minute time intervals in a trading day. Let $d_i$ be the average of all log durations for the $i$th 10-minute interval across all trading days in December 1999. Define an adjusted duration as $\Delta t_j / \exp(d_i)$, where $j$ is in the $i$th 10-minute interval. Note that more sophisticated methods can be used to adjust the diurnal pattern of trading duration. Here we simply use a local average.

(a) Is there a diurnal pattern in the adjusted duration series? Why?

(b) Build a duration model for the adjusted series using exponential innovations. Check the fitted model.

(c) Build a duration model for the adjusted series using Weibull innovations. Check the fitted model.

(d) Build a duration model for the adjusted series using generalized Gamma innovations. Check the fitted model.

(e) Compare and comment on the three duration models built before.

REFERENCES

Campbell, J. Y., Lo, A. W., and MacKinlay, A. C. (1997), The Econometrics of Financial Markets, Princeton University Press: New Jersey.

Cho, D., Russell, J. R., Tiao, G. C., and Tsay, R. S. (2000), "The magnet effect of price limits: Evidence from high frequency data on Taiwan stock exchange," Working paper, Graduate School of Business, University of Chicago.

Engle, R. F., and Russell, J. R. (1998), "Autoregressive conditional duration: A new model for irregularly spaced transaction data," Econometrica, 66, 1127-1162.

Ghysels, E. (2000), "Some econometric recipes for high-frequency data cooking," Journal of Business and Economic Statistics, 18, 154-163.

Hasbrouck, J. (1992), Using the TORQ database, Stern School of Business, New York University.

Hasbrouck, J. (1999), "The dynamics of discrete bid and ask quotes," Journal of Finance, 54, 2109-2142.

Hausman, J., Lo, A., and MacKinlay, C. (1992), "An ordered probit analysis of transaction stock prices," Journal of Financial Economics, 31, 319-379.

Lo, A., and MacKinlay, A. C. (1990), "An econometric analysis of nonsynchronous trading," Journal of Econometrics, 45, 181-212.

McCulloch, R. E., and Tsay, R. S. (2000), "Nonlinearity in high frequency data and hierarchical models," Working paper, Graduate School of Business, University of Chicago.

Roll, R. (1984), "A simple implicit measure of the effective bid-ask spread in an efficient market," Journal of Finance, 39, 1127-1140.

Rydberg, T. H., and Shephard, N. (1998), "Dynamics of trade-by-trade price movements: decomposition and models," Working paper, Nuffield College, Oxford University.

Stoll, H., and Whaley, R. (1990), "Stock market structure and volatility," Review of Financial Studies, 3, 37-71.

Wood, R. A. (2000), "Market microstructure research databases: History and projections," Journal of Business & Economic Statistics, 18, 140-145.

Zhang, M. Y., Russell, J. R., and Tsay, R. S. (2001), "A nonlinear autoregressive conditional duration model with applications to financial transaction data," Journal of Econometrics (to appear).

Zhang, M. Y., Russell, J. R., and Tsay, R. S. (2001b), "Determinants of bid and ask quotes and implications for the cost of trading," Working paper, Graduate School of Business, University of Chicago.

6

Switching Regime Volatility:

An Empirical Evaluation

BRUNO B. ROCHE AND MICHAEL ROCKINGER

ABSTRACT

Markov switching models are one possible method to account for volatility clustering.

This chapter aims at describing, in a pedagogical fashion, how to estimate a univariate

switching model for daily foreign exchange returns which are assumed to be drawn

in a Markovian way from alternative Gaussian distributions with different means and

variances. An application shows that the US dollar/Deutsche Mark exchange rate can be

modelled as a mixture of normal distributions with changes in volatility, but not in mean,

where regimes with high and low volatility alternate. The usefulness of this methodology

is demonstrated in a real life application, i.e. through the performance comparison of

simple hedging strategies.

6.1 INTRODUCTION

Volatility clustering is a well known and well documented feature of financial market rates of return. The seminal approach proposed by Engle (1982), with the ARCH model, followed several years later by Bollerslev (1986), with the GARCH models, led to a huge literature on this subject in the last decade. This very successful approach assumes that volatility changes over time in an autoregressive fashion. There are several excellent books and surveys dealing with this subject. To quote a few, Bollerslev et al. (1992, 1993), Bera and Higgins (1993), Engle (1995) and Gouriéroux (1997) provide a large overview of the theoretical developments, the generalisation of the models and the application to specific markets. ARCH models provide a parsimonious description for volatility clustering where volatility is assumed to be a deterministic function of past observations.

However, ARCH models struggle to account for the stylised fact that volatility can exhibit discrete, abrupt and fairly persistent changes. In the late 1980s, Hamilton (1989) proposed an alternative methodology, the Markovian switching model, which encountered great success. Although initiated by Quandt (1958) and Goldfeld and Quandt (1973, 1975) to provide a description of markets in disequilibrium, this approach did not attract great interest until the works of Hamilton (1989) on business cycles modelling, and of Engel and Hamilton (1990) on exchange rates. The main feature of this approach is that it involves multiple structures and allows returns to be drawn from distinct distributions.

Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Naïm
© 2003 John Wiley & Sons, Ltd ISBN: 0-470-84885-5


The change of regime between the distributions is determined in a Markovian manner. It is driven by an unobservable state variable that follows a first-order Markov chain which can take values of {0, 1}. The value of that variable depends on its past values only through the most recent one. The switching mechanism thus enables complex dynamic structures to be captured and allows for frequent changes at random times. In that way, a structure may persist for a period of time and then be replaced by another structure after a switch occurs.

This methodology is nowadays very popular in the field of nonlinear time series models and it has experienced a wide number of applications in the analysis of financial time series. Although the original Markov switching models focused on the modelling of the first moment with application to economic and financial time series, see e.g. Hamilton (1988, 1989), Engel and Hamilton (1990), Lam (1990), Goodwin (1993), Engel (1994), Kim and Nelson (1998), among others, a growing body of literature is developing with regard to the application of this technique and its variants to volatility modelling. Examples include Hamilton and Lin (1996), Dueker (1997) and Ramchand and Susmel (1998). Gray (1996) models switches in interest rates. Chesnay and Jondeau (2001) model switches of multivariate dependency.

In this chapter we present the switching methodology in a pedagogical framework and in a way that may be useful for the financial empiricist. In Section 6.2 the notations and the switching model are introduced. In Section 6.3 we develop the maximum likelihood estimation methodology and show how the switching model can be estimated. This methodology is applied in Section 6.4 to the US dollar (USD)/Deutsche Mark (DEM) exchange rate for the period 1 March 1995 to 1 March 1999. In that section it is shown how estimation results are to be interpreted, and how endogenously detected changes between states can improve the performances of simple real life hedging strategies. Section 6.5 concludes and hints at further lines of research.

6.2 THE MODEL

We assume, in this chapter, that foreign exchange returns¹ are a mixture of normal distributions. This means that returns are drawn from a normal distribution where the mean and variance can take different values depending on the "state" a given return belongs to. Since there is pervasive evidence that variance is persistent, yet little is known about the mean, it is useful to consider the more restrictive mixture model where just the variance can switch. This leads us to introduce the following model, based on Hamilton (1994):

$$R_t = \mu + [\sigma_1 S_t + \sigma_0 (1 - S_t)]\, \epsilon_t$$

where $\epsilon_t$ are independent and identically distributed normal innovations with mean 0 and variance 1. $S_t$ is a Markov chain with values 0 and 1, and with transition probabilities $p = [p_{00}, p_{01}, p_{10}, p_{11}]'$ such that:

$$\Pr[S_t = 1 \mid S_{t-1} = 1] = p_{11} \qquad \Pr[S_t = 0 \mid S_{t-1} = 1] = p_{01}$$

$$\Pr[S_t = 1 \mid S_{t-1} = 0] = p_{10} \qquad \Pr[S_t = 0 \mid S_{t-1} = 0] = p_{00}$$

where $p_{11} + p_{01} = 1$ and $p_{10} + p_{00} = 1$.
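A short simulation may help fix ideas about the model. The sketch below generates returns with a common mean and a regime-dependent standard deviation; the parameter values and the starting regime are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_switching_returns(n, mu, sigma0, sigma1, p00, p11, seed=0):
    """Simulate R_t = mu + [sigma1*S_t + sigma0*(1 - S_t)] * eps_t with a two-state chain S_t."""
    rng = random.Random(seed)
    state = 0                                   # arbitrary starting regime (an assumption)
    states, returns = [], []
    for _ in range(n):
        stay = p11 if state == 1 else p00       # probability of remaining in the current regime
        if rng.random() >= stay:
            state = 1 - state                   # switch with probability 1 - stay
        sigma = sigma1 if state == 1 else sigma0
        states.append(state)
        returns.append(mu + sigma * rng.gauss(0.0, 1.0))
    return states, returns


states, returns = simulate_switching_returns(5000, 0.0, 0.3, 1.2, 0.95, 0.9, seed=42)
low = [r for s, r in zip(states, returns) if s == 0]
high = [r for s, r in zip(states, returns) if s == 1]
print(len(low), len(high), statistics.pstdev(low), statistics.pstdev(high))
```

Because $p_{00}$ and $p_{11}$ are close to one in this example, the simulated regimes persist for stretches of observations, which produces the alternating calm and turbulent periods the chapter describes.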

¹ Returns are calculated as the difference in the natural logarithm of the exchange rate value $S_t$ for two consecutive observations: $R_t = 100[\ln(S_t) - \ln(S_{t-1})]$. This corresponds to continuously compounded returns. In this footnote $S_t$ denotes the exchange rate, not the regime indicator.


Let

$$\rho_j = \Pr[S_1 = j] \qquad \forall j$$

be the unconditional probability of being in a certain state at time 1 and let $\rho = [\rho_0, \rho_1]'$. If $\lambda$ designates the vector of all remaining parameters, $\lambda = [\mu, \sigma_1, \sigma_0]'$, then we can define $\theta = [\lambda', p', \rho']'$, the vector of all parameters.²

In the following we use the notation

$$\underline{R}_t = [R_t, R_{t-1}, \ldots, R_1]'$$

to designate the vector of realisations of past returns.

It is also useful to introduce the density of $R_t$ conditional on regime $S_t$: $f(R_t \mid S_t)$. For the model considered, in the case where $\epsilon_t$ is normally distributed, this density can be written as:³

$$f(R_t \mid S_t; \theta) = \frac{1}{\sqrt{2\pi}\, [\sigma_1 S_t + \sigma_0 (1 - S_t)]} \exp\left\{ -\frac{1}{2} \left( \frac{R_t - \mu}{\sigma_1 S_t + \sigma_0 (1 - S_t)} \right)^2 \right\} \tag{6.1}$$

This illustrates that for a given parameter vector $\theta$ and for a given state $S_t$, the density of returns can be written in a straightforward manner. Expression (6.1) shows that the conditional density depends only on the current regime $S_t$ and not on past ones. It should also be noted that, due to the Markovian character of the regimes, the information contained in $\underline{R}_{t-1}$ is summarised in $S_t$.

6.3 MAXIMUM LIKELIHOOD ESTIMATION

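The likelihood is

$$L = f(\underline{R}_T; \theta) = f(R_T \mid \underline{R}_{T-1}; \theta)\, f(R_{T-1} \mid \underline{R}_{T-2}; \theta) \cdots f(R_2 \mid \underline{R}_1; \theta)\, f(R_1; \theta) \tag{6.2}$$

and we wish to obtain the maximum likelihood estimate

$$\hat{\theta} \in \arg\max_{\theta} \ln f(\underline{R}_T; \theta)$$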

In order to apply a maximum likelihood procedure on (6.2), it is necessary to introduce

the states St so that expression (6.1) can be used. To see how this can be done suppose

that θ is given and consider a typical element of the likelihood which can be developed

by using obvious probabilistic rules:

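$$f(R_t \mid \underline{R}_{t-1}; \theta) \equiv \frac{f(\underline{R}_t; \theta)}{f(\underline{R}_{t-1}; \theta)}$$

$$f(R_t \mid \underline{R}_{t-1}; \theta) = \frac{\sum_{S_t=0}^{1} f(\underline{R}_t, S_t; \theta)}{f(\underline{R}_{t-1}; \theta)}$$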

2 Notice that there is a link between $\rho$ and $p$, as we will see later on.

3 Notice that densities (associated with continuous random variables) are written as $f(\cdot)$ and probabilities (associated with discrete random variables) as $\Pr[\cdot, \cdot]$.

196 Applied Quantitative Methods for Trading and Investment

$$f(R_t \mid \underline{R}_{t-1}; \theta) = \frac{\sum_{S_t=0}^{1} f(R_t \mid \underline{R}_{t-1}, S_t; \theta)\, f(\underline{R}_{t-1}, S_t; \theta)}{f(\underline{R}_{t-1}; \theta)}$$

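It follows that:

$$f(R_t \mid \underline{R}_{t-1}; \theta) = \sum_{S_t=0}^{1} f(R_t \mid S_t; \theta)\, \Pr[S_t \mid \underline{R}_{t-1}; \theta] \tag{6.3}$$

where the last equality follows from (i) the Markovian character of the problem, whereby the knowledge of $S_t$ summarises the entire history $\underline{R}_{t-1}$ so that $f(R_t \mid \underline{R}_{t-1}, S_t; \theta) = f(R_t \mid S_t; \theta)$, and (ii) $f(\underline{R}_{t-1}, S_t; \theta)/f(\underline{R}_{t-1}; \theta) = \Pr[S_t \mid \underline{R}_{t-1}; \theta]$.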

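We also have:

$$\Pr[S_t \mid \underline{R}_{t-1}; \theta] = \sum_{S_{t-1}=0}^{1} \frac{\Pr[S_t, S_{t-1}, \underline{R}_{t-1}; \theta]}{f(\underline{R}_{t-1}; \theta)}$$

$$\Pr[S_t \mid \underline{R}_{t-1}; \theta] = \sum_{S_{t-1}=0}^{1} \frac{\Pr[S_t \mid S_{t-1}, \underline{R}_{t-1}; \theta]\, \Pr[S_{t-1}, \underline{R}_{t-1}; \theta]}{f(\underline{R}_{t-1}; \theta)} \tag{6.4}$$

$$\Pr[S_t \mid \underline{R}_{t-1}; \theta] = \sum_{S_{t-1}=0}^{1} \Pr[S_t \mid S_{t-1}; \theta]\, \Pr[S_{t-1} \mid \underline{R}_{t-1}; \theta]$$

The last equality follows from the fact that $\Pr[S_t \mid S_{t-1}, \underline{R}_{t-1}; \theta] = \Pr[S_t \mid S_{t-1}; \theta]$, by the assumption that states evolve according to a first-order Markov process, and from $\Pr[S_{t-1}, \underline{R}_{t-1}; \theta]/f(\underline{R}_{t-1}; \theta) = \Pr[S_{t-1} \mid \underline{R}_{t-1}; \theta]$.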

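Using Bayes' formula it follows that:

$$\Pr[S_{t-1} \mid \underline{R}_{t-1}; \theta] = \frac{\Pr[S_{t-1}, \underline{R}_{t-1}; \theta]}{f(\underline{R}_{t-1}; \theta)}$$

$$\Pr[S_{t-1} \mid \underline{R}_{t-1}; \theta] = \frac{f(R_{t-1}, S_{t-1}, \underline{R}_{t-2}; \theta)}{\sum_{S_{t-1}=0}^{1} f(\underline{R}_{t-1}, S_{t-1}; \theta)} \tag{6.5}$$

$$\Pr[S_{t-1} \mid \underline{R}_{t-1}; \theta] = \frac{f(R_{t-1} \mid S_{t-1}; \theta)\, \Pr[S_{t-1} \mid \underline{R}_{t-2}; \theta]}{\sum_{S_{t-1}=0}^{1} f(R_{t-1} \mid S_{t-1}; \theta)\, \Pr[S_{t-1} \mid \underline{R}_{t-2}; \theta]}$$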

Hence, at time $t-1$, the density $f(R_{t-1} \mid S_{t-1}; \theta)$ defined in equation (6.1) shows up in a natural fashion. If we assume that we know $\Pr[S_{t-1} \mid \underline{R}_{t-2}; \theta]$, then it becomes possible, using equation (6.5), to compute $\Pr[S_{t-1} \mid \underline{R}_{t-1}; \theta]$ and, from equation (6.4), to derive the conditional probability of $S_t$ given $\underline{R}_{t-1}$. $\Pr[S_t \mid \underline{R}_{t-1}; \theta]$ can therefore be computed, for all $t$, in a recursive fashion.
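The recursion described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the chapter's Gauss program: the function names and the argument `rho0` (the starting probability of state 0) are hypothetical, and the transition table follows the text's convention that $p_{ij}$ is the probability of moving to state $i$ from state $j$.

```python
import math

def normal_pdf(r, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, as in (6.1)."""
    z = (r - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def hamilton_filter(returns, mu, sigma0, sigma1, p00, p11, rho0):
    """Return (log-likelihood, predicted probabilities, filtered probabilities).

    Each probability entry is a pair (Pr[state 0], Pr[state 1]).
    """
    sigmas = (sigma0, sigma1)
    # trans[i][j] = Pr[S_t = i | S_{t-1} = j]
    trans = ((p00, 1.0 - p11), (1.0 - p00, p11))
    pred = (rho0, 1.0 - rho0)  # starting probabilities Pr[S_1 = j]
    loglik = 0.0
    predicted, filtered = [], []
    for r in returns:
        # Numerator terms of (6.5): density in each state times predicted probability
        joint = [normal_pdf(r, mu, sigmas[j]) * pred[j] for j in (0, 1)]
        dens = joint[0] + joint[1]  # conditional density of the return, equation (6.3)
        loglik += math.log(dens)
        filt = (joint[0] / dens, joint[1] / dens)  # filtered probabilities, equation (6.5)
        predicted.append(pred)
        filtered.append(filt)
        # One-step-ahead state probabilities, equation (6.4)
        pred = tuple(trans[i][0] * filt[0] + trans[i][1] * filt[1] for i in (0, 1))
    return loglik, predicted, filtered
```

Maximising the returned log-likelihood over the parameters with any numerical optimiser would then give the maximum likelihood estimate; the sketch deliberately leaves the optimiser out.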

The starting value for the probabilities $\Pr[S_1 = j \mid \underline{R}_0; \theta] = \Pr[S_1 = j; \theta] = \rho_j$ can be either estimated directly as additional parameters in the maximum likelihood estimation, or approximated by the steady state probabilities, which have to satisfy

$$\Pr[S_1 = i; \theta] = \sum_{j=0}^{1} \Pr[S_1 = j; \theta]\, p_{ij}$$

$$\Rightarrow \Pr[S_t = 1; \theta] = \frac{1 - p_{00}}{2 - p_{11} - p_{00}} \quad \text{and} \quad \Pr[S_t = 0; \theta] = \frac{1 - p_{11}}{2 - p_{11} - p_{00}} \tag{6.6}$$
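Equation (6.6) can be checked numerically: the steady state probabilities should be left unchanged by one application of the transition probabilities. The helper below is a sketch with a hypothetical name.

```python
def steady_state(p00, p11):
    """Steady state probabilities (Pr[S = 0], Pr[S = 1]) from equation (6.6)."""
    denom = 2.0 - p11 - p00
    return (1.0 - p11) / denom, (1.0 - p00) / denom

# Illustrative check with arbitrary transition probabilities
pi0, pi1 = steady_state(0.9, 0.7)
# One step of the chain: Pr[S = 1] = p11 * Pr[S = 1] + (1 - p00) * Pr[S = 0]
next_pi1 = 0.7 * pi1 + (1.0 - 0.9) * pi0
```

Here `next_pi1` coincides with `pi1` up to rounding error, which is exactly the fixed-point condition stated before (6.6).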

One realises, at this stage, that the likelihood for a given $\theta$ can be obtained by iterating on equation (6.3), which involves the computation of (6.4). As a by-product, the computation of (6.4) involves (6.5), which gives the filtered probabilities of being in a given state conditional on all currently available information, $\Pr[S_t \mid \underline{R}_t; \theta]$. Forecasts of states can also be easily obtained by iterating on the transition probabilities.

Using standard numerical methods, this procedure allows for a fast computation of the estimates.
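The remark on state forecasts can be illustrated as follows. Starting from a pair of filtered probabilities, each further step ahead is one more application of the transition probabilities. The function name and signature are assumptions of this sketch.

```python
def forecast_state_probs(filtered, p00, p11, horizon):
    """Forecast (Pr[state 0], Pr[state 1]) for 1..horizon steps ahead.

    `filtered` is the pair (Pr[S_t = 0 | data to t], Pr[S_t = 1 | data to t]).
    """
    prob0, prob1 = filtered
    path = []
    for _ in range(horizon):
        # Both new values are computed from the old pair before assignment
        prob0, prob1 = (p00 * prob0 + (1.0 - p11) * prob1,
                        (1.0 - p00) * prob0 + p11 * prob1)
        path.append((prob0, prob1))
    return path
```

As the horizon grows, the forecasts converge to the steady state probabilities, so the information in the current filtered probabilities fades at a rate governed by $p_{00} + p_{11} - 1$.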

6.4 AN APPLICATION TO FOREIGN EXCHANGE RATES

Before we develop in detail one application of the switching model to the foreign exchange

markets, we wish to start this section with a brief overview of the functioning of these

markets as they offer notable features.

6.4.1 Features of the foreign exchange interbank market

It is interesting to note that, in contrast to other exchange markets, the interbank foreign exchange (also called forex) market has no geographical limitations, since currencies are traded all over the world, and no trading-hours scheme, since currencies are traded around the clock. It is truly a 24-hour, 7-days-a-week market.

Another notable feature is that, again in contrast to other exchange markets, forex traders negotiate deals and agree transactions over the telephone, with trading prices and volumes not being known to third parties. The tick quotes are provided by market-makers and conveyed to the data subscribers' terminals. They are meant to be indicative, providing a general indication of where an exchange rate stands at a given time. Though not necessarily representing the actual rate at which transactions take place, these indicative quotes are regarded as fairly accurate and as matching the true prices experienced in the market. Moreover, in order to avoid dealing with the bid–ask bounce inherent to most high-frequency data (see, for instance, chapter 3 in Campbell et al. (1997)), only the bid series, generally regarded as a more consistent set of observations, was used for the estimation of the switching model.

In the following we will use, as an illustration, the USD/DEM exchange rate. The tick-by-tick quotes have been supplied by Reuters via Olsen & Associates. We will use daily quotes, which are arbitrarily taken on each working day at 10pm GMT (corresponding approximately to the closing of North American markets). We obviously could have used a different time of the day and/or different frequencies.

It is interesting to note that, in this high-frequency dataset, there are significant intraday, intraweek and intrayear seasonal patterns (see Figures 6.1 and 6.2), explained respectively by the time zone effect (in the case of the USD/DEM rate, the European and the US time zones are the most active ones), the low activity exhibited during weekends and some universal public holidays (e.g. Christmas, New Year). Some other factors such as the release of economic indicators by, amongst others, central banks may also induce seasonality in foreign exchange markets. Seasonalities are also investigated by Guillaume et al. (1997).

Figure 6.1 Intraday pattern USD/DEM (average number of transactions per hour; horizontal axis: hour (GMT), 0 to 23; vertical axis: 0 to 900)

Figure 6.2 Intraweek pattern USD/DEM (average number of transactions per day of the week; horizontal axis: Sun to Fri; vertical axis: 0 to 8000)


Further descriptions of questions related to intraday data in forex markets can be found in Baillie and Bollerslev (1989), Goodhart and Figliuoli (1991), Müller et al. (1997) and Schnidrig and Würtz (1995), among others.

6.4.2 Descriptive statistics

In our empirical application of switching models, as previously said, we will use daily

observations of the USD/DEM exchange rate. We obtain these by sampling from the

tick-by-tick data, extracting those recorded at 10pm GMT, from October 1995 to October

1998, corresponding to 775 observations overall.

Table 6.1 displays the basic descriptive statistics for the USD/DEM foreign exchange

returns, for the tick-by-tick and daily data.

There is an enormous difference between the first two moments of the two series, confirming the dramatic effect of time aggregation (see Ghysels et al. (1998)). As indicated in Table 6.1, the daily returns are negatively skewed, though not significantly so. This suggests that there are some large negative returns, but not enough of them to be statistically meaningful.

The series exhibits leptokurtosis (which means that the distribution of the data has thicker tails than a normal one) and heteroskedasticity (or volatility clustering), as shown by the Ljung–Box test on the squared returns. This latter observation is a well known feature of financial rates of return: large price changes in magnitude, irrespective of sign, are likely to be followed by large price movements; small price changes in magnitude are likely to be followed by small price movements. Finally, the assumption of normality of the data can be rejected at the 1% level of significance, as indicated by the Jarque–Bera and Kolmogorov–Smirnov tests.
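For readers who want to reproduce this kind of diagnostic, the sketch below codes the textbook forms of the Ljung–Box and Jarque–Bera statistics. The chapter does not say which variants or software options were used, so these functions (whose names are hypothetical) should not be expected to reproduce the figures of Table 6.1 exactly.

```python
def autocorr(x, lag):
    """Sample autocorrelation of the list x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box(x, lags=20):
    """Ljung-Box Q statistic: n(n+2) * sum over k of rho_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, lags + 1))

def jarque_bera(x):
    """Jarque-Bera statistic: n/6 * (skewness^2 + excess kurtosis^2 / 4)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
```

Applying `ljung_box` to the squared returns rather than to the returns themselves is what turns the test into a check for volatility clustering.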

It is well known from the literature on mixture of distributions that a mixture of normal

distributions can be leptokurtic. This suggests that daily USD/DEM exchange rate returns

are good candidates for being explained by switching among distributions.
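A short derivation shows why a switch in variance alone already produces fat tails. It assumes returns of the form $R = \mu + \sigma_S\,\varepsilon$, where the regime $S$ takes the value $j$ with unconditional probability $\pi_j$ and $\varepsilon$ is standard normal and independent of $S$; the symbol $\pi_j$ is introduced here only for illustration.

```latex
\begin{aligned}
E[(R-\mu)^2] &= \sum_{j} \pi_j \sigma_j^2, \qquad
E[(R-\mu)^4] = 3 \sum_{j} \pi_j \sigma_j^4, \\
\text{kurtosis} &= \frac{E[(R-\mu)^4]}{\left(E[(R-\mu)^2]\right)^2}
 = 3\,\frac{\sum_{j} \pi_j \sigma_j^4}{\left(\sum_{j} \pi_j \sigma_j^2\right)^2} \;\ge\; 3 .
\end{aligned}
```

The inequality is Jensen's inequality applied to the square of $\sigma_S^2$; it is strict whenever both regimes have positive probability and $\sigma_0 \neq \sigma_1$, so the mixture is leptokurtic even though each regime is Gaussian.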

Table 6.1 Descriptive statistics of the daily and tick-by-tick returns

               No. of observations   Mean      Variance   Skewness   Kurtosis   Acf(1)    Acf(2)
Tick-by-tick   5 586 417             2.8E-08   4.17E-08   -0.09      13.34      -46.3%    0.4%
Daily          775                   2.0E-04   2.84E-05   -0.23      4.17

Ljung–Box (20 lags), critical value at 5% = 31.4
Daily returns            25.33
Squared daily returns    43.82*

Normality tests of daily returns
Jarque–Bera              58.68*
Kolmogorov–Smirnov       0.0607*

* Denotes parameter estimates statistically significant at the 1% level.


6.4.3 Model empirical results

We obtain the estimates for the Markov switching model in a recursive way via maximum likelihood, under normality for the errors and supposing two volatility regimes.

We use the Gauss software to estimate the model parameters. The program is made up of six parts, which are fully described in Appendix A. While the first three sections deal with data loading, preparation and the inclusion of the relevant libraries, sections four and five compute the maximum likelihood estimates of the model's parameters. The software offers several optimisation algorithms; we choose the algorithm proposed by Berndt, Hall, Hall and Hausman (BHHH). The last section computes the filtered probabilities as described in equation (6.5) and the smoothed probabilities (i.e. the probabilities of being in a given state conditional on all currently available information at $t-1$: $\Pr[S_t \mid \underline{R}_{t-1}; \theta]$).

Table 6.2 shows the estimates for the USD/DEM model. All coefficients are significant at the 5% level. The probability of staying in the higher volatility regime (i.e. $S_t = 0$) is 0.8049, which means that, on average, it lasts for about five days ($1/(1 - 0.8049) = 5.13$; see also Hamilton (1989)).
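The duration figure follows from the geometric distribution of the time spent in a regime. As a worked sketch, write $D$ for the number of consecutive days spent in a regime whose probability of persisting is $p$ (the symbol $D$ is introduced here for illustration):

```latex
\begin{aligned}
\Pr[D = k] &= p^{\,k-1}(1-p), \qquad k = 1, 2, \ldots \\
E[D] &= \sum_{k=1}^{\infty} k\, p^{\,k-1}(1-p) = \frac{1}{1-p}, \\
p_{00} = 0.8049 &\;\Rightarrow\; E[D] = \frac{1}{0.1951} \approx 5.13 \text{ days}, \\
p_{11} = 0.6348 &\;\Rightarrow\; E[D] = \frac{1}{0.3652} \approx 2.74 \text{ days}.
\end{aligned}
```

With the estimates of Table 6.2, the low volatility regime ($S_t = 1$) is therefore expected to last a little under three days on average, shorter than the high volatility regime.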

6.4.4 Model evaluation strategy

Model evaluation is carried out in two ways. Firstly, we test whether the model residuals are normal and uncorrelated; we also test whether the standardised returns follow a normal distribution. This approach provides a common ground for statistically assessing the model performance. Our evaluation criterion consists of a thorough analysis of the residuals (i.e. the analysis and testing of the normality assumptions). For the latter, to make things easily reproducible, we have used the two common Jarque–Bera and Kolmogorov–Smirnov tests. Secondly, our evaluation also comprises checking the switching volatility model through its performance in a close to real-life hedging strategy.

6.4.5 Residuals analysis

We carry out a brief analysis of the residuals of the computed model. Strictly speaking, the term "residuals" is used here for the series of standardised returns (i.e. the returns series divided by the forecast volatilities). If the volatility captures the fluctuations of the market well, and the model's assumptions are valid, such residuals are expected to be normal.
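The standardisation can be sketched as follows. The sketch assumes that the volatility estimate for each date is the probability-weighted average of the two regime volatilities, which is the formula the chapter gives later when describing its spreadsheet; which probabilities feed it (filtered or one-step-ahead) is left to the caller, and the function names are hypothetical.

```python
def filtered_volatility(prob_state0, sigma0, sigma1):
    """Probability-weighted volatility: Pr[S = 0] * sigma0 + Pr[S = 1] * sigma1."""
    return prob_state0 * sigma0 + (1.0 - prob_state0) * sigma1

def standardised_residuals(returns, probs_state0, sigma0, sigma1):
    """Divide each return by the volatility estimate for the same date."""
    return [r / filtered_volatility(p, sigma0, sigma1)
            for r, p in zip(returns, probs_state0)]
```

If the model is adequate, the output of `standardised_residuals` should look like draws from a distribution with constant variance, which is what the tests reported in this section examine.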

Table 6.2 Markov switching model: empirical results

        Value    Std. error   t-Statistics   Pr(>t)
µ       0.0345   0.0171       2.023          0.0215
σ0      0.6351   0.0281       22.603         0.0000
σ1      0.2486   0.0313       7.934          0.0000
p00     0.8049   0.0620       12.976         0.0000
p11     0.6348   0.0927       6.848          0.0000


Table 6.3 Model residuals – basic statistics and normality tests

                          Daily returns   Model residuals
Mean                      0.0043          0.0485
Std. dev.                 0.9944          1.0804
Skewness                  -0.2455         -0.1344
Exc. kurtosis             1.2764          1.2228
Sample size               775

Ljung–Box (20 lags), critical value at 5% = 31.4
Std. residuals            25.33           23.6
Squared std. residuals    43.82*          16.9

Normality tests
Jarque–Bera               58.68*          49.18*
Kolmogorov–Smirnov        0.061*          0.057*

* Denotes parameter estimates statistically significant at the 1% level.

Figure 6.3 Probability plot of the Markov switching model (vertical axis: switching residuals, -4 to 4; horizontal axis: quantiles of standard normal, -3 to 3)

Table 6.3 presents the basic summary statistics and normality tests for the standardised log-returns and the standardised residuals/returns computed from the model. Figure 6.3 shows the normal score plot for the standardised returns from the model.

Here the normal score plot is used to assess whether the standardised residuals data have a Gaussian distribution. If that is the case, then the plot will be approximately a straight line. The extreme points have more variability than points towards the centre. A plot that is bent down on the left and bent up on the right means that the data have longer tails than the Gaussian.

The striking feature is that the model captures the heteroskedasticity of the underlying time series fairly well: the Ljung–Box statistic on the squared residuals (16.9) is below the 5% critical value of 31.4, whereas the statistic on the squared daily returns (43.82) is not. The residuals are therefore homoskedastic.

Having said that, the switching model residuals do not follow a normal distribution. Both the Jarque–Bera and the Kolmogorov–Smirnov normality tests enable us to reject the hypothesis that the residuals follow a normal distribution. Although this does not invalidate the switching model, it highlights the fact that nonlinearities still exist in the residuals that the switching model did not manage to capture.

6.4.6 Model evaluation with a simple hedging strategy

In this section we show how filtered volatility estimates can be combined with technical trend-following systems to improve the performance of these systems.

The negative relationship between the performance of trend-following trading systems and the level of volatility in foreign exchange markets is a well known empirical finding. In other words, trending periods in the forex markets tend to occur in relatively quiet (i.e. low volatility) periods.

We here compare hedging strategies using trend-following systems with similar systems combined with Markov switching filtered volatility.

6.4.6.1 Trend-following moving average models

As described by Müller (1995), trend-following systems based on moving average models are well known technical solutions, easy to use, and widely applied for actively hedging foreign exchange rates.

The moving average (MA) is a useful tool to summarise the past behaviour of a time series at any given point in time. In the following example, MAs are used in the form of momenta, that is the difference between the current time series value and an MA. MAs can be defined with different weighting functions in their summation. The choice of the weighting function has a key influence on the success of the MA in its application.

Among the MAs, the exponentially weighted moving average (EMA) plays an important role. Its weighting function declines exponentially with the time distance of the past observations from now. The sequential computation of EMAs along a time series is simple as it relies upon a recursion formula. For time series with a strong random element, however, the steep shape of the exponential weighting function puts strong weight on the very recent past and hence on the short-term noise in the time series. This is a reason why other MA weighting functions have been found worthy of interest in empirical applications. The following subsection presents two families of MA weighting functions. Both families can be developed with repeated applications of the MA/EMA operator.


6.4.6.2 Moving average definitions

A moving average of the time series $x$ is a weighted average of the series elements of the past up to now:

$$\mathrm{MA}_{x,w;n} \equiv \frac{\sum_{j=-\infty}^{n} w_{n-j}\, x_j}{\sum_{j=-\infty}^{n} w_{n-j}} \tag{6.7}$$

where $w_k$ is a series of weights independent of $n$.

A fundamental property of a moving average is its range $r$ (or centre of gravity of the weighting function $w_k$):

$$r \equiv \frac{\sum_{k=0}^{\infty} w_k\, k}{\sum_{k=0}^{\infty} w_k} \tag{6.8}$$

The range $r$ of a discrete time series is in units of the time series index, but unlike this integer index it can be any positive real number.

EMAs have the following declining weights:

$$w_k \equiv \left(\frac{r}{r+1}\right)^{k} \tag{6.9}$$

where $r$ is the centre of gravity. In the case of daily time series, $r = (d - 1)/2$ ($d$ = number of days in moving average).

An EMA can be computed by a recursion formula. If its value $\mathrm{EMA}_x(r, t_{n-1})$ at the previous series element $x_{n-1}$ is known, one can easily compute the value at $t_n$:

$$\mathrm{EMA}_x(r, t_n) = \mu\, \mathrm{EMA}_x(r, t_{n-1}) + (1 - \mu)\, x_n \quad \text{with} \quad \mu = \frac{r}{r+1} \tag{6.10}$$

or expressed in number of days in the moving average:

$$\mu = 1 - \frac{2}{d+1} \quad \text{i.e.} \quad r = \frac{d-1}{2} \tag{6.11}$$

The recursion needs an initial value to start with. There is usually no information before the first series element $x_1$, which is the natural choice for this initialisation:

$$\mathrm{EMA}_x(r, t_1) = x_1 \tag{6.12}$$

The error made with this initialisation declines with the factor $[r/(r+1)]^{n-1}$.


In many applications, the EMA will be used neither at $t_1$ nor in the initial phase after $t_1$, which is called the build-up time. After the build-up time, when the EMA is used, one expects to be (almost) free of initialisation errors.
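The EMA recursion and its initialisation with the first observation translate directly into code. The two function names below are hypothetical; the weight follows the relation between the number of days in the moving average and the recursion coefficient given above.

```python
def ema_weight(days):
    """Recursion coefficient mu = 1 - 2 / (d + 1) for a d-day EMA."""
    return 1.0 - 2.0 / (days + 1.0)

def ema_series(xs, mu):
    """EMA of the list xs, initialised with the first observation."""
    ema = [xs[0]]
    for x in xs[1:]:
        # New EMA = mu * previous EMA + (1 - mu) * new observation
        ema.append(mu * ema[-1] + (1.0 - mu) * x)
    return ema
```

For a 50-day EMA, `ema_weight(50)` gives 49/51, roughly 0.96, and the corresponding centre of gravity is $\mu/(1-\mu) = 24.5 = (50 - 1)/2$. In practice the first values of `ema_series` would be discarded as build-up time.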

6.4.6.3 Trading rules with EMA trading models

MA models summarise the past behaviour of a time series at any given point in time.

This information is used to identify trends in ¬nancial markets in order to subsequently

take positions according to the following rule:

If $x(t_n) > \mathrm{EMA}_x(r, t_n)$, go long (or stay long)

If $x(t_n) < \mathrm{EMA}_x(r, t_n)$, go short (or stay short)

with $x(t_n)$ the spot exchange rate at time $t_n$. Commission costs of 0.00025 DEM are charged for each trade. A trade is defined as switching from a long position to a short one or vice versa.

An MA model is defined by its type (i.e. MA or EMA) and, in the latter case, its centre of gravity (or number of MA days). There is obviously an infinite number of different combinations. Our analysis here is arbitrarily limited to one EMA type (i.e. 50 days or µ = 0.96).
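The trading rule can be sketched as a mapping from prices and EMA values to positions (+1 for long, -1 for short). Two details are assumptions of this sketch rather than statements of the chapter: no position is held before the first signal, and the previous position is kept when the price equals the EMA.

```python
def ema_positions(prices, ema):
    """Position per date: +1 if price > EMA, -1 if price < EMA, else unchanged."""
    positions, current = [], 0  # 0 means no position taken yet
    for price, level in zip(prices, ema):
        if price > level:
            current = 1
        elif price < level:
            current = -1
        positions.append(current)
    return positions

def count_trades(positions):
    """Number of switches from long to short or vice versa."""
    trades, last = 0, 0
    for pos in positions:
        if pos != 0:
            if last != 0 and pos != last:
                trades += 1
            last = pos
    return trades
```

Multiplying `count_trades` by the commission per trade gives the total transaction cost that has to be deducted from the gross profit of the strategy.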

6.4.6.4 Trading rules of EMA trading models with volatility filters

It is a well known empirical finding that trend-following systems tend to perform poorly when markets become volatile. One possible explanation lies in the fact that, most of the time, high volatility periods correlate with periods when prices change direction.

The previous rules are thus combined with the following new rules:

If volatility > volatility threshold T, then reverse position (i.e. go long if current position is short and vice versa)

If volatility < volatility threshold T, keep position as indicated by the underlying MA model
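One possible reading of these rules is sketched below: on each date the position indicated by the MA model is reversed when the volatility estimate exceeds the threshold and kept otherwise. This interpretation, the treatment of a volatility exactly equal to the threshold (position kept) and the function name are assumptions of the sketch.

```python
def filtered_positions(base_positions, volatilities, threshold):
    """Reverse the MA model's position on dates where volatility exceeds the threshold."""
    return [-pos if vol > threshold else pos
            for pos, vol in zip(base_positions, volatilities)]
```

For example, a long signal (+1) on a date with volatility above the threshold becomes a short position (-1), while the same signal on a quiet date is left unchanged.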

6.4.6.5 Trading results

Table 6.4 shows the numerical results. Strategies based on the EMA model without volatility filtering lead to approximately zero profit (if commission costs are not included), denoting the inability of the EMA model considered here to detect profitable trends. The volatility filter trading strategy based on filtered probabilities, on the other hand, considerably improves the performance of the model. It is interesting to note that, despite the higher frequency of the trades (253 vs. 65), the performance after transaction costs is still significantly higher. Lastly, in terms of the risk/reward ratio, the latter model achieves a remarkable performance, although the average profit per annum is moderate.

The file “Hedging System Simulation.xls” on the CD-Rom contains the raw forex data as well as the Markovian switching volatility and the moving average computations. The file is split into three worksheets.


Table 6.4 Summary results of trading strategies (including transaction costs)

                             Without volatility filter   With volatility filter
Total profit (%)             -4.9%                       15.4%
Average profit per annum     -1.6%                       5.1%
Maximum drawdown (a)         19.4%                       10.4%
Number of trades             65                          253
Risk/reward ratio (b)        -0.08                       0.49

a Drawdowns are defined as the difference between the maximum profit potentially realised to date and the profit (or loss) realised to date. The maximum drawdown is therefore the maximum value of the drawdowns observed in history. It is an estimate of the maximum loss the strategy would have incurred.

b Defined as the ratio average profit per annum/maximum drawdown (e.g. 5.1%/10.4% ≈ 0.49).
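The two risk measures in Table 6.4 can be sketched as follows. The drawdown is computed against the running maximum of the cumulative profit. For the ratio, the table's figures are reproduced by dividing the average profit per annum by the maximum drawdown (5.1/10.4 ≈ 0.49 and -1.6/19.4 ≈ -0.08), so that convention is used here. Function names are hypothetical.

```python
def max_drawdown(cumulative_profit):
    """Largest gap between the running peak of cumulative profit and its current value."""
    peak = cumulative_profit[0]
    worst = 0.0
    for profit in cumulative_profit:
        peak = max(peak, profit)           # maximum profit realised to date
        worst = max(worst, peak - profit)  # largest drawdown observed so far
    return worst

def risk_reward(avg_profit_per_annum, maximum_drawdown):
    """Average profit per annum divided by maximum drawdown, as in Table 6.4."""
    return avg_profit_per_annum / maximum_drawdown
```

On the cumulative profit path `[0, 5, 3, 8, 1, 4]`, the running peak reaches 8 before the profit falls to 1, so the maximum drawdown is 7.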

The first worksheet (“USDDEM EMA System”) contains the basic computation of the EMA model and the trading model simulation. Columns A and B contain the date and the mid forex rate respectively. Column C contains the EMA computation. The EMA centre of gravity is input in cell C3. Column E contains the switching volatility figures. The computation of these figures is given in the third worksheet (“Backup”, column N); the formula is based on the filtered probabilities, i.e. $\Pr[S_t = 0 \mid \underline{R}_t]\,\sigma_0 + \Pr[S_t = 1 \mid \underline{R}_t]\,\sigma_1$.