
7.13 Alternative Models

There are as many models proposed for financial data as there are creative people working in the area (i.e. lots). Some find more support in specific communities, and the debate about which are the most appropriate shows no sign of early resolution. For example, the use of Neural Nets has emerged from the field of artificial intelligence and provides a locally simple and compelling model originally suggested as a model of the brain.

7.13.1 Neural Nets

A basic premise of much of modern research is that many otherwise extremely complex phenomena are much simpler when viewed locally. On this local scale, structures and organizations are substantially simpler. Complex societies of insects, for example, are organized with very simple interactions. Even a differential equation like

dy/dx = y^2 (1 - y)

describes a simple local structure of a function (its slope is proportional to the square of the distance from zero times the distance from one), but the solution is difficult to write in closed form.

Neural Nets are suggested as devices for processing information as it passes through a network based loosely on the parallel architecture of animal brains. They are a form of multiprocessor computer system, with individual simple processing elements which are interconnected to a high degree. At a given node, binary bits b1, b2, b3 enter and are processed with a very simple processor g(b1, b2, b3) (often a weighted average of the inputs, possibly transformed). This transformed output is then transmitted to one or more nodes.

Thus a particular neural net model consists of a description of the processors (usually simple functions of weighted averages), an architecture describing the routing, and a procedure for estimating the parameters (for example, the weights in the weighted average). They have the advantage of generality and flexibility: they can probably be modified to handle nearly any problem with some success. However, on specific models for which there are statistically motivated alternatives, they usually perform less well than a method designed for the statistical model. Their generality and flexibility make them a popular research topic in finance; see, for example, Trippi and Turban (1996).
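A single processing element of the kind described above can be sketched in a few lines. The sketch below is in Python rather than the Matlab used later in the chapter; the particular weights, bias, and logistic transform are illustrative assumptions, not part of any specific neural net model in the text.

```python
import math

# One "node" of a neural net: inputs enter, are combined as a weighted
# sum, and the result is passed through a simple transformation (here
# the logistic function).  The output would then be routed to one or
# more downstream nodes.

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def node(inputs, weights, bias=0.0):
    """Weighted combination of the inputs, then a logistic transform."""
    u = bias + sum(w * b for w, b in zip(weights, inputs))
    return logistic(u)

# Three binary bits b1, b2, b3 enter the node; a value in (0, 1) is emitted.
out = node([1, 0, 1], [0.5, -0.3, 0.8])
```

A full network is then just a routing of such outputs into further nodes, with the weights estimated from data.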

7.13.2 Chaos, Long-Term Dependence, and Non-linear Dynamics

Another topic, popularized in finance by books by Peters (1996) and Gleick (1987), is chaos. Chaotic systems are generally purely deterministic systems that may resemble random or stochastic ones. For example, if we define a sequence by a recursion of the form x_t = f(x_{t-1}) (the same form as the recursion satisfied by the linear congruential random number generator) for some non-linear function f, the resulting system may have many of the apparent properties of a random sequence. Depending on the nature of the function f, the sequence may or may not appear "chaotic". Compare, for example, the behaviour of the above recursion when f(x) = ax(1 - x), 0 < x < 1, 0 < a ≤ 4, for different initial conditions and different values of a. When a = 4, this recursion is extremely sensitive to the initial condition, as Figure 7.20 shows. In the left panel we plot the values of x_n against n for a = 4 and x0 = 0.4999, and in the right panel for x0 = 0.5. This small change in the initial condition makes an enormous difference to the sequence x_n, which converges almost immediately to zero when x0 = 0.5 but when x0 = 0.4999 behaves much more like a random sequence, except with higher density near 0 and 1. This strong dependence on the distant past is typical of a chaotic system.
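The sensitivity shown in Figure 7.20 is easy to reproduce. The sketch below (in Python; the number of iterations is chosen only for convenience) runs the recursion for the figure's two initial conditions.

```python
# The logistic recursion x_t = a * x_{t-1} * (1 - x_{t-1}) with a = 4.
# Starting from x0 = 0.5 the sequence hits 1.0 and then collapses to
# zero; starting from the nearby x0 = 0.4999 it wanders over (0, 1)
# much like a random sequence.

def logistic_map(x0, a=4.0, n=50):
    xs = [x0]
    for _ in range(n):
        xs.append(a * xs[-1] * (1.0 - xs[-1]))
    return xs

exact = logistic_map(0.5)      # 0.5 -> 1.0 -> 0.0 -> 0.0 -> ...
nearby = logistic_map(0.4999)  # stays in [0, 1], apparently chaotic
```

Plotting `nearby` against the iteration index reproduces the left panel of Figure 7.20 qualitatively.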

Similarly, the recursion

x_t = 1 - a x_{t-1}^2 + b x_{t-2},  with a = 1.4, b = 0.3,

describes a bivariate chaotic system which, like an autoregressive process of order 2, requires two "seeds" to determine the subsequent elements of the sequence. In general, a system might define x_t as a non-linear function of n predecessors. Detecting chaos (or lack thereof) is equivalent to determining whether the sequences (x_t, x_{t+1}, ..., x_{t+n}), t = 1, 2, ..., fill (n + 1)-dimensional space.
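This two-seed recursion (with these values of a and b it is the delayed form of the map usually attributed to Hénon) is equally easy to iterate. A Python sketch, with illustrative seeds:

```python
# The recursion x_t = 1 - a * x_{t-1}^2 + b * x_{t-2} with a = 1.4,
# b = 0.3.  Like an AR(2) process, it needs two starting values; the
# seeds (0, 0) below are a common illustrative choice.

def bivariate_map(x0, x1, a=1.4, b=0.3, n=1000):
    xs = [x0, x1]
    for _ in range(n):
        xs.append(1.0 - a * xs[-1] ** 2 + b * xs[-2])
    return xs

path = bivariate_map(0.0, 0.0)   # bounded but never settling down
```

Plotting consecutive pairs (x_t, x_{t+1}) of `path` shows the points falling on a thin curved set rather than filling two-dimensional space, which is the idea behind the chaos-detection criterion above.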

Tests designed to check whether a given sequence of stock returns is independent and identically distributed generally result in rejecting this hypothesis, but the most plausible explanation is less clear. For example, Hsieh (1991) tests for both chaotic behaviour and for ARCH-GARCH effects (predictable variance changes) and concludes that the latter is the most likely cause of the apparent dependence in the data.

7.13.3 ARCH and GARCH

There are many failures in the Black-Scholes model for stock returns, but two extremely obvious ones are common to the application of simple Gaussian time series models to much financial data, and have been evident for at least the past 40 years.

Figure 7.20: The effect of the initial condition on the recursion x_{n+1} = 4 x_n (1 - x_n). Left panel, x0 = 0.4999; right panel, x0 = 0.5.

The first is the heavy tails in the distribution of returns. There are many days on which the increase or decrease in a stock price, for example, is well beyond the range of anything reasonable for a normal random variable. Models such as the stable laws or the NIG process have been proposed to ameliorate this problem.

However, there is another apparent failure in such independent increment models: the failure to adequately represent extended observed periods of high and low volatility. The innovations in the conventional ARMA models are supposed to be independent with mean 0 and constant variance σ^2, and the squared innovations should therefore be approximately independent (uncorrelated) variates, but most series show periods when these squared innovations tend to be consistently above the median, followed by periods when they are consistently smaller. While there are many ways of addressing this failure in traditional models, one of the most popular is the use of GARCH, or Generalized Autoregressive Conditional Heteroscedasticity (see Bollerslev, 1986, Duan, 1995, Engle and Rosenberg, 1995).

Traditional time series models attempt to model the expected value of the series given the past observations, assuming that the conditional variance is constant, but a GARCH model takes this one moment further, allowing the conditional variance to be modeled by a time series as well. In particular, suppose that the innovations in a standard time series model, say an ARMA model, are normally distributed given the past:

a_t ~ N(0, h_t).

Assume that h_t, the conditional variance given the past, satisfies some ARMA relationship, with the squared innovations posing as the new innovations process:

β(B) h_t = α_0 + α(B) a_t^2

where β(B) = 1 - β_1 B - ... - β_r B^r, α(B) = α_1 B + ... + α_s B^s, and B is the backwards time shift, so that B^r h_t = h_{t-r}.

The case r = 0 is the original ARCH (Autoregressive Conditional Heteroscedasticity) model, and the most common model takes r = 0, s = 1, so h_t = α_0 + α_1 a_{t-1}^2. For ARCH and GARCH models the parameters must be estimated using both the model for the conditional mean and the model for the conditional variance, and diagnostics apply to both models. The advantages of these models are that they provide for some dependence among the observations through the volatility rather than through the mean, and that they tend to have heavier tails. As a result, they provide larger estimated prices for deep out-of-the-money options, for example, which are heavily dependent on an accurate model for volatility.
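The volatility clustering these models produce is easy to see in simulation. The following Python sketch (parameter values are illustrative) generates a GARCH series satisfying the β(B) h_t = α_0 + α(B) a_t^2 recursion with r = s = 1, and checks that the squared innovations are positively autocorrelated while the innovations themselves are nearly uncorrelated.

```python
import math
import random

# Simulate h_t = alpha0 + alpha1 * a_{t-1}^2 + beta1 * h_{t-1},
# a_t ~ N(0, h_t).  The squares a_t^2 are autocorrelated: calm and
# volatile periods cluster together.

def simulate_garch11(alpha0, alpha1, beta1, n, seed=1):
    rng = random.Random(seed)
    h = alpha0 / (1.0 - alpha1 - beta1)   # start at the unconditional variance
    a = []
    for _ in range(n):
        a.append(math.sqrt(h) * rng.gauss(0.0, 1.0))
        h = alpha0 + alpha1 * a[-1] ** 2 + beta1 * h
    return a

def lag1_corr(x):
    """Sample lag-one autocorrelation."""
    n, m = len(x), sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x) / n
    c1 = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1)) / n
    return c1 / c0

a = simulate_garch11(0.00001, 0.10, 0.85, 20000)
rho_sq = lag1_corr([v * v for v in a])   # positive: volatility clustering
rho_lev = lag1_corr(a)                   # near zero: levels uncorrelated
```

The contrast between `rho_sq` and `rho_lev` is exactly the feature the independent-increment models fail to capture.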

7.13.4 ARCH(1)

The basic model investigated by Engle (1982) was the simplest case, in which the process has zero conditional mean (it is reasonable to expect that arbitrageurs in the market have removed a large part of any predictability in the mean) but the squares are significantly autocorrelated. Most financial data exhibit this property to some degree. Engle's ARCH(1) model is:

x_t ~ N(0, h_t)  and  h_t = α_0 + α_1 x_{t-1}^2.    (7.109)

An ARCH regression model allows the conditional mean of x_t in (7.109) to depend on some observed predictors. The GARCH-in-mean process fit by French et al. (1987) allows the mean of x_t to be a function of its variance, so that x_t ~ N(a + b h_t^{p/2}, h_t). This would allow testing hypotheses of relative risk aversion, for example. However, there is little evidence that b is non-zero, and even less evidence to determine whether the linear relation should be between mean and standard deviation (p = 1) or between mean and variance (p = 2).
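A short simulation makes the ARCH(1) point concrete: the series itself is serially uncorrelated with zero mean, while its squares are autocorrelated (when fourth moments exist, the lag-one autocorrelation of x_t^2 for this model is α_1). Python sketch with illustrative parameters:

```python
import math
import random

# Simulate Engle's ARCH(1) model (7.109):
#   x_t ~ N(0, h_t),  h_t = alpha0 + alpha1 * x_{t-1}^2.

def simulate_arch1(alpha0, alpha1, n, seed=5):
    rng = random.Random(seed)
    x, h = [], alpha0 / (1.0 - alpha1)   # start at the unconditional variance
    for _ in range(n):
        x.append(math.sqrt(h) * rng.gauss(0.0, 1.0))
        h = alpha0 + alpha1 * x[-1] ** 2
    return x

def lag1_corr(v):
    n, m = len(v), sum(v) / len(v)
    c0 = sum((u - m) ** 2 for u in v) / n
    c1 = sum((v[i] - m) * (v[i + 1] - m) for i in range(n - 1)) / n
    return c1 / c0

x = simulate_arch1(0.5, 0.4, 20000)
corr_levels = lag1_corr(x)                    # near zero
corr_squares = lag1_corr([u * u for u in x])  # near alpha1 = 0.4
```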

7.13.5 Estimating Parameters

The conditional log likelihood to be maximized with respect to the parameters α_i, β_j is:

ln(L) = - (1/2) Σ_t [ ln h_t + â_t^2 / h_t ].
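In practice this likelihood is maximized numerically. The sketch below (Python) merely evaluates it for given parameter values in the r = s = 1 case, with h_1 initialized at the sample variance of the innovations; that initialization, like everything else here, is an assumed convention rather than the book's prescription.

```python
import math

# Evaluate ln(L) = -(1/2) * sum_t [ ln h_t + a_t^2 / h_t ] for a
# GARCH(1,1) variance recursion h_t = alpha0 + alpha1*a_{t-1}^2 + beta1*h_{t-1},
# given a series of innovations a.

def garch11_loglik(a, alpha0, alpha1, beta1):
    n = len(a)
    h = sum(v * v for v in a) / n        # assumed initial conditional variance
    ll = 0.0
    for t in range(n):
        ll -= 0.5 * (math.log(h) + a[t] ** 2 / h)
        h = alpha0 + alpha1 * a[t] ** 2 + beta1 * h
    return ll
```

Handing this function to a numerical optimizer over (α_0, α_1, β_1), subject to positivity and α_1 + β_1 < 1, gives the conditional maximum likelihood estimates.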

Various modifications of the above GARCH model are possible and have been tried, but the spirit of the models, as well as most of the methodology, remains basically the same. There is also a system of Yule-Walker equations that can be solved for the coefficients β_i in an ARCH model. If γ_i is the autocovariance function of the squared innovations process a_t^2, then

γ_n = Σ_{i=1}^{s} α_i γ_{n-i} + Σ_{i=1}^{r} β_i γ_{n-i}

for n ≥ r + 1. These provide the usual Partial Autocorrelation Function for identification of a suitable order r for the autoregressive part.

7.13.6 Akaike's Information Criterion

A model which leads to small estimated variances for the innovations is obviously preferable to one with highly variable innovations, everything else being equal. In other words, when we select a model we are inclined to minimize the estimated residual variance (1/(N - k)) Σ â_i^2 (or equivalently its logarithm) over both the parameters themselves and k, the number of autoregressive plus moving average parameters in the model. Unfortunately, each additional parameter results in what may be only a marginal improvement in the residual variance, so minimizing (1/(N - k)) Σ â_i^2 would encourage the addition of parameters which do not improve the ability of the model to forecast or fit new observations. A better criterion, Akaike's Information Criterion, penalizes the model for each additional parameter:

AIC = log[ (1/(N - k)) Σ â_i^2 ] + 2k/N.    (7.110)

The AIC criterion chooses the model and number of parameters k which minimize this quantity. In some software, such as R and S-Plus, the AIC differs from (7.110) by approximately a multiple of N; for example, AIC2 = -2 log(L) + 2k is approximately N times the value in (7.110). The advantage of multiplying by N is that differences operate on a more natural scale. When nested models are compared (i.e. one model is a special case of the other), differences between values of the statistic -2 log(L) have, under the null hypothesis that the simpler model holds, a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters in the two models.
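Criterion (7.110) can be sketched directly. Below, an AR(1) series is simulated (the generating model and sample size are illustrative), the AR(0) and AR(1) fits are compared, and the AR(1) coefficient is estimated by least squares; with a strong autoregression, the AR(1) fit should win despite the 2k/N penalty.

```python
import math
import random

# AIC of (7.110): log[(1/(N - k)) * RSS] + 2k/N.
def aic(rss, n, k):
    return math.log(rss / (n - k)) + 2.0 * k / n

def simulate_ar1(phi, n, seed=2):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

x = simulate_ar1(0.8, 5000)
n = len(x)

# AR(0): residual is x_t itself.
rss0 = sum(v * v for v in x)

# AR(1): least-squares slope, then residual sum of squares.
phi_hat = sum(x[t] * x[t - 1] for t in range(1, n)) / sum(v * v for v in x[:-1])
rss1 = sum((x[t] - phi_hat * x[t - 1]) ** 2 for t in range(1, n))

aic0, aic1 = aic(rss0, n, 0), aic(rss1, n, 1)   # expect aic1 < aic0
```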

7.13.7 Estimation and Testing ARCH Effects

The function ugarch in Matlab estimates the parameters in a GARCH model. In particular, if a is the vector of innovations from a time series for which we wish to fit a GARCH model, the command [Alpha0, Alpha, Beta] = ugarch(a, p, q) fits a GARCH(p, q) model

h_t = α_0 + α_1 h_{t-1} + ... + α_p h_{t-p} + β_1 a_{t-1}^2 + ... + β_q a_{t-q}^2.

For example, fitting the GARCH(1,1) model to the mean-adjusted daily returns for the S&P 500 index over the period 1997-2002 gives the estimated model

h_t = 0.000006 + 0.8671 h_{t-1} + 0.0974 a_{t-1}^2.

The large coefficient 0.8671 on h_{t-1} indicates a strong tendency for the variance to remain near its previous value. In R there is a similar function in the package tseries (see for example http://pbil.univ-lyon1.fr/library/tseries/html/00Index.html), run with a command like

garch(a, order=c(p,q), coef=NULL, itmax=200, eps=NULL, grad=c("numerical"), series=NULL, trace=TRUE)

where the NULL parameters indicate that the default values are used.

Most of the tests for the adequacy of a given time series model are inherited from regression, although in some cases the autocorrelation of the series induces a different limiting distribution. For example, if there is an ARCH or GARCH effect, then there should be a significant regression of â_t^2 on its predecessors â_{t-1}^2, â_{t-2}^2, â_{t-3}^2, .... Suppose we are able to obtain residuals â_l, â_{l+1}, ..., â_N from an ARMA model for the original series. We might test for an ARCH effect by regressing the vector (â_{l+s}^2, ..., â_N^2) on a constant as well as the s "predictors"

(â_{l+s-1}^2, ..., â_{N-1}^2), (â_{l+s-2}^2, ..., â_{N-2}^2), ..., (â_l^2, ..., â_{N-s}^2)

and obtaining the usual coefficient of determination, or squared multiple correlation coefficient, R^2. Standardized, (N - l) R^2 has an approximate chi-squared distribution with s degrees of freedom under the null hypothesis of homoscedasticity, so values above the 95th chi-squared percentile would lead to rejecting the homoscedasticity null hypothesis and concluding that ARCH-like effects are present. One can also fit a GARCH model and compare the values of the coefficient estimators with their standard errors to see if the model can be further simplified. Finally, it is easy to simulate an ARCH or a GARCH model (see for example the function [a, h] = ugarchsim(Alpha0, Alpha, Beta, NumSamples) in Matlab). Any test statistic which is sensitive to persistence in the volatility can be adapted to test for a GARCH model by using simulations to determine the distribution of the test statistic, with the parameter values in the simulation fixed at their estimated values.
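The regression test above can be sketched for the simplest case s = 1, where the regression has a single lagged predictor and R^2 is just the squared sample correlation. The sketch below (Python; all parameter values and seeds are illustrative) computes the statistic for i.i.d. residuals and for residuals with a genuine ARCH(1) effect, to be compared with the chi-squared(1) 95th percentile of about 3.84.

```python
import math
import random

# (N - l) * R^2 test for ARCH effects with s = 1: regress squared
# residuals on a constant and their lag-one values.  With one predictor,
# R^2 is the squared correlation between sq[1:] and sq[:-1].

def arch_lm_stat(resid):
    sq = [v * v for v in resid]
    y, z = sq[1:], sq[:-1]
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    cov = sum((y[i] - my) * (z[i] - mz) for i in range(n))
    vy = sum((v - my) ** 2 for v in y)
    vz = sum((v - mz) ** 2 for v in z)
    r2 = cov * cov / (vy * vz)
    return n * r2    # approximately chi-squared(1) under homoscedasticity

def simulate_arch1(alpha0, alpha1, n, seed=7):
    """Residuals with a genuine ARCH(1) effect, per model (7.109)."""
    rng = random.Random(seed)
    x, h = [], alpha0 / (1.0 - alpha1)
    for _ in range(n):
        x.append(math.sqrt(h) * rng.gauss(0.0, 1.0))
        h = alpha0 + alpha1 * x[-1] ** 2
    return x

rng = random.Random(6)
stat_iid = arch_lm_stat([rng.gauss(0.0, 1.0) for _ in range(4000)])  # small
stat_arch = arch_lm_stat(simulate_arch1(0.5, 0.4, 4000))             # large
```

Values of `stat_arch` far above 3.84 reject homoscedasticity, as the text describes.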

7.13.8 Example: Canadian Dollar/US Dollar Exchange Rate

As an example, we downloaded the US/Canadian dollar exchange rate close for a 10-year period, from October 7, 1994 to October 8, 2004, from the Bank of Canada website http://www.bankofcanada.ca. There are approximately 2514 daily observations of the value of the US dollar priced in Canadian dollars. Suppose we first fit an autoregressive moving average model of order (1, 1) to this returns data using the Systems Identification Toolbox in Matlab. The command armax(data, [p, q]) fits an autoregressive moving average model in general, with autoregressive order p and moving average order q. We fit an AR(2) model to the returns from this series, resulting in the model

x_t + 0.03657 x_{t-1} - 0.02497 x_{t-2} = a_t

with innovations process a_t, and then we fit a GARCH(1,1) model to the innovations a_t, with the following estimated model for the variance h_t of a_t:

h_t = 0.9524 h_{t-1} + 0.0474 a_{t-1}^2.

Once again the large coefficient 0.9524 on h_{t-1} indicates a high degree of persistence in the volatility.

Diebold and Nerlove (1989) confirm the ARCH effect on the exchange rate for a number of different currencies, observing ARCH effects at lags of 12 weeks or more.

7.13.9 Conclusions

Research and modeling are dynamic tasks, in no discipline more so than in finance. In physics, theories change over time, but at least the target is often a physical law which is, at least on the scale of our meagre lifespans, relatively constant. Not so in the modeling of a financial time series. First order autocorrelation in the Dow Jones average was once quite strong, but with the increased liquidity and statistical literacy of arbitrageurs, it has largely disappeared. Tools which permit "trading" volatility or interest rates may alter other features of the market as well. The standard approach to derivative pricing which we take here is to assume a model for an asset price, in which case the derivative, a function of the asset price, has a price functionally related to the asset price. Which is the asset and which is the derivative is a matter of semantics (since the derivative may be more heavily traded than the underlying); highly dependent and liquid assets will result in a near functional relationship between the corresponding asset prices, and will tend to "tie down" each to a functional relationship. Each new liquid financial instrument or asset in a market can substantially affect the model for related assets. Models and their parameters are not only subject to constantly changing economic conditions; they are affected by every related product that enters and leaves the market, by the information and the technology base of traders, and by political events and moods.

Today's financial model is almost certainly inadequate tomorrow, and the model parameters are evidently in constant flux. As the complexity of models changes, and as the explosion of new types of instruments in the market continues to constrain current asset prices in new ways, the need for new statistical and computational tools, often dependent on computer simulation, can only continue to grow. Evolution, from which we have ourselves developed, is a remarkably efficient stochastic optimization (in this case in a high-dimensional space). The diversity of models, tools, and approaches to financial analysis that can be accommodated by simulation ensures that our ability to reflect the real processes will continue to improve. I am conscious that only a fraction of these tools and models are discussed here, a tribute to the wealth of research in this important area.