
7.13 Alternative Models
There are as many models proposed for financial data as there are creative
people working in the area (i.e. lots). Some find more support in specific
communities, and the debate about which are the most appropriate shows no
sign of early resolution. Neural nets, for example, emerged from the field of
artificial intelligence and provide a locally simple and compelling model,
originally suggested as a model of the brain.

7.13.1 Neural Nets
A basic premise of much of modern research is that many otherwise extremely
complex phenomena are much simpler when viewed locally: on this local scale,
structures and organizations are substantially simpler. Complex societies of
insects, for example, are organized with very simple interactions. Even a
differential equation like

y' = y^2 (1 - y)

describes a simple local structure of a function (its slope is proportional to the
square of the distance from zero times the distance from one), but the solution
is difficult to write in closed form.
Neural nets are suggested as devices for processing information as it passes
through a network based loosely on the parallel architecture of animal brains.
They are a form of multiprocessor computer system, with individual simple
processing elements which are interconnected to a high degree. At a given node,
binary bits b_1, b_2, b_3 enter and are processed by a very simple processor
g(b_1, b_2, b_3) (often a weighted average of the inputs, possibly transformed).
This transformed output is then transmitted to one or more nodes.
Thus a particular neural net model consists of a description of the processors
(usually simple functions of weighted averages), an architecture describing
the routing, and a procedure for estimating the parameters (for example the
weights in the weighted average). They have the advantage of generality and
flexibility: they can probably be modified to handle nearly any problem with
some success. However, for specific models for which there are statistically
motivated alternatives, they usually perform less well than a method designed
for the statistical model. Their generality and flexibility make them a popular
research topic in finance; see, for example, Trippi and Turban (1996).

7.13.2 Chaos, Long-term Dependence and Non-linear Dynamics
Another topic, popularized in finance by books by Peters (1996) and Gleick
(1987), is chaos. Chaotic systems are generally purely deterministic systems that
may resemble random or stochastic ones. For example, if we define a sequence by
a recursion of the form x_t = f(x_{t-1}) (the same form as the recursion satisfied
by the linear congruential random number generator) for some non-linear
function f, the resulting system may have many of the apparent properties of
a random sequence. Depending on the nature of the function f, the sequence
may or may not appear "chaotic". Compare, for example, the behaviour of the
above recursion when f(x) = a x(1 - x), 0 < x < 1, 0 < a <= 4, for different initial
conditions and different values of a. When a = 4, this recursion is extremely
sensitive to the initial condition, as Figure 7.20 shows. In the left panel we plot
the values of x_n against n for a = 4 and x_0 = 0.4999, and in the right panel
for x_0 = 0.5. This small change in the initial condition makes an enormous
difference to the sequence x_n, which converges almost immediately to zero when
x_0 = 0.5 but behaves much more like a random sequence when x_0 = 0.4999,
except with higher density near 0 and 1. This strong dependence on the distant
past is typical of a chaotic system.
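A minimal R sketch of this experiment, iterating the recursion with the two seeds used in Figure 7.20 (the function name logistic_path is ours, purely for illustration):

# Iterate the logistic map x_{n+1} = a * x_n * (1 - x_n) from a given seed x0
logistic_path <- function(a, x0, n = 10000) {
  x <- numeric(n)
  x[1] <- x0
  for (t in 2:n) x[t] <- a * x[t - 1] * (1 - x[t - 1])
  x
}

x1 <- logistic_path(a = 4, x0 = 0.4999)  # wanders over (0, 1) much like a random sequence
x2 <- logistic_path(a = 4, x0 = 0.5)     # maps to 1, then to 0, and stays there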
Similarly, the recursion

x_t = 1 - a x_{t-1}^2 + b x_{t-2},   with a = 1.4, b = 0.3,

describes a bivariate chaotic system which, like an autoregressive process of
order 2, requires two "seeds" to determine the subsequent elements of the
sequence. In general, a system might define x_t as a non-linear function of its n
predecessors. Detecting chaos (or the lack thereof) is equivalent to determining
whether the sequences (x_t, x_{t+1}, ..., x_{t+n}), t = 1, 2, ..., fill (n + 1)-dimensional space.
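A similar sketch (again with an illustrative function name) iterates this two-lag recursion and plots successive pairs (x_t, x_{t+1}), which trace out a thin set rather than filling the plane:

# x_t = 1 - a*x_{t-1}^2 + b*x_{t-2}: the Henon map written with two lags
henon_path <- function(n = 5000, a = 1.4, b = 0.3, x0 = c(0, 0)) {
  x <- numeric(n)
  x[1:2] <- x0
  for (t in 3:n) x[t] <- 1 - a * x[t - 1]^2 + b * x[t - 2]
  x
}
x <- henon_path()
plot(x[-length(x)], x[-1], pch = ".",
     xlab = "x_t", ylab = "x_t+1")  # points concentrate on a thin attractor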
Tests designed to determine whether a given sequence of stock returns is
independent and identically distributed generally result in rejecting this
hypothesis, but the most plausible explanation for the rejection is less clear.
For example, Hsieh (1991) tests for both chaotic behaviour and for ARCH-GARCH
effects (predictable variance changes) and concludes that the latter is the most
likely cause of the apparent dependence in the data.

Figure 7.20: The effect of the initial condition on the recursion x_{n+1} = 4 x_n (1 - x_n). Left panel: x_0 = 0.4999; right panel: x_0 = 0.5. (Each panel plots x_n against n for n up to 10000, with 0 <= x_n <= 1.)

7.13.3 ARCH and GARCH

There are many failures of the Black-Scholes model for stock returns, but two
extremely obvious ones are common to the application of simple Gaussian time
series models to much financial data and have been evident for at least the
past 40 years.

The first is the heavy tails in the distribution of returns: there are many days
on which the increase or decrease in a stock price, for example, is well beyond
the range of anything reasonable for a normal random variable. Models such as
the stable laws or the NIG process have been proposed to ameliorate this
problem. However, there is another apparent failure in such independent
increment models: the failure to adequately represent extended observed periods
of high and low volatility. The innovations in the conventional ARMA models are
supposed to be independent with zero mean and constant variance σ^2, and the
squared innovations should therefore be approximately independent (uncorrelated)
variates; but most series show periods when these squared innovations tend to be
consistently above the median, followed by periods when they are consistently
smaller. While there are many ways of addressing this failure in traditional
models, one of the most popular is the use of GARCH, or Generalized
Autoregressive Conditional Heteroscedasticity (see Bollerslev, 1986, Duan, 1995,
Engle and Rosenberg, 1995).
Traditional time series models attempt to model the expected value of the
series given the past observations, assuming that the conditional variance is
constant; a GARCH model takes this one moment further, allowing the
conditional variance to be modeled by a time series as well. In particular,
suppose that the innovations in a standard time series model, say an ARMA
model, are normally distributed given the past,

a_t ~ N(0, h_t).

Assume that h_t, the conditional variance given the past, satisfies some ARMA
relationship, with the squared innovations posing as the new innovations process:

β(B) h_t = α_0 + α(B) a_t^2

where β(B) = 1 - β_1 B - ... - β_r B^r, α(B) = α_1 B + ... + α_s B^s, and B is the
backwards time shift operator, so that B^r h_t = h_{t-r}.
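For concreteness, here is a short R sketch simulating this recursion in the GARCH(1,1) case (r = s = 1); the function name and parameter values are ours, chosen only for illustration:

# Simulate a_t ~ N(0, h_t) with h_t = alpha0 + alpha1 * a_{t-1}^2 + beta1 * h_{t-1}
simulate_garch11 <- function(n, alpha0, alpha1, beta1) {
  a <- numeric(n)
  h <- numeric(n)
  h[1] <- alpha0 / (1 - alpha1 - beta1)   # start at the unconditional variance (needs alpha1 + beta1 < 1)
  a[1] <- rnorm(1, sd = sqrt(h[1]))
  for (t in 2:n) {
    h[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * h[t - 1]
    a[t] <- rnorm(1, sd = sqrt(h[t]))
  }
  list(a = a, h = h)
}

sim <- simulate_garch11(n = 2000, alpha0 = 1e-5, alpha1 = 0.1, beta1 = 0.85)
acf(sim$a^2)   # autocorrelation of the squared innovations shows the persistence in volatility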
The case r = 0 is the original ARCH (Autoregressive Conditional
Heteroscedasticity) model, and the most common model takes r = 0, s = 1, so that
h_t = α_0 + α_1 a_{t-1}^2. For ARCH and GARCH models the parameters must be
estimated using both the model for the conditional mean and the model for the
conditional variance, and diagnostics apply to both models. The advantages of
these models are that they provide for some dependence among the observations
through the volatility rather than through the mean, and that they tend to have
heavier tails. As a result, they provide larger estimated prices for deep
out-of-the-money options, for example, which depend heavily on an accurate model
for volatility.

7.13.4 ARCH(1)
The basic model investigated by Engle (1982) was the simplest case, in which the
process has zero conditional mean (it is reasonable to expect that arbitrageurs
in the market have removed a large part of any predictability in the mean) but
the squares are significantly autocorrelated. Most financial data exhibit this
property to some degree. Engle's ARCH(1) model is:

x_t ~ N(0, h_t)   and   h_t = α_0 + α_1 x_{t-1}^2.        (7.109)

An ARCH regression model allows the conditional mean of x_t in (7.109) to
depend on some observed predictors. The GARCH-in-mean process fit by
French et al. (1987) allows the mean of x_t to be a function of its variance, so
that x_t ~ N(a + b h_t, h_t). This would allow testing hypotheses about relative
risk aversion, for example. However, there is little evidence that b is non-zero,
and even less evidence to determine whether the linear relation should be
between the mean and the standard deviation (p = 1) or between the mean and the
variance (p = 2).

7.13.5 Estimating Parameters
The conditional log-likelihood to be maximized with respect to the parameters
α_i, β_j is

ln(L) = -(1/2) Σ_t [ ln h_t + a_t^2 / h_t ].
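As a sketch, the GARCH(1,1) version of this objective can be coded directly in R (the function name is ours, and initializing the variance recursion at the sample variance is a common but arbitrary choice) and maximized numerically with optim:

# Conditional log-likelihood (up to an additive constant) for GARCH(1,1)
# theta = (alpha0, alpha1, beta1); a is the vector of innovations
garch11_loglik <- function(theta, a) {
  alpha0 <- theta[1]; alpha1 <- theta[2]; beta1 <- theta[3]
  n <- length(a)
  h <- numeric(n)
  h[1] <- var(a)                       # crude initialization of the variance recursion
  for (t in 2:n) h[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * h[t - 1]
  -0.5 * sum(log(h) + a^2 / h)
}

# maximize numerically, e.g.
# optim(c(1e-5, 0.1, 0.8), function(th) -garch11_loglik(th, a))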
Various modifications of the above GARCH model are possible and have been
tried, but the spirit of the models, as well as most of the methodology, remains
basically the same. There is also a system of Yule-Walker-type equations that can
be solved for the coefficients β_i in a GARCH model. If γ_i is the autocovariance
function of the squared innovations process a_t^2, then

γ_n = Σ_{i=1}^{s} α_i γ_{n-i} + Σ_{i=1}^{r} β_i γ_{n-i}

for n >= r + 1. These provide the usual partial autocorrelation function for
identification of a suitable order r for the autoregressive part.
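In practice this suggests examining the sample partial autocorrelation function of the squared innovations; in R, with a holding the innovations, this is simply

pacf(a^2, lag.max = 20)   # large early lags point to a low-order autoregressive structure in a_t^2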

7.13.6 Akaike's Information Criterion
A model which leads to small estimated variances for the innovations is obviously
preferable to one with highly variable innovations, everything else being equal.
In other words, when we select a model we are inclined to minimize the estimated
residual variance (1/N) Σ_i a_i^2 (or equivalently its logarithm) over both the
parameters themselves and k, the number of autoregressive plus moving-average
parameters in the model. Unfortunately, each additional parameter results in what
may be only a marginal improvement in the residual variance, so minimizing
(1/N) Σ_i a_i^2 would encourage the addition of parameters which do not improve
the ability of the model to forecast or fit new observations. A better criterion,
Akaike's Information Criterion, penalizes the model for each additional parameter:

AIC = log[ (1/(N - k)) Σ_i a_i^2 ] + 2k/N.                (7.110)

The AIC criterion chooses the model and the number of parameters k that minimize
this quantity. In some software, such as R and Splus, the AIC differs from
(7.110) approximately by a multiple of N; for example, AIC2 = -2 log(L) + 2k is
approximately N times the value in (7.110). The advantage of multiplying by N is
that differences then operate on a more natural scale: when nested models are
compared (i.e. one model is a special case of the other), differences between
values of the statistic -2 log(L) have, under the null hypothesis that the
simpler model holds, a chi-squared distribution with degrees of freedom equal to
the difference in the number of parameters in the two models.
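As an illustration, competing ARMA orders can be compared in R with the built-in arima and AIC functions, which work on the -2 log(L) + 2k scale just described (x here stands for a hypothetical return series):

# Compare ARMA(p, q) fits for a hypothetical return series x by AIC
candidates <- expand.grid(p = 0:2, q = 0:2)
aics <- apply(candidates, 1, function(ord)
  AIC(arima(x, order = c(ord["p"], 0, ord["q"]))))
candidates[which.min(aics), ]   # the (p, q) pair with the smallest AIC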

7.13.7 Estimation and Testing for ARCH Effects
The function ugarch in Matlab estimates the parameters of a GARCH model. In
particular, if a is the vector of innovations from a time series to which we wish
to fit a GARCH model, the command [Alpha0, Alpha, Beta] = ugarch(a, p, q)
fits the GARCH(p, q) model

h_t = α_0 + α_1 h_{t-1} + ... + α_p h_{t-p} + β_1 a_{t-1}^2 + ... + β_q a_{t-q}^2

(note that this convention attaches the α coefficients to the lagged variances and
the β coefficients to the lagged squared innovations, the reverse of the notation
used above). For example, fitting the GARCH(1,1) model to the mean-adjusted daily
returns on the S&P 500 index over the period 1997-2002 gives the estimated model

h_t = 0.000006 + 0.8671 h_{t-1} + 0.0974 a_{t-1}^2.

The large coefficient α_1 = 0.8671 on h_{t-1} indicates a strong tendency for the
variance to remain near its previous value. In R there is a similar function,
garch, in the package tseries (see, for example,
http://pbil.univ-lyon1.fr/library/tseries/html/00Index.html), run with a command
like

garch(a, order = c(p, q), coef = NULL, itmax = 200, eps = NULL, grad = c("numerical"), series = NULL, trace = TRUE)

where the NULL parameters indicate that the default values are used.
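A minimal call, with most arguments left at their defaults (the vector a of innovations is assumed to be available), might look like:

library(tseries)                                 # provides garch()
fit <- garch(a, order = c(1, 1), trace = FALSE)  # GARCH(1,1) fitted to the innovations a
summary(fit)                                     # estimates a0, a1, b1 with standard errors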
Most of the tests for the adequacy of a given time series model are inherited
from regression, although in some cases the autocorrelation of the series induces
a different limiting distribution. For example, if there is an ARCH or GARCH
effect, then there should be a significant regression of a_t^2 on its predecessors
a_{t-1}^2, a_{t-2}^2, a_{t-3}^2, .... Suppose we are able to obtain residuals
â_l, â_{l+1}, ..., â_N from an ARMA model fitted to the original series. We might
test for an ARCH effect by regressing the vector (â_{l+s}^2, ..., â_N^2) on a
constant as well as the s "predictors"

(â_{l+s-1}^2, ..., â_{N-1}^2), (â_{l+s-2}^2, ..., â_{N-2}^2), ..., (â_l^2, ..., â_{N-s}^2),

and obtaining the usual coefficient of determination or squared multiple
correlation coefficient R^2. Standardized, (N - l) R^2 has an approximate
chi-squared distribution with s degrees of freedom under the null hypothesis of
homoscedasticity, so values above the 95th chi-squared percentile would lead to
rejecting the homoscedasticity null hypothesis and concluding that ARCH-like
effects are present.
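A sketch of this test in R (the helper name arch_lm_test is ours; ahat denotes the vector of ARMA residuals and s the number of lagged squared residuals used as predictors):

# Regression-based (LM-type) test for ARCH effects: regress squared residuals
# on s of their own lags and use the R^2 of that regression.
arch_lm_test <- function(ahat, s = 5) {
  a2 <- ahat^2
  n  <- length(a2)
  y  <- a2[(s + 1):n]
  X  <- sapply(1:s, function(j) a2[(s + 1 - j):(n - j)])  # lag-j squared residuals
  r2 <- summary(lm(y ~ X))$r.squared
  stat <- length(y) * r2      # essentially (N - l) R^2; approx. chi-squared with s df
  c(statistic = stat, p.value = 1 - pchisq(stat, df = s))
}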
One can also fit a GARCH model and compare the values of the coefficient
estimators with their standard errors to see whether the model can be further
simplified. Finally, it is easy to simulate an ARCH or a GARCH model (see, for
example, the function [a, h] = ugarchsim(Alpha0, Alpha, Beta, NumSamples) in
Matlab). Any test statistic which is sensitive to persistence in the volatility
can therefore be adapted to test for a GARCH model by using simulations, with the
parameter values fixed at their estimated values, to determine the distribution
of this test statistic.

7.13.8 Example: Canadian dollar/US dollar exchange rate
As an example, we downloaded the US/Canadian dollar exchange rate close
for the 10-year period from October 7, 1994 to October 8, 2004 from the Bank
of Canada website http://www.bankofcanada.ca. There are approximately 2514
daily observations of the value of the US dollar priced in Canadian dollars.
Suppose we first fit an autoregressive moving average model to the returns data
using the system identification toolbox in Matlab; in general, the command
armax(data, [p, q]) fits an autoregressive moving average model with
autoregressive order p and moving average order q. Fitting an AR(2) model to the
returns from this series results in the model

x_t + 0.03657 x_{t-1} - 0.02497 x_{t-2} = a_t

with innovations process a_t. We then fit a GARCH(1,1) model to the innovations
a_t, with the following estimated model for the variance h_t of a_t:

h_t = 0.9524 h_{t-1} + 0.0474 a_{t-1}^2.

Once again, the large coefficient 0.9524 on h_{t-1} indicates a high degree of
persistence in the volatility.
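A sketch of the corresponding workflow in R (fit a low-order autoregression for the mean, then a GARCH(1,1) to its residuals), with x assumed to hold the daily returns:

library(tseries)
mean_fit <- arima(x, order = c(2, 0, 0))            # AR(2) model for the returns
a <- residuals(mean_fit)                            # innovations from the mean model
vol_fit <- garch(a, order = c(1, 1), trace = FALSE) # GARCH(1,1) model for the variance
coef(vol_fit)                                       # a0, a1 (ARCH) and b1 (GARCH) estimates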
Diebold and Nerlove (1989) confirm the ARCH effect in exchange rates for a number
of different currencies, observing ARCH effects at lags of 12 weeks or more.

7.13.9 Conclusions
Research and modeling are dynamic tasks in any discipline, but in none more than
finance. In physics, theories change over time, but at least the target is often
a physical law which is, at least in terms of our meagre lifespans, relatively
constant. Not so in the modeling of a financial time series. First-order
autocorrelation in the Dow Jones average was once quite strong, but with the
increased liquidity and statistical literacy of arbitrageurs it has largely
disappeared. Tools which permit "trading" volatility or interest rates may alter
other features of the market as well. The standard approach to derivative pricing
which we take here is to assume a model for an asset price, in which case the
derivative, a function of the asset price, has a price functionally related to
it. Which is the asset and which is the derivative is a matter of semantics
(since the derivative may be more heavily traded than the underlying); highly
dependent and liquid assets will result in a near functional relationship between
the corresponding asset prices, and will tend to "tie down" each to a functional
relationship with the other. Each new liquid financial instrument or asset in a
market can substantially affect the model for related assets. Models and their
parameters are not only subject to constantly changing economic conditions; they
are affected by every related product that enters and leaves the market, by the
information and technology base of traders, and by political events and moods.
Today's financial model is almost certainly inadequate tomorrow, and the
model parameters are evidently in constant flux. As the complexity of models
changes, and as the explosion of new types of instruments in the market
continues to constrain current asset prices in new ways, the need for new
statistical and computational tools, often dependent on computer simulation, can
only continue to grow. Evolution, from which we have ourselves developed, is a
remarkably efficient stochastic optimization (in this case in a high-dimensional
space). The diversity of models, tools, and approaches to financial analysis that
can be accommodated by simulation ensures that our ability to reflect the real
processes will continue to improve. I am conscious that only a fraction of these
tools and models are discussed here, a tribute to the wealth of research in this
important area.