Figure 1.9 (1,40) combination moving average Excel spreadsheet (in-sample)


where Yt is the dependent variable at time t; Yt−1, Yt−2, . . . , Yt−p are the lagged
dependent variables; φ0, φ1, . . . , φp are regression coefficients; µt is the residual term;
µt−1, µt−2, . . . , µt−q are previous values of the residual; w1, w2, . . . , wq are weights.
Several ARMA specifications were tried out; for example, ARMA(5,5) and
ARMA(10,10) models were produced to test for any "weekly" effects, and these can be
reviewed in the arma.wf1 EViews workfile. The ARMA(10,10) model was estimated but
was unsatisfactory, as several coefficients were not significant even at the 90% confidence
level (equation arma1010). The results are presented in Table 1.5. The model
was modified primarily by testing the significance of variables via the likelihood
ratio (LR) test for redundant or omitted variables and Ramsey's RESET test for model
misspecification.
Once the non-significant terms are removed, all of the coefficients of the restricted
ARMA(10,10) model become significant at the 99% confidence level (equation
arma13610). The overall significance of the model is tested using the F-test. The null
hypothesis that all coefficients except the constant are not significantly different from zero
is rejected at the 99% confidence level. The results are presented in Table 1.6.
Examination of the autocorrelation function of the error terms reveals that the residuals
are random at the 99% confidence level, and further confirmation is given by the serial
correlation LM test. The results are presented in Tables 1.7 and 1.8. The model
is also tested for general misspecification via Ramsey's RESET test. The null hypothesis
of correct specification is accepted at the 99% confidence level. The results are
presented in Table 1.9.
Applications of Advanced Regression Analysis 15
Table 1.5 ARMA(10,10) EUR/USD returns estimation

Dependent Variable: DR_USEURSP
Method: Least Squares
Sample(adjusted): 12 1459
Included observations: 1448 after adjusting endpoints
Convergence achieved after 20 iterations
White Heteroskedasticity-Consistent Standard Errors & Covariance
Backcast: 2 11

Variable      Coefficient    Std. error    t-Statistic    Prob.
C             −0.000220      0.000140      −1.565764      0.1176
AR(1)         −0.042510      0.049798      −0.853645      0.3934
AR(2)         −0.210934      0.095356      −2.212073      0.0271
AR(3)         −0.359378      0.061740      −5.820806      0.0000
AR(4)         −0.041003      0.079423      −0.516264      0.6058
AR(5)          0.001376      0.067652       0.020338      0.9838
AR(6)          0.132413      0.054071       2.448866      0.0145
AR(7)         −0.238913      0.052594      −4.542616      0.0000
AR(8)          0.182816      0.046878       3.899801      0.0001
AR(9)          0.026431      0.060321       0.438169      0.6613
AR(10)        −0.615601      0.076171      −8.081867      0.0000
MA(1)          0.037787      0.040142       0.941343      0.3467
MA(2)          0.227952      0.095346       2.390785      0.0169
MA(3)          0.341293      0.058345       5.849551      0.0000
MA(4)          0.036997      0.074796       0.494633      0.6209
MA(5)         −0.004544      0.059140      −0.076834      0.9388
MA(6)         −0.140714      0.046739      −3.010598      0.0027
MA(7)          0.253016      0.042340       5.975838      0.0000
MA(8)         −0.206445      0.040077      −5.151153      0.0000
MA(9)         −0.014011      0.048037      −0.291661      0.7706
MA(10)         0.643684      0.074271       8.666665      0.0000

R-squared               0.016351     Mean dependent var.      −0.000225
Adjusted R-squared      0.002565     S.D. dependent var.       0.005363
S.E. of regression      0.005356     Akaike info. criterion   −7.606665
Sum squared resid.      0.040942     Schwarz criterion        −7.530121
Log likelihood       5528.226        F-statistic               1.186064
Durbin–Watson stat.     1.974747     Prob(F-statistic)         0.256910

Inverted AR roots:   0.84 ± 0.31i   0.55 ± 0.82i   0.07 ± 0.98i   −0.59 ± 0.78i   −0.90 ± 0.21i
Inverted MA roots:   0.85 ± 0.31i   0.55 ± 0.82i   0.07 ± 0.99i   −0.59 ± 0.79i   −0.90 ± 0.20i
16 Applied Quantitative Methods for Trading and Investment
Table 1.6 Restricted ARMA(10,10) EUR/USD returns estimation

Dependent Variable: DR_USEURSP
Method: Least Squares
Sample(adjusted): 12 1459
Included observations: 1448 after adjusting endpoints
Convergence achieved after 50 iterations
White Heteroskedasticity-Consistent Standard Errors & Covariance
Backcast: 2 11

Variable      Coefficient    Std. error    t-Statistic    Prob.
C             −0.000221      0.000144      −1.531755      0.1258
AR(1)          0.263934      0.049312       5.352331      0.0000
AR(3)         −0.444082      0.040711     −10.90827       0.0000
AR(6)         −0.334221      0.035517      −9.410267      0.0000
AR(10)        −0.636137      0.043255     −14.70664       0.0000
MA(1)         −0.247033      0.046078      −5.361213      0.0000
MA(3)          0.428264      0.030768      13.91921       0.0000
MA(6)          0.353457      0.028224      12.52307       0.0000
MA(10)         0.675965      0.041063      16.46159       0.0000

R-squared               0.015268     Mean dependent var.      −0.000225
Adjusted R-squared      0.009793     S.D. dependent var.       0.005363
S.E. of regression      0.005337     Akaike info. criterion   −7.622139
Sum squared resid.      0.040987     Schwarz criterion        −7.589334
Log likelihood       5527.429        F-statistic               2.788872
Durbin–Watson stat.     2.019754     Prob(F-statistic)         0.004583

Inverted AR roots:   0.89 ± 0.37i   0.61 ± 0.78i   0.08 ± 0.98i   −0.53 ± 0.70i   −0.92 ± 0.31i
Inverted MA roots:   0.90 ± 0.37i   0.61 ± 0.78i   0.07 ± 0.99i   −0.54 ± 0.70i   −0.93 ± 0.31i


The selected ARMA model, namely the restricted ARMA(10,10) model, takes the form:

Yt = −0.0002 + 0.2639Yt−1 − 0.4440Yt−3 − 0.3342Yt−6 − 0.6361Yt−10
         − 0.2470µt−1 + 0.4283µt−3 + 0.3535µt−6 + 0.6760µt−10

The restricted ARMA(10,10) model was retained for out-of-sample estimation. The per-
formance of the strategy is evaluated in terms of traditional forecasting accuracy and
in terms of trading performance. Several other models were produced and their per-
formance evaluated; for example, an alternative restricted ARMA(10,10) model was pro-
duced (equation arma16710). The original restricted ARMA(10,10) model was retained
because it has significantly better in-sample trading results than the alternative
ARMA(10,10) model. The annualised return, Sharpe ratio and correct directional change
of the original model were 12.65%, 1.49 and 53.80%, respectively. The corresponding
Table 1.7 Restricted ARMA(10,10) correlogram of residuals

Sample: 12 1459
Included observations: 1448
Q-statistic probabilities adjusted for 8 ARMA term(s)

Lag   Autocorrelation   Partial correlation   Q-Stat.   Prob.
 1        −0.010              −0.010           0.1509
 2        −0.004              −0.004           0.1777
 3         0.004               0.004           0.1973
 4        −0.001              −0.001           0.1990
 5         0.000               0.000           0.1991
 6        −0.019              −0.019           0.7099
 7        −0.004              −0.004           0.7284
 8        −0.015              −0.015           1.0573
 9         0.000               0.000           1.0573   0.304
10         0.009               0.009           1.1824   0.554
11         0.031               0.032           2.6122   0.455
12        −0.024              −0.024           3.4600   0.484
13         0.019               0.018           3.9761   0.553
14        −0.028              −0.028           5.0897   0.532
15         0.008               0.008           5.1808   0.638


values for the alternative model were 9.47%, 1.11 and 52.35%. The evaluation can be
reviewed in Sheet 2 of the is_arma13610.xls and is_arma16710.xls Excel spreadsheets,
and is also presented in Figures 1.10 and 1.11, respectively. Ultimately, we chose the
model that satisfied the usual statistical tests and that also recorded the best in-sample
trading performance.
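The trading metrics quoted above (annualised return, Sharpe ratio, correct directional change) can be sketched as follows, assuming a simple long/short rule, 252 trading days a year and no transaction costs; the data below are simulated stand-ins, not the chapter's series:

```python
# Sketch of the in-sample trading evaluation: go long when the forecast
# return is positive, short when it is negative.
import numpy as np

def trading_metrics(forecast, actual, days_per_year=252):
    forecast, actual = np.asarray(forecast), np.asarray(actual)
    strategy = np.sign(forecast) * actual            # daily strategy return
    annualised_return = days_per_year * strategy.mean()
    sharpe = np.sqrt(days_per_year) * strategy.mean() / strategy.std()
    directional = np.mean(np.sign(forecast) == np.sign(actual))
    return annualised_return, sharpe, directional

rng = np.random.default_rng(1)
actual = rng.normal(0.0, 0.005, 1000)                # stand-in daily returns
forecast = actual + rng.normal(0.0, 0.01, 1000)      # noisy toy forecast
print(trading_metrics(forecast, actual))
```

With a perfect forecast the directional accuracy is 100% and the annualised return is the annualised mean absolute daily return; a real forecast sits well below that.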

1.4.4 Logit estimation
The logit model belongs to a group of models termed "classification models". It is
a multivariate statistical technique used to estimate the probability of an upward or
downward movement in a variable. As a result, logit models are well suited to rates-of-return
applications where a trading recommendation is required. For a full discussion of the
procedure refer to Maddala (2001), Pesaran and Pesaran (1997), or Thomas (1997).
The approach assumes the following regression model:

Yt* = β0 + β1X1,t + β2X2,t + · · · + βpXp,t + µt                (1.6)

where Yt* is the dependent variable at time t; X1,t, X2,t, . . . , Xp,t are the explanatory
variables at time t; β0, β1, . . . , βp are the regression coefficients; µt is the residual term.
However, Yt* is not directly observed; what is observed is a dummy variable Yt
defined by:

Yt = 1 if Yt* > 0, and Yt = 0 otherwise                (1.7)
Therefore, the model requires a transformation of the explained variable, namely the
EUR/USD returns series, into a binary series. The procedure is quite simple: a binary
Table 1.8 Restricted ARMA(10,10) serial correlation LM test

Breusch–Godfrey Serial Correlation LM Test

F-statistic       0.582234     Probability    0.558781
Obs*R-squared     1.172430     Probability    0.556429

Dependent Variable: RESID
Method: Least Squares
Presample missing value lagged residuals set to zero

Variable       Coefficient    Std. error    t-Statistic    Prob.
C               8.33E-07      0.000144       0.005776      0.9954
AR(1)           0.000600      0.040612       0.014773      0.9882
AR(3)           0.019545      0.035886       0.544639      0.5861
AR(6)           0.018085      0.031876       0.567366      0.5706
AR(10)         −0.028997      0.037436      −0.774561      0.4387
MA(1)          −0.000884      0.038411      −0.023012      0.9816
MA(3)          −0.015096      0.026538      −0.568839      0.5696
MA(6)          −0.014584      0.026053      −0.559792      0.5757
MA(10)          0.029482      0.035369       0.833563      0.4047
RESID(−1)      −0.010425      0.031188      −0.334276      0.7382
RESID(−2)      −0.004640      0.026803      −0.173111      0.8626

R-squared               0.000810     Mean dependent var.       1.42E-07
Adjusted R-squared     −0.006144     S.D. dependent var.       0.005322
S.E. of regression      0.005338     Akaike info. criterion   −7.620186
Sum squared resid.      0.040953     Schwarz criterion        −7.580092
Log likelihood       5528.015        F-statistic               0.116447
Durbin–Watson stat.     1.998650     Prob(F-statistic)         0.999652



Table 1.9 Restricted ARMA(10,10) RESET test for model misspecification

Ramsey RESET Test

F-statistic              0.785468     Probability    0.375622
Log likelihood ratio     0.790715     Probability    0.373884



variable equal to one is produced if the return is positive, and zero otherwise. The same
transformation, although not strictly necessary, was applied to the explanatory variables
for homogeneity reasons.
A basic regression technique is used to produce the logit model. The idea is to start with
a model containing several variables, including lagged dependent terms, and then to
modify the model through a series of tests.
The selected logit model, which we shall name logit1 (equation logit1 of the logit.wf1
EViews workfile), takes the form:




Figure 1.10 Restricted ARMA(10,10) model Excel spreadsheet (in-sample)

Yt* = 0.2492 − 0.3613X1,t − 0.2872X2,t + 0.2862X3,t + 0.2525X4,t
          − 0.3692X5,t − 0.3937X6,t + µt

where X1,t, . . . , X6,t are the JP yc(−2), UK yc(−9), JAPDOWA(−1), ITMIB30(−19),
JAPAYE$(−10), and OILBREN(−1) binary explanatory variables, respectively.9
All of the coefficients in the model are significant at the 98% confidence level. The
overall significance of the model is tested using the LR test. The null hypothesis that all
coefficients except the constant are not significantly different from zero is rejected at the
99% confidence level. The results are presented in Table 1.10.
To justify the use of Japanese variables, which seems difficult from an economic per-
spective, the joint overall significance of this subset of variables is tested using the LR test
for redundant variables. The null hypothesis that these coefficients, except the constant,
are not jointly significantly different from zero is rejected at the 99% confidence level.
The results are presented in Table 1.11. In addition, a model that did not include
the Japanese variables, but was otherwise identical to logit1, was produced and its trad-
ing performance evaluated; we shall name it nojap (equation nojap of the logit.wf1
EViews workfile). The Sharpe ratio, average gain/loss ratio and correct directional change
of the nojap model were 1.34, 1.01 and 54.38%, respectively. The corresponding values
for the logit1 model were 2.26, 1.01 and 58.13%. The evaluation can be reviewed in
Sheet 2 of the is_logit1.xls and is_nojap.xls Excel spreadsheets, and is also presented in
Figures 1.12 and 1.13, respectively.

9
Datastream mnemonics as mentioned in Table 1.1; yield curves and lags in brackets are used to save space.




Figure 1.11 Alternative restricted ARMA(10,10) model Excel spreadsheet (in-sample)

The logit1 model was retained for out-of-sample estimation. Since, in practice, the estima-
tion of the model is based upon the cumulative distribution of the logistic function for the
error term, the forecasts produced range between zero and one, requiring transformation
into a binary series. Again, the procedure is quite simple: a binary variable equal to one
is produced if the forecast is greater than 0.5, and zero otherwise.
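The 0.5 cut-off can be sketched as:

```python
# Transform logit probability forecasts into binary trading signals:
# 1 signals a predicted up-move, 0 a predicted down-move.
import numpy as np

probabilities = np.array([0.12, 0.48, 0.50, 0.73, 0.91])  # toy forecasts
signals = (probabilities > 0.5).astype(int)
print(signals)   # 0.50 itself falls in the "otherwise" branch
```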
The performance of the strategy is evaluated in terms of forecast accuracy via the
correct directional change measure and in terms of trading performance. Several other
adequate models were produced and their performance evaluated. None performed better
in-sample; therefore, the logit1 model was retained.

1.5 NEURAL NETWORK MODELS: THEORY
AND METHODOLOGY
Neural networks are “data-driven self-adaptive methods in that there are few a priori
assumptions about the models under study” (Zhang et al., 1998: 35). As a result, they are
well suited to problems where economic theory is of little use. In addition, neural networks
are universal approximators capable of approximating any continuous function (Hornik
et al., 1989).
Many researchers are confronted with problems where important nonlinearities
exist between the independent variables and the dependent variable. Often, in
such circumstances, traditional forecasting methods lack explanatory power. Recently,
nonlinear models have attempted to cover this shortfall. In particular, NNR models
have been applied with increasing success to financial markets, which often contain
nonlinearities (Dunis and Jalilov, 2002).
Table 1.10 Logit1 EUR/USD returns estimation

Dependent Variable: BDR_USEURSP
Method: ML - Binary Logit
Sample(adjusted): 20 1459
Included observations: 1440 after adjusting endpoints
Convergence achieved after 3 iterations
Covariance matrix computed using second derivatives

Variable                Coefficient    Std. error    z-Statistic    Prob.
C                        0.249231      0.140579       1.772894      0.0762
BDR_JP_YC(−2)           −0.361289      0.108911      −3.317273      0.0009
BDR_UK_YC(−9)           −0.287220      0.108397      −2.649696      0.0081
BDR_JAPDOWA(−1)          0.286214      0.108687       2.633369      0.0085
BDR_ITMIB31(−19)         0.252454      0.108056       2.336325      0.0195
BDR_JAPAYE$(−10)        −0.369227      0.108341      −3.408025      0.0007
BDR_OILBREN(−1)         −0.393689      0.108476      −3.629261      0.0003

Mean dependent var.       0.457639     S.D. dependent var.        0.498375
S.E. of regression        0.490514     Akaike info. criterion     1.353305
Sum squared resid.      344.7857       Schwarz criterion          1.378935
Log likelihood         −967.3795       Hannan–Quinn criterion     1.362872
Restr. log likelihood  −992.9577       Avg. log likelihood       −0.671791
LR statistic (6 df)      51.15635      McFadden R-squared         0.025760
Prob(LR statistic)        2.76E-09

Obs. with dep = 0       781            Total obs.              1440
Obs. with dep = 1       659


Theoretically, the advantage of NNR models over traditional forecasting methods is
that, as is often the case, the model best adapted to a particular problem cannot be
identified. It is then better to resort to a method that is a generalisation of many models
than to rely on an a priori model (Dunis and Huang, 2002).
However, NNR models have been criticised, and their widespread adoption has been hin-
dered, because of their "black-box" nature, excessive training times, the danger of overfitting,
and the large number of "parameters" required for training. As a result, deciding on the
appropriate network involves much trial and error.
For a full discussion on neural networks, please refer to Haykin (1999), Kaastra and
Boyd (1996), Kingdon (1997), or Zhang et al. (1998). Notwithstanding, we provide below
a brief description of NNR models and procedures.


1.5.1 Neural network models

The desire to understand the functioning of the brain is the basis for the study of neural
networks. Mathematical modelling started in the 1940s with the work of McCulloch and
Pitts, whose research was based on the study of networks composed of a number of simple
interconnected processing elements called neurons or nodes. If the description is correct,
Table 1.11 Logit1 estimation redundant variables LR test

Redundant Variables: BDR_JP_YC(−2), BDR_JAPDOWA(−1), BDR_JAPAYE$(−10)

F-statistic              9.722023     Probability    0.000002
Log likelihood ratio    28.52168      Probability    0.000003

Test Equation:
Dependent Variable: BDR_USEURSP
Method: ML - Binary Logit
Sample: 20 1459
Included observations: 1440
Convergence achieved after 3 iterations
Covariance matrix computed using second derivatives

Variable                Coefficient    Std. error    z-Statistic    Prob.
C                       −0.013577      0.105280      −0.128959      0.8974
BDR_UK_YC(−9)           −0.247254      0.106979      −2.311245      0.0208
BDR_ITMIB31(−19)         0.254096      0.106725       2.380861      0.0173
BDR_OILBREN(−1)         −0.345654      0.106781      −3.237047      0.0012

Mean dependent var.       0.457639     S.D. dependent var.        0.498375
S.E. of regression        0.494963     Akaike info. criterion     1.368945
Sum squared resid.      351.8032       Schwarz criterion          1.383590
Log likelihood         −981.6403       Hannan–Quinn criterion     1.374412
Restr. log likelihood  −992.9577       Avg. log likelihood       −0.681695
LR statistic (3 df)      22.63467      McFadden R-squared         0.011398
Prob(LR statistic)        4.81E-05

Obs. with dep = 0       781            Total obs.              1440
Obs. with dep = 1       659


they can be turned into models mimicking some of the brain's functions, possibly with
the ability to learn from examples and then to generalise on unseen examples.
A neural network is typically organised into several layers of elementary processing
units or nodes. The first layer is the input layer, its number of nodes corresponding
to the number of variables, and the last layer is the output layer, its number of nodes
corresponding to the forecasting horizon for a forecasting problem.10 The input and output
layers can be separated by one or more hidden layers, each containing one or
more hidden nodes.11 The nodes in adjacent layers are fully connected. Each neuron
receives information from the preceding layer and transmits to the following layer only.12
The neuron performs a weighted summation of its inputs; if the sum passes a threshold
the neuron transmits, otherwise it remains inactive. In addition, a bias neuron may be
connected to each neuron in the hidden and output layers. The bias has a value of positive

10
Linear regression models may be viewed analogously to neural networks with no hidden layers (Kaastra and
Boyd, 1996).
11
Networks with hidden layers are multilayer networks; a multilayer perceptron network is used for this chapter.
12
If the flow of information through the network is from the input to the output, it is known as "feedforward".




Figure 1.12 Logit1 estimation Excel spreadsheet (in-sample)




Figure 1.13 Nojap estimation Excel spreadsheet (in-sample)


Figure 1.14 A single output fully connected NNR model, where xt[i] (i = 1, 2, . . . , 5) are
the NNR model inputs at time t, ht[j] (j = 1, 2) are the hidden node outputs, and yt and
ỹt are the actual value and the NNR model output, respectively

one and is analogous to the intercept in traditional regression models. An example of
a fully connected NNR model with one hidden layer and two nodes is presented in
Figure 1.14.
The vector A = (xt[1], xt[2], . . . , xt[n]) represents the input to the NNR model, where xt[i] is
the level of activity of the ith input. Associated with the input vector is a series of weight
vectors Wj = (w1j, w2j, . . . , wnj), so that wij represents the strength of the connection
between the input xt[i] and the processing unit bj. There may also be an input bias φj,
modulated by the weight w0j, associated with the inputs. The total input of the node bj
is the dot product between vectors A and Wj, less the weighted bias. It is then passed
through a nonlinear activation function to produce the output value of processing unit bj:

bj = f( Σ(i=1..n) xt[i] wij − w0j φj ) = f(Xj)                (1.8)

Typically, the activation function takes the form of the logistic function, which introduces
a degree of nonlinearity to the model and prevents outputs from reaching very large
values that can "paralyse" NNR models and inhibit training (Kaastra and Boyd, 1996;
Zhang et al., 1998). Here we use the logistic function:

f(Xj) = 1 / (1 + e^(−Xj))                (1.9)
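Equations (1.8) and (1.9) for a network like that of Figure 1.14 can be sketched in numpy; all weight values here are random placeholders, not trained parameters:

```python
# Each hidden node takes a weighted sum of the inputs less the weighted
# bias, then applies the logistic activation; the output node does the same.
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))                  # equation (1.9)

def forward(x, W, w0, v, v0):
    h = logistic(W @ x - w0)                         # hidden outputs, eq. (1.8)
    return logistic(v @ h - v0)                      # single network output

rng = np.random.default_rng(3)
x = rng.normal(size=5)                               # five inputs (Figure 1.14)
W = rng.normal(size=(2, 5))                          # two hidden nodes
out = forward(x, W, rng.normal(size=2), rng.normal(size=2), 0.1)
print(float(out))
```

Because the logistic function is bounded, the output always lies strictly between 0 and 1, which is why inputs are later rescaled into that range.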
The modelling process begins by assigning random values to the weights. The output
value of the processing unit is passed on to the output layer. If the output is optimal,
the process is halted; if not, the weights are adjusted and the process continues until an
optimal solution is found. The output error, namely the difference between the actual
value and the NNR model output, is the optimisation criterion. Commonly, the criterion

is the root-mean-squared error (RMSE). The RMSE is systematically minimised through
the adjustment of the weights. Basically, training is the process of determining the optimal
network weights, as they represent the knowledge learned by the network. Since
inadequacies in the output are fed back through the network to adjust the network weights,
the NNR model is trained by backpropagation13 (Shapiro, 2000).
A common practice is to divide the time series into three sets called the training, test and
validation (out-of-sample) sets, and to partition them as roughly 2/3, 1/6 and 1/6, respectively.
The test set is used to evaluate the generalisation ability of the network. The technique
consists of tracking the error on the training and test sets. Typically, the error on the
training set continually decreases; however, the test set error starts by decreasing and
then begins to increase. From this point the network has stopped learning the similarities
between the training and test sets, and has started to learn meaningless differences, namely
the noise within the training data. For good generalisation ability, training should stop
when the test set error reaches its lowest point. This stopping rule reduces the likelihood
of overfitting, i.e. of the network becoming overtrained (Dunis and Huang, 2002;
Mehta, 1995).
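A sketch of this stopping rule, assuming hypothetical `train_epoch` and `test_error` callables and an illustrative patience of 50 sweeps (the chapter does not specify one):

```python
# Early stopping: track the test-set error each sweep and keep the sweep
# where it was lowest; stop once it has not improved for a while.
def train_with_early_stopping(train_epoch, test_error, max_epochs=1500):
    best_error, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()                        # one pass over the training set
        err = test_error()                   # e.g. RMSE on the test set
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= 50:       # error has stopped improving
            break
    return best_epoch, best_error

# Toy error curve: falls, then rises, so training stops at its minimum.
errs = iter([0.9, 0.7, 0.6, 0.65, 0.66] + [0.7] * 100)
print(train_with_early_stopping(lambda: None, lambda: next(errs)))
```

The returned epoch marks the lowest test-set error, the point the text identifies as the right moment to stop.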
An evaluation of the performance of the trained network is made on new examples not
used in network selection, namely the validation set. Crucially, the validation set should
never be used to discriminate between networks, as any set that is used to choose the
best network is, by definition, a test set. In addition, good generalisation ability requires
that the training and test sets are representative of the population; inappropriate selection
will affect the network's generalisation ability and forecast performance (Kaastra and Boyd,
1996; Zhang et al., 1998).


1.5.2 Issues in neural network modelling

Despite the satisfactory features of NNR models, the process of building them should not
be taken lightly. There are many issues that can affect the network's performance and
should be considered carefully.
The issue of finding the most parsimonious model is always a problem for statistical
methods, and it is particularly important for NNR models because of the problem of overfitting.
Parsimonious models not only have recognition ability but also the more important
generalisation ability. Overfitting and generalisation are always going to be a problem
for real-world situations, and this is particularly true for financial applications where time
series may well be quasi-random, or at least contain noise.
One of the most commonly used heuristics to ensure good generalisation is the applica-
tion of some form of Occam's razor. The principle states that "unnecessary complex models
should not be preferred to simpler ones. However . . . more complex models always fit
the data better" (Kingdon, 1997: 49). The two objectives are, of course, contradictory.
The solution is to find a model with the smallest possible complexity that can
still describe the data set (Haykin, 1999; Kingdon, 1997).
A reasonable strategy in designing NNR models is to start with one layer containing a
few hidden nodes, and increase the complexity while monitoring the generalisation ability.
Determining the optimal number of layers and hidden nodes is a crucial factor

13
Backpropagation networks are the most common multilayer networks and the most used type in financial
time series forecasting (Kaastra and Boyd, 1996). We use them exclusively here.

for good network design, as the hidden nodes provide the ability to generalise. However,
in most situations there is no way to determine the best number of hidden nodes without
training several networks. Several rules of thumb have been proposed to aid the process;
however, none work well for all applications. Notwithstanding, simplicity must be the
aim (Mehta, 1995).
Since NNR models are pattern matchers, the representation of the data is critical for a
successful network design. The raw data for the input and output variables are rarely fed
into the network; they are generally scaled between the upper and lower bounds of the
activation function. For the logistic function the range is [0,1], avoiding the function's
saturation zones. In practice, as here, a normalisation to [0.2, 0.8] is often used with the
logistic function, as its limits are only reached for infinite input values (Zhang et al.,
1998).
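A minimal sketch of the [0.2, 0.8] normalisation:

```python
# Min-max scaling into [0.2, 0.8]: inside the logistic range [0, 1] but
# clear of its saturation zones, as described above.
import numpy as np

def scale(series, lo=0.2, hi=0.8):
    s = np.asarray(series, dtype=float)
    return lo + (hi - lo) * (s - s.min()) / (s.max() - s.min())

x = np.array([-0.02, 0.0, 0.01, 0.03])   # toy returns
print(scale(x))                          # minimum maps to 0.2, maximum to 0.8
```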
Crucial for backpropagation learning is the learning rate of the network, as it determines
the size of the weight changes. Smaller learning rates slow the learning process, while
larger rates cause the error function to change wildly without continuously improving.
To improve the process, a momentum parameter is used, which allows for larger learning
rates. The parameter determines how past weight changes affect current weight changes,
by making the next weight change in approximately the same direction as the previous
one14 (Kaastra and Boyd, 1996; Zhang et al., 1998).
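The update rule can be sketched as follows; note that the chapter itself used a learning rate of 0.1 with zero momentum (footnote 14), so the momentum value here is purely illustrative:

```python
# Backpropagation weight update with momentum: each change blends the
# gradient step with the previous change, so successive steps point in
# roughly the same direction.
import numpy as np

def update(weights, gradient, prev_change, lr=0.1, momentum=0.9):
    change = -lr * gradient + momentum * prev_change
    return weights + change, change

w, prev = np.zeros(3), np.zeros(3)
g = np.array([1.0, -1.0, 0.5])           # a fixed toy gradient
for _ in range(3):
    w, prev = update(w, g, prev)
print(w.round(3))
```

With a constant gradient, the step sizes grow geometrically (0.1, 0.19, 0.271, . . .), which is how momentum permits an effectively larger learning rate.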

1.5.3 Neural network modelling procedure
Conforming to standard heuristics, the training, test and validation sets were partitioned
as approximately 2/3, 1/6 and 1/6, respectively. The training set runs from 17 October 1994
to 8 April 1999 (1169 observations), the test set runs from 9 April 1999 to 18 May
2000 (290 observations), and the validation set runs from 19 May 2000 to 3 July 2001
(290 observations), reserved for out-of-sample forecasting and evaluation, identical to the
out-of-sample period for the benchmark models.
To start, traditional linear cross-correlation analysis helped establish the existence of
a relationship between EUR/USD returns and potential explanatory variables. Although
NNR models attempt to map nonlinearities, linear cross-correlation analysis can give
some indication of which variables to include in a model, or at least a starting point to
the analysis (Diekmann and Gutjahr, 1998; Dunis and Huang, 2002).
The analysis was performed for all potential explanatory variables. Lagged terms
that were most signi¬cant as determined via the cross-correlation analysis are presented
in Table 1.12.
The lagged terms SPCOMP(−1) and US yc(−1) could not be used because of time-zone
differences between London and the USA, as discussed at the beginning of Section 1.3.
As an initial substitute, SPCOMP(−2) and US yc(−2) were used. In addition, various
lagged terms of the EUR/USD returns were included as explanatory variables.
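A sketch of this lag screen, with simulated series standing in for the Datastream data:

```python
# Linear cross-correlation screen: correlate the target returns with each
# candidate series at lags 1..20 and keep the lag with the largest
# absolute correlation (compare Table 1.12).
import numpy as np
import pandas as pd

def best_lag(target, explanatory, max_lag=20):
    corrs = {lag: target.corr(explanatory.shift(lag))
             for lag in range(1, max_lag + 1)}
    return max(corrs, key=lambda lag: abs(corrs[lag]))

rng = np.random.default_rng(4)
x = pd.Series(rng.normal(size=800))
y = pd.Series(0.5 * x.shift(3) + rng.normal(0.0, 1.0, 800))  # dependence at lag 3

print("best lag:", best_lag(y, x))
```

On the simulated pair the scan recovers the built-in lag; on real data it only suggests candidates, since NNR models target nonlinearities the linear correlation cannot see.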
Variable selection was achieved via a forward stepwise NNR procedure: poten-
tial explanatory variables were progressively added to the network. If adding a new
variable improved the level of explained variance (EV) over the previous "best" network,
the pool of explanatory variables was updated.15 Since the aim of the model-building

14
The problem of convergence did not occur within this research; as a result, a learning rate of 0.1 and a
momentum of zero were used exclusively.
15
EV is an approximation of the coefficient of determination, R2, in traditional regression techniques.
Table 1.12 Most significant lag of each potential
explanatory variable (in returns)

Variable    Best lag
DAXINDX        10
DJES50I        10
FRCAC40        10
FTSE100         5
GOLDBLN        19
ITMIB           9
JAPAYE$        10
OILBREN         1
JAPDOWA        15
SPCOMP          1
USDOLLR        12
BD yc          19
EC yc           2
FR yc           9
IT yc           2
JP yc           6
UK yc          19
US yc           1
NYFECRB        20

procedure is to build a model with good generalisation ability, a model with a higher
EV level has a better such ability. In addition, a good measure of this ability is to compare
the EV levels of the test and validation sets: if the test set and validation set levels are
similar, the model has been built to generalise well.
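EV can be sketched as one minus the ratio of residual variance to target variance, an approximation of R-squared (footnote 15); the toy forecast below explains 36% of the variance by construction:

```python
# Explained variance (EV) as used to compare training, test and
# validation performance (compare Table 1.13).
import numpy as np

def explained_variance(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return 1.0 - np.var(actual - predicted) / np.var(actual)

rng = np.random.default_rng(5)
y = rng.normal(0.0, 0.005, 300)
print(explained_variance(y, 0.2 * y))   # residual is 0.8*y, so EV = 0.36
```

Computing this figure separately on the test and validation sets, and checking that the two are close, is the generalisation check described above.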
The decision to use explained variance was made because the EUR/USD returns series is
stationary, and stationarity remains important if NNR models are assessed on the
level of explained variance (Dunis and Huang, 2002). The EV levels for the training,
test and validation sets of the selected NNR model, which we shall name nnr1 (nnr1.prv
Previa file), are presented in Table 1.13.
An EV level equal to, or greater than, 80% was used as the NNR learning termination
criterion. In addition, if the NNR model did not reach this level within 1500 learning
sweeps, learning also terminates. The criteria selected are reasonable for daily data
and were used exclusively here.
If after several attempts there was failure to improve on the previous "best" model,
variables in the model were alternated in an attempt to find a better combination. This

Table 1.13 nnr1 model EV for the training,
test and validation sets

Training set   Test set   Validation set
3.4%           2.3%       2.2%

procedure recognises the likelihood that some variables may only be relevant predictors
when in combination with certain other variables.
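The forward stepwise loop can be sketched as follows, with a hypothetical `evaluate` function standing in for training a network on the chosen inputs and returning its test-set EV:

```python
# Greedy forward stepwise selection: add whichever candidate most improves
# the evaluation score, stopping when nothing helps.
def forward_stepwise(candidates, evaluate):
    selected, best_score = [], float("-inf")
    improved = True
    while improved and candidates:
        improved = False
        for var in list(candidates):
            score = evaluate(selected + [var])
            if score > best_score:
                best_var, best_score, improved = var, score, True
        if improved:
            selected.append(best_var)
            candidates.remove(best_var)
    return selected, best_score

# Toy evaluator: only "a" and "c" carry signal; each extra input costs 0.005.
toy_ev = lambda chosen: (0.02 * ("a" in chosen) + 0.01 * ("c" in chosen)
                         - 0.005 * len(chosen))
print(forward_stepwise(["a", "b", "c"], toy_ev))
```

The per-variable cost in the toy evaluator mirrors the parsimony pressure discussed in Section 1.5.2: a variable is kept only if its contribution outweighs the added complexity.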
Once a tentative model is selected, post-training weights analysis helps establish the
importance of the explanatory variables, as there are no standard statistical tests for NNR
models. The idea is to find a measure of the contribution a given weight has to the
overall output of the network, in essence allowing the detection of insignificant variables.
Such analysis includes an examination of a Hinton graph, which represents graphically
the weight matrix within the network. The principle is to include in the network variables
that are strongly significant. In addition, a small bias weight is preferred (Diekmann and
Gutjahr, 1998; Kingdon, 1997; Previa, 2001). The input to hidden layer Hinton graph
of the nnr1 model produced by Previa is presented in Figure 1.15. The graph suggests
that the explanatory variables of the selected model are strongly significant, both positive
(green) and negative (black), and that there is a small bias weight. In addition, the input
to hidden layer weight matrix of the nnr1 model produced by Previa is presented in
Table 1.14.
The nnr1 model contained the returns of the explanatory variables presented in
Table 1.15, with one hidden layer containing five hidden nodes.
Again, to justify the use of the Japanese variables, a further model that did not include
these variables, but was otherwise identical to nnr1, was produced and its performance
evaluated; we shall name it nojap (nojap.prv Previa file). The EV levels of the training




Figure 1.15 Hinton graph of the nnr1 EUR/USD returns model
Table 1.14 Input to hidden layer weight matrix of the nnr1 EUR/USD returns model

Inputs: GOLDBLN(−19), JAPAYE$(−10), JAPDOWA(−15), OILBREN(−1), USDOLLR(−12),
FR yc(−2), IT yc(−6), JP yc(−9), JAPAYE$(−1), JAPDOWA(−1), Bias

Node     Negative weights                                  Positive weights                              Bias
C[1,0]   −0.2120, −0.4336, −0.4579, −0.2621, −0.3911       0.2316, 0.2408, 0.4295, 0.4067, 0.4403       −0.0824
C[1,1]   −0.1752, −0.3589, −0.5474, −0.3663, −0.4623       0.4016, 0.2438, 0.2786, 0.2757, 0.4831       −0.0225
C[1,2]   −0.3037, −0.4462, −0.5139, −0.2506, −0.3491       0.2490, 0.2900, 0.3634, 0.2737, 0.4132       −0.0088
C[1,3]   −0.3588, −0.4089, −0.5446, −0.2730, −0.4531       0.3382, 0.2555, 0.4661, 0.4153, 0.5245        0.0373
C[1,4]   −0.3283, −0.4086, −0.6108, −0.2362, −0.4828       0.3338, 0.3088, 0.4192, 0.4254, 0.4779       −0.0447

Table 1.15 nnr1 model explanatory variables (in returns)

Variable    Lag

GOLDBLN     19
JAPAYE$     10
JAPDOWA     15
OILBREN      1
USDOLLR     12
FR yc        2
IT yc        6
JP yc        9
JAPAYE$      1
JAPDOWA      1

and test sets of the nojap model were 1.4 and 0.6 respectively, much lower than those
of the nnr1 model.
The nnr1 model was retained for out-of-sample estimation. The performance of the
strategy is evaluated in terms of traditional forecasting accuracy and in terms of trading
performance.
Several other adequate models were produced and their performance evaluated, including
RNN models.16 In essence, the only difference from NNR models is the addition of a
loop back from a hidden or the output layer to the input layer. The loop back is then used
as an input in the next period. There is no theoretical or empirical answer to whether the
hidden layer or the output should be looped back. However, the looping back of either
allows RNN models to keep a memory of the past,17 a useful property in forecasting
applications. This feature comes at a cost, as RNN models require more connections,
raising the issue of complexity. Since simplicity is the aim, a less complex model that
can still describe the data set is preferred.
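The loop-back mechanism can be sketched in a few lines of Python. This is purely our own toy illustration of an Elman-style recurrence, not Previa's implementation; the weights and the input sequence are invented for the example:

```python
import math

def elman_forward(inputs, Wx, Wh, Wo, bh, bo):
    """Forward pass of a toy Elman-style RNN: the previous hidden state h
    is looped back as an extra input at each time step."""
    h = [0.0] * len(bh)                       # initial hidden state (the "memory")
    outputs = []
    for x in inputs:
        pre = [sum(Wx[j][i] * x[i] for i in range(len(x)))
               + sum(Wh[j][k] * h[k] for k in range(len(h)))
               + bh[j]
               for j in range(len(bh))]
        h = [math.tanh(p) for p in pre]       # new state, fed back next period
        outputs.append(sum(Wo[k] * h[k] for k in range(len(h))) + bo)
    return outputs

# invented weights: 2 inputs, 2 hidden nodes, linear output
Wx = [[0.5, -0.3], [0.2, 0.1]]
Wh = [[0.1, 0.0], [0.0, 0.1]]
Wo = [0.7, -0.4]
bh = [0.0, 0.0]
bo = 0.0
sequence = [[0.004, -0.002], [0.001, 0.003], [-0.002, 0.001]]
forecasts = elman_forward(sequence, Wx, Wh, Wo, bh, bo)
```

Because the hidden state h is carried forward and fed back, each forecast depends on the entire input history (the "memory of the past" mentioned above), at the cost of the extra weight matrix Wh.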
The statistical forecasting accuracy results of the nnr1 model and the RNN model,
which we shall name rnn1 (rnn1.prv Previa file), were only marginally different, with
the mean absolute percentage error (MAPE) differing by just 0.09%. However, in terms of

16 For a discussion on recurrent neural network models refer to Dunis and Huang (2002).
17 The looping back of the output layer is an error feedback mechanism, implying the use of a nonlinear
error-correction model (Dunis and Huang, 2002).
30 Applied Quantitative Methods for Trading and Investment

Figure 1.16 nnr1 model Excel spreadsheet (in-sample)

Figure 1.17 rnn1 model Excel spreadsheet (in-sample)

trading performance there is little to separate the nnr1 and rnn1 models. The evaluation
can be reviewed in Sheet 2 of the is nnr1.xls and is rnn1.xls Excel spreadsheets, and is
also presented in Figures 1.16 and 1.17, respectively.
The nnr1 model was retained over the rnn1 model because the latter is more complex
and yet does not possess any decisive added value over the simpler model.

1.6 FORECASTING ACCURACY AND TRADING SIMULATION
To compare the performance of the strategies, it is necessary to evaluate them on
previously unseen data. This situation is likely to be the closest to a true forecasting or
trading situation. To achieve this, all models retained an identical out-of-sample period,
allowing a direct comparison of their forecasting accuracy and trading performance.

1.6.1 Out-of-sample forecasting accuracy measures
Several criteria are used to compare the forecasting ability of the benchmark and
NNR models, including the mean absolute error (MAE), RMSE,18 MAPE and Theil's
inequality coefficient (Theil-U).19 For a full discussion of these measures, refer to Hanke
and Reitsch (1998) and Pindyck and Rubinfeld (1998). We also include correct
directional change (CDC), which measures the capacity of a model to correctly predict the
subsequent actual change of a forecast variable, an important issue in a trading strategy
that relies on the direction of a forecast rather than its level. The statistical performance
measures used to analyse the forecasting techniques are presented in Table 1.16.
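As an illustration, the measures of Table 1.16 can be computed with a short Python sketch. This is our own toy code, not the chapter's Excel/EViews workfiles, and the five-day return series is invented:

```python
import math

def accuracy_measures(actual, forecast):
    """Statistical accuracy measures of Table 1.16 (equations 1.10-1.14)."""
    T = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / T                                    # (1.10)
    mape = 100.0 / T * sum(abs(e / a) for e, a in zip(errors, actual))       # (1.11)
    rmse = math.sqrt(sum(e * e for e in errors) / T)                         # (1.12)
    theil_u = rmse / (math.sqrt(sum(f * f for f in forecast) / T)
                      + math.sqrt(sum(a * a for a in actual) / T))           # (1.13)
    cdc = 100.0 / T * sum(1 for a, f in zip(actual, forecast) if a * f > 0)  # (1.14)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "Theil-U": theil_u, "CDC": cdc}

# invented actual vs forecast daily returns
actual = [0.004, -0.002, 0.001, -0.003, 0.002]
forecast = [0.003, -0.001, -0.002, -0.002, 0.001]
stats = accuracy_measures(actual, forecast)
```

Note that by the triangle inequality Theil-U always lies between zero and one, whereas the MAPE of a daily return series can easily exceed 100%, as in Table 1.19.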

1.6.2 Out-of-sample trading performance measures
Statistical performance measures are often inappropriate for financial applications. Typi-
cally, modelling techniques are optimised using a mathematical criterion, but ultimately
the results are analysed on a financial criterion upon which they were not optimised. In
other words, the forecast error may have been minimised during model estimation, but the
evaluation of the true merit should be based on the performance of a trading strategy.
Without actual trading, the best means of evaluating performance is via a simulated trad-
ing strategy. The procedure to create the buy and sell signals is quite simple: a EUR/USD
buy signal is produced if the forecast is positive, and a sell signal otherwise.20
For many traders and analysts market direction is more important than the value of
the forecast itself, as in financial markets money can be made simply by knowing the
direction in which the series will move. In essence, "low forecast errors and trading profits are not
synonymous since a single large trade forecasted incorrectly . . . could have accounted for
most of the trading system's profits" (Kaastra and Boyd, 1996: 229).
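The buy/sell rule just described can be sketched as follows. This is a hypothetical illustration with invented returns; it assumes the strategy earns the signed actual return each day, transaction costs aside:

```python
def trading_signals(actual, forecast):
    """Footnote-20 rule: hold EUR (+1) if the forecast return is positive,
    hold USD (-1) otherwise; the strategy earns the signed actual return."""
    positions = [1 if f > 0 else -1 for f in forecast]
    strategy_returns = [p * a for p, a in zip(positions, actual)]
    return positions, strategy_returns

# invented actual vs forecast daily EUR/USD returns
actual = [0.004, -0.002, 0.001, -0.003, 0.002]
forecast = [0.003, -0.001, -0.002, -0.002, 0.001]
positions, strat = trading_signals(actual, forecast)
cumulative = sum(strat)   # cumulative return, as in equation (1.16)
```

Only the sign of the forecast matters here: a day on which direction is called correctly contributes a positive strategy return regardless of the forecast's magnitude.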
The trading performance measures used to analyse the forecasting techniques are pre-
sented in Tables 1.17 and 1.18. Most measures are self-explanatory and are commonly
used in the fund management industry. Some of the more important measures include
the Sharpe ratio, maximum drawdown and average gain/loss ratio. The Sharpe ratio is a
18 The MAE and RMSE statistics are scale-dependent measures but allow a comparison between the actual and
forecast values; the lower the values, the better the forecasting accuracy.
19 When it is more important to evaluate the forecast errors independently of the scale of the variables, the
MAPE and Theil-U are used. The Theil-U statistic is constructed to lie within [0,1], with zero indicating a perfect fit.
20 A buy signal is to buy euros at the current price or continue holding euros, while a sell signal is to sell euros
at the current price or continue holding US dollars.
Table 1.16 Statistical performance measures

Performance measure               Description

Mean absolute error               MAE = (1/T) Σ_{t=1..T} |y_t − ỹ_t|                        (1.10)
Mean absolute percentage error    MAPE = (100/T) Σ_{t=1..T} |(y_t − ỹ_t)/y_t|               (1.11)
Root-mean-squared error           RMSE = √[(1/T) Σ_{t=1..T} (y_t − ỹ_t)²]                   (1.12)
Theil's inequality coefficient    U = √[(1/T) Σ_{t=1..T} (y_t − ỹ_t)²] /
                                      {√[(1/T) Σ_{t=1..T} ỹ_t²] + √[(1/T) Σ_{t=1..T} y_t²]} (1.13)
Correct directional change        CDC = (100/N) Σ_{t=1..N} D_t,
                                      where D_t = 1 if y_t · ỹ_t > 0, else D_t = 0          (1.14)

y_t is the actual change at time t; ỹ_t is the forecast change; t = 1 to t = T for the forecast period.


risk-adjusted measure of return, with higher ratios preferred to lower ones; the
maximum drawdown is a measure of downside risk; and the average gain/loss ratio is
a measure of overall gain, a value above one being preferred (Dunis and Jalilov, 2002;
Fernandez-Rodriguez et al., 2000).
The application of these measures may be a better standard for determining the quality
of the forecasts. After all, the financial gain from a given strategy depends on trading
performance, not on forecast accuracy.

1.6.3 Out-of-sample forecasting accuracy results
The forecasting accuracy statistics do not provide very conclusive results. Each of the
models evaluated, except the logit model, is nominated "best" at least once. Interestingly,
the naïve model has the lowest Theil-U statistic at 0.6901; if this model is believed to be
the "best" model, there is likely to be no added value in using more complicated forecasting
techniques. The ARMA model has the lowest MAPE statistic at 101.51%, and equals
the MAE of the NNR model at 0.0056. The NNR model has the lowest RMSE statistic,
although the value is only marginally less than that of the ARMA model. The MACD model has
the highest CDC measure, predicting daily changes accurately 60.00% of the time. It is
difficult to select a "best" performer from these results; however, a majority decision rule
Table 1.17 Trading simulation performance measures

Performance measure              Description

Annualised return                R^A = 252 × (1/N) Σ_{t=1..N} R_t                           (1.15)
Cumulative return                R^C = Σ_{t=1..N} R_t                                       (1.16)
Annualised volatility            σ^A = √252 × √[(1/(N−1)) Σ_{t=1..N} (R_t − R̄)²]            (1.17)
Sharpe ratio                     SR = R^A/σ^A                                               (1.18)
Maximum daily profit             Maximum value of R_t over the period                       (1.19)
Maximum daily loss               Minimum value of R_t over the period                       (1.20)
Maximum drawdown                 Maximum negative value of Σ(R_t) over the period:
                                 MD = min_{t=1..N} [R_t^c − max_{i=1..t} R_i^c]             (1.21)
% Winning trades                 WT = 100 × (Σ_{t=1..N} F_t)/N_T,
                                 where F_t = 1 if transaction profit_t > 0                  (1.22)
% Losing trades                  LT = 100 × (Σ_{t=1..N} G_t)/N_T,
                                 where G_t = 1 if transaction profit_t < 0                  (1.23)
Number of up periods             N_up = number of R_t > 0                                   (1.24)
Number of down periods           N_down = number of R_t < 0                                 (1.25)
Number of transactions           N_T = Σ_{t=1..N} L_t,
                                 where L_t = 1 if trading signal_t ≠ trading signal_{t−1}   (1.26)
Total trading days               Number of all R_t's                                        (1.27)
Avg. gain in up periods          AG = (Sum of all R_t > 0)/N_up                             (1.28)
Avg. loss in down periods        AL = (Sum of all R_t < 0)/N_down                           (1.29)
Avg. gain/loss ratio             GL = AG/AL                                                 (1.30)
Probability of 10% loss          PoL = [(1 − P)/P]^(MaxRisk/Ā),                             (1.31)
                                 where P = 0.5 × {1 + [(WT × AG) + (LT × AL)]/√[(WT × AG²) + (LT × AL²)]}
                                 and Ā = √[(WT × AG²) + (LT × AL²)];
                                 MaxRisk is the risk level defined by the user; in this research, 10%
Profits T-statistics             T-statistics = √N × (R^A/σ^A)                              (1.32)

Source: Dunis and Jalilov (2002).
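A few of the measures in Table 1.17 can be reproduced in Python. The sketch below is our own illustration on invented toy data, covering equations (1.15), (1.17), (1.18), (1.21) and (1.26):

```python
import math

def performance(daily_returns, signals):
    """Selected measures of Table 1.17 computed on a toy daily return series."""
    N = len(daily_returns)
    mean = sum(daily_returns) / N
    ra = 252 * mean                                            # annualised return (1.15)
    var = sum((r - mean) ** 2 for r in daily_returns) / (N - 1)
    sigma_a = math.sqrt(252) * math.sqrt(var)                  # annualised volatility (1.17)
    sharpe = ra / sigma_a                                      # Sharpe ratio (1.18)
    cum, peak, mdd = 0.0, 0.0, 0.0
    for r in daily_returns:                                    # maximum drawdown (1.21)
        cum += r
        peak = max(peak, cum)
        mdd = min(mdd, cum - peak)
    # number of transactions: count changes of the trading signal (1.26)
    n_trades = sum(1 for a, b in zip(signals[1:], signals[:-1]) if a != b)
    return {"RA": ra, "sigmaA": sigma_a, "Sharpe": sharpe,
            "MaxDD": mdd, "Ntrades": n_trades}

# invented strategy returns and +1/-1 position signals
rets = [0.004, 0.002, -0.001, 0.003, 0.002, -0.004, 0.001]
sigs = [1, 1, -1, -1, 1, 1, -1]
perf = performance(rets, sigs)
```

The drawdown loop tracks the running peak of the cumulative-return path, so MaxDD is the worst peak-to-trough drop, matching the min/max formulation of equation (1.21).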
Table 1.18 Trading simulation performance measures

Performance measure                      Description

Number of periods daily returns rise     NPR = Σ_{t=1..N} Q_t, where Q_t = 1 if y_t > 0, else Q_t = 0               (1.33)
Number of periods daily returns fall     NPF = Σ_{t=1..N} S_t, where S_t = 1 if y_t < 0, else S_t = 0               (1.34)
Number of winning up periods             NWU = Σ_{t=1..N} B_t, where B_t = 1 if R_t > 0 and y_t > 0, else B_t = 0   (1.35)
Number of winning down periods           NWD = Σ_{t=1..N} E_t, where E_t = 1 if R_t > 0 and y_t < 0, else E_t = 0   (1.36)
Winning up periods (%)                   WUP = 100 × (NWU/NPR)                                                      (1.37)
Winning down periods (%)                 WDP = 100 × (NWD/NPF)                                                      (1.38)
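The counts of Table 1.18 are straightforward; a short Python sketch on invented numbers makes the definitions concrete:

```python
def up_down_diagnostics(actual, strategy_returns):
    """Counts of Table 1.18 (equations 1.33-1.38) on toy numbers."""
    npr = sum(1 for y in actual if y > 0)                                      # (1.33)
    npf = sum(1 for y in actual if y < 0)                                      # (1.34)
    nwu = sum(1 for y, r in zip(actual, strategy_returns) if r > 0 and y > 0)  # (1.35)
    nwd = sum(1 for y, r in zip(actual, strategy_returns) if r > 0 and y < 0)  # (1.36)
    wup = 100.0 * nwu / npr                                                    # (1.37)
    wdp = 100.0 * nwd / npf                                                    # (1.38)
    return npr, npf, nwu, nwd, wup, wdp

# invented actual daily returns and the corresponding strategy returns
actual = [0.004, -0.002, 0.001, -0.003, 0.002]
strat = [0.004, 0.002, -0.001, 0.003, 0.002]
npr, npf, nwu, nwd, wup, wdp = up_down_diagnostics(actual, strat)
```

A winning down period is a day on which the series fell and the strategy still made money, i.e. the model correctly called the fall.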


Table 1.19 Forecasting accuracy results21

                                    Naïve      MACD      ARMA      Logit     NNR

Mean absolute error                 0.0080     –         0.0056    –         0.0056
Mean absolute percentage error      317.31%    –         101.51%   –         107.38%
Root-mean-squared error             0.0102     –         0.0074    –         0.0073
Theil's inequality coefficient      0.6901     –         0.9045    –         0.8788
Correct directional change          55.86%     60.00%    56.55%    53.79%    57.24%


might select the NNR model as the overall “best” model because it is nominated “best”
twice and also “second best” by the other three statistics. A comparison of the forecasting
accuracy results is presented in Table 1.19.


1.6.4 Out-of-sample trading performance results

A comparison of the trading performance results is presented in Table 1.20 and
Figure 1.18. The results of the NNR model are quite impressive. It generally outperforms
the benchmark strategies, both in terms of overall profitability, with an annualised return
of 29.68% and a cumulative return of 34.16%, and in terms of risk-adjusted performance,
with a Sharpe ratio of 2.57. The logit model has the lowest downside risk as measured
by maximum drawdown at −5.79%, and the MACD model has the lowest downside risk

21 As the MACD model is not based on forecasting the next period and binary variables are used in the logit
model, statistical accuracy comparisons with these models were not always possible.
Table 1.20 Trading performance results

                                        Naïve     MACD      ARMA      Logit     NNR

Annualised return                       21.34%    11.34%    12.91%    21.05%    29.68%
Cumulative return                       24.56%    13.05%    14.85%    24.22%    34.16%
Annualised volatility                   11.64%    11.69%    11.69%    11.64%    11.56%
Sharpe ratio                            1.83      0.97      1.10      1.81      2.57
Maximum daily profit                    3.38%     1.84%     3.38%     1.88%     3.38%
Maximum daily loss                      −2.10%    −3.23%    −2.10%    −3.38%    −1.82%
Maximum drawdown                        −9.06%    −7.75%    −10.10%   −5.79%    −9.12%
% Winning trades                        37.01%    24.00%    52.71%    49.65%    52.94%
% Losing trades                         62.99%    76.00%    47.29%    50.35%    47.06%
Number of up periods                    162       149       164       156       166
Number of down periods                  126       138       124       132       122
Number of transactions                  127       25        129       141       136
Total trading days                      290       290       290       290       290
Avg. gain in up periods                 0.58%     0.60%     0.55%     0.61%     0.60%
Avg. loss in down periods               −0.56%    −0.55%    −0.61%    −0.53%    −0.54%
Avg. gain/loss ratio                    1.05      1.08      0.91      1.14      1.12
Probability of 10% loss                 0.70%     0.02%     5.70%     0.76%     0.09%
Profits T-statistics                    31.23     16.51     18.81     30.79     43.71
Number of periods daily returns rise    128       128       128       128       128
Number of periods daily returns fall    162       162       162       162       162
Number of winning up periods            65        45        56        49        52
Number of winning down periods          97        104       108       106       114
% Winning up periods                    50.78%    35.16%    43.75%    38.28%    40.63%
% Winning down periods                  59.88%    64.20%    66.67%    66.05%    70.37%


Figure 1.18 Cumulated profit graph of the naïve, MACD, ARMA, logit and NNR models, 19 May 2000 to 3 July 2001

as measured by the probability of a 10% loss at 0.02%; however, this is only marginally
less than the NNR model at 0.09%.
The NNR model predicted the highest number of winning down periods at 114, while
the naïve model forecast the highest number of winning up periods at 65. Interestingly,
all models were more successful at forecasting a fall in the EUR/USD returns series, as
indicated by a greater percentage of winning down periods than winning up periods.

The logit model has the highest number of transactions at 141, while the NNR model
has the second highest at 136. The MACD strategy has the lowest number of transactions
at 25. In essence, the MACD strategy has longer "holding" periods compared to the
other models, suggesting that the MACD strategy is not compared "like with like" with the
other models.
More than with statistical performance measures, financial criteria clearly single out the
NNR model as the one with the most consistent performance. Therefore it is considered
the "best" model for this particular application.


1.6.5 Transaction costs
So far, our results have been presented without accounting for transaction costs during the
trading simulation. However, it is not realistic to assess the success or otherwise of
a trading system unless transaction costs are taken into account. Between market makers,
a cost of 3 pips (0.0003 EUR/USD) per trade (one way) for a tradable amount, typically
USD 5–10 million, would be normal. The procedure to approximate the transaction costs
for the NNR model is quite simple.
A cost of 3 pips per trade and an average out-of-sample EUR/USD rate of 0.8971 produce
an average cost of about 0.033% per trade:

0.0003/0.8971 ≈ 0.033%

The NNR model made 136 transactions. Since the EUR/USD time series is a series of
bid rates and because, apart from the first trade, each signal implies two transactions, one
to close the existing position and a second one to enter the new position indicated by the
model signal, the approximate out-of-sample transaction costs for the NNR model trading
strategy are about 4.55%:

136 × 0.033% ≈ 4.55%

Therefore, even accounting for transaction costs, the extra returns achieved with the
NNR model still make this strategy the most attractive one despite its relatively high
trading frequency.
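The cost arithmetic above can be checked in a few lines of Python (same figures as in the text):

```python
pip_cost = 0.0003        # 3 pips per trade, one way
avg_rate = 0.8971        # average out-of-sample EUR/USD rate
n_transactions = 136     # transactions made by the NNR model out-of-sample

cost_per_trade = pip_cost / avg_rate           # roughly 0.033% of notional per trade
total_cost = n_transactions * cost_per_trade   # roughly 4.55% over the period
```

Even after deducting this total cost from the 34.16% cumulative return, the NNR strategy remains comfortably ahead of the benchmarks.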


1.7 CONCLUDING REMARKS
This chapter has evaluated the use of different regression models in forecasting and trading
the EUR/USD exchange rate. The performance was measured statistically and financially
via a trading simulation, taking into account the impact of transaction costs on models
with higher trading frequencies. The logic behind the trading simulation is that, if the profit from
a trading simulation were compared solely on the basis of statistical measures, the optimum
model from a financial perspective would rarely be chosen.
The NNR model was benchmarked against more traditional regression-based and other
benchmark forecasting techniques to determine any added value to the forecasting process.
Having constructed a synthetic EUR/USD series for the period up to 4 January 1999, the
models were developed using the same in-sample data, 17 October 1994 to 18 May 2000,
leaving the remaining period, 19 May 2000 to 3 July 2001, for out-of-sample forecasting.

Forecasting techniques rely on the weaknesses of the efficient market hypothesis,
acknowledging the existence of market inefficiencies, with markets displaying even weak
signs of predictability. However, FX markets are relatively efficient, reducing the scope for
a profitable strategy. Consequently, the FX managed futures industry average Sharpe ratio
is only 0.8, although a percentage of winning trades greater than 60% is often required
to run a profitable FX trading desk (Grabbe, 1996 as cited in Bellgard and Goldschmidt,
1999: 10). In this respect, it is worth noting that only one of our models reached a 60%
winning trades accuracy, namely the MACD model at 60.00%. Nevertheless, all of the
models examined in this chapter achieved an out-of-sample Sharpe ratio higher than 0.8,
the highest of which was again the NNR model at 2.57. This seems to confirm that the
use of quantitative trading is more appropriate in a fund management than in a treasury
type of context.
Forecasting techniques are dependent on the quality and nature of the data used. If the
solution to a problem is not within the data, then no technique can extract it. In addition,
sufficient information should be contained within the in-sample period to be representative
of all cases within the out-of-sample period. For example, a downward trending series
typically has more falls represented in the data than rises. The EUR/USD is such a series
within the in-sample period. Consequently, the forecasting techniques used are estimated
using more negative values than positive values. The probable implication is that the
models are more likely to successfully forecast a fall in the EUR/USD, as indicated by
our results, with all models forecasting a higher percentage of winning down periods than
winning up periods. However, the naïve model does not learn to generalise per se, and
as a result has the smallest difference between the number of winning up and winning
down periods.
Overall our results confirm the credibility and potential of regression models and par-
ticularly NNR models as a forecasting technique. However, while NNR models offer a
promising alternative to more traditional techniques, they suffer from a number of limita-
tions. They are not a panacea. One of the major disadvantages is their inability to explain
their reasoning, which has led some to consider that "neural nets are truly black boxes.
Once you have trained a neural net and are generating predictions, you still do not know
why the decisions are being made and can't find out by just looking at the net. It is not
unlike attempting to capture the structure of knowledge by dissecting the human brain"
(Fishman et al., 1991 as cited in El-Shazly and El-Shazly, 1997: 355). In essence, the neu-
ral network learning procedure is not very transparent, requiring a lot of understanding.
In addition, statistical inference techniques such as significance testing cannot always be
applied, resulting in a reliance on a heuristic approach. The complexity of NNR models
suggests that they are capable of superior forecasts, as shown in this chapter; however,
this is not always the case. They are essentially nonlinear techniques and may be less ca-
pable in linear applications than traditional forecasting techniques (Balkin and Ord, 2000;
Campbell et al., 1997; Lisboa and Vellido, 2000; Refenes and Zaidi, 1993).
Although the results support the success of neural network models in financial appli-
cations, there is room for increased success. Such a possibility lies in optimising the
neural network model on a financial criterion rather than a mathematical one. As the
profitability of a trading strategy relies on correctly forecasting the direction of change,
namely CDC, optimising the neural network model on such a measure could improve
trading performance. However, backpropagation networks optimise by minimising a dif-
ferentiable function such as squared error; they cannot minimise a function based on loss
or, conversely, maximise a function based on profit. Notwithstanding, there is scope
to explore this idea further, provided the neural network software has the ability to select
such an optimisation criterion.
Future work might also include the addition of hourly data as a possible explanatory
variable. Alternatively, the use of first differences instead of rates of return series may be
investigated, as first differences are perhaps the most effective way to generate data sets
for neural network learning (Mehta, 1995).
Further investigation into RNN models is possible, or into combining forecasts. Many
researchers agree that individual forecasting methods are misspecified in some manner,
suggesting that combining multiple forecasts leads to increased forecast accuracy (Dunis
and Huang, 2002). However, initial investigations proved unsuccessful, with the NNR
model remaining the "best" model. Two simple model combinations were examined,
a simple averaging of the naïve, ARMA and NNR model forecasts, and a regression-
type combined forecast using the naïve, ARMA and NNR models.22 The regression-
type combined forecast follows the Granger and Ramanathan procedure (gr.wf1 EViews
workfile). The evaluation can be reviewed in Sheet 2 of the oos gr.xls Excel spreadsheet,
and is also presented in Figure 1.19. The lack of success using the combination models
was undoubtedly because the performance of the benchmark models was so much weaker
than that of the NNR model. It is unlikely that combining relatively "poor" models with
an otherwise "good" one will outperform the "good" model alone.
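The Granger and Ramanathan procedure amounts to an OLS regression of the actual series on the individual forecasts plus a constant, the fitted coefficients serving as combination weights. The sketch below is our own minimal implementation on invented data; the chapter's actual combination used the naïve, ARMA and NNR forecasts in EViews:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (for the normal equations)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def granger_ramanathan(actual, forecasts):
    """Regress the actual series on the individual forecasts plus a constant;
    the fitted OLS coefficients are the combination weights."""
    T, k = len(actual), len(forecasts) + 1
    X = [[1.0] + [f[t] for f in forecasts] for t in range(T)]
    XtX = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(k)] for i in range(k)]
    Xty = [sum(X[t][i] * actual[t] for t in range(T)) for i in range(k)]
    return solve(XtX, Xty)

# invented actual returns and two competing forecast series
actual = [0.004, -0.002, 0.001, -0.003, 0.002, 0.001]
f1 = [0.003, -0.001, -0.002, -0.002, 0.001, 0.002]
f2 = [0.005, -0.003, 0.002, -0.001, 0.003, 0.000]
simple_avg = [(a + b) / 2 for a, b in zip(f1, f2)]   # equal-weight combination
weights = granger_ramanathan(actual, [f1, f2])       # [constant, w1, w2]
```

By construction the regression-type combination can never have a higher in-sample squared error than the simple average, since the equal-weight combination is one point in the space over which OLS optimises.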
The main conclusion that can be drawn from this chapter is that there are indeed
nonlinearities present within financial markets and that a neural network model can be


Figure 1.19 Regression-type combined forecast Excel spreadsheet (out-of-sample)

22 For a full discussion of the procedures, refer to Clemen (1989), Granger and Ramanathan (1984), and Hashem
(1997).

trained to recognise them. However, despite the limitations and potential improvements
mentioned above, our results strongly suggest that regression models and particularly
NNR models can add value to the forecasting process. For the EUR/USD exchange rate
and the period considered, NNR models clearly outperform the more traditional modelling
techniques analysed in this chapter.

REFERENCES
Balkin, S. D. and J. K. Ord (2000), "Automatic Neural Network Modelling for Univariate Time
Series", International Journal of Forecasting, 16, 509–515.
Bellgard, C. and P. Goldschmidt (1999), "Forecasting Across Frequencies: Linearity and Non-
Linearity", University of Western Australia Research Paper, Proceedings of the International
Conference on Advanced Technology, Australia, (www.imm.ecel.uwa.edu.au/~cbellgar/).
Box, G. E. P., G. M. Jenkins and G. C. Reinsel (1994), Time Series Analysis: Forecasting and
Control, Prentice Hall, Englewood Cliffs, NJ.
Campbell, J. Y., A. W. Lo and A. C. MacKinlay (1997), "Nonlinearities in Financial Data", in The
Econometrics of Financial Markets, Princeton University Press, Princeton, NJ, pp. 512–524.
Carney, J. C. and P. Cunningham (1996), "Neural Networks and Currency Exchange
Rate Prediction", Trinity College Working Paper, Foresight Business Journal web page,
(www.maths.tcd.ie/pub/fbj/forex4.html).
Clemen, R. T. (1989), "Combining Forecasts: A Review and Annotated Bibliography", Inter-
national Journal of Forecasting, 5, 559–583.
Diekmann, A. and S. Gutjahr (1998), "Prediction of the Euro–Dollar Future Using Neural Net-
works – A Case Study for Financial Time Series Prediction", University of Karlsruhe Working
Paper, Proceedings of the International Symposium on Intelligent Data Engineering and Learning
(IDEAL'98), Hong Kong, (http://citeseer.nj.nec.com/diekmann98prediction.html).
Dunis, C. and X. Huang (2002), "Forecasting and Trading Currency Volatility: An Application of
Recurrent Neural Regression and Model Combination", The Journal of Forecasting, 21, 317–354.
Dunis, C. and J. Jalilov (2002), "Neural Network Regression and Alternative Forecasting Tech-
niques for Predicting Financial Variables", Neural Network World, 2, 113–139.
El-Shazly, M. R. and H. E. El-Shazly (1997), "Comparing the Forecasting Performance of Neural
Networks and Forward Exchange Rates", Journal of Multinational Financial Management, 7,
345–356.
Fernandez-Rodriguez, F., C. Gonzalez-Martel and S. Sosvilla-Rivero (2000), "On the Profitability
of Technical Trading Rules Based on Artificial Neural Networks: Evidence from the Madrid
Stock Market", Economics Letters, 69, 89–94.
Fishman, M. B., D. S. Barr and W. J. Loick (1991), "Using Neural Nets in Market Analysis",
Technical Analysis of Stocks and Commodities, 9, 4, 135–138.
Gençay, R. (1999), "Linear, Non-linear and Essential Foreign Exchange Rate Prediction with Simple
Technical Trading Rules", Journal of International Economics, 47, 91–107.
Gouriéroux, C. and A. Monfort (1995), Time Series and Dynamic Models, translated and edited by
G. Gallo, Cambridge University Press, Cambridge.
Grabbe, J. O. (1996), International Financial Markets, 3rd edition, Prentice Hall, Englewood Cliffs,
NJ.
Granger, C. W. J. and R. Ramanathan (1984), "Improved Methods of Combining Forecasts", Jour-
nal of Forecasting, 3, 197–204.
Hanke, J. E. and A. G. Reitsch (1998), Business Forecasting, 6th edition, Prentice Hall, Englewood
Cliffs, NJ.
Hashem, S. (1997), "Optimal Linear Combinations of Neural Networks", Neural Networks, 10, 4,
599–614 (www.emsl.pnl.gov:2080/people/bionames/hashem s.html).
Haykin, S. (1999), Neural Networks: A Comprehensive Foundation, 2nd edition, Prentice Hall,
Englewood Cliffs, NJ.
Hornik, K., M. Stinchcombe and H. White (1989), "Multilayer Feedforward Networks Are Univer-
sal Approximators", Neural Networks, 2, 359–366.
Kaastra, I. and M. Boyd (1996), "Designing a Neural Network for Forecasting Financial and
Economic Time Series", Neurocomputing, 10, 215–236.
Kingdon, J. (1997), Intelligent Systems and Financial Forecasting, Springer, London.
Lisboa, P. J. G. and A. Vellido (2000), "Business Applications of Neural Networks", in
P. J. G. Lisboa, B. Edisbury and A. Vellido (eds), Business Applications of Neural Networks:
The State-of-the-Art of Real-World Applications, World Scientific, Singapore, pp. vii–xxii.
Maddala, G. S. (2001), Introduction to Econometrics, 3rd edition, Prentice Hall, Englewood Cliffs,
NJ.
Mehta, M. (1995), "Foreign Exchange Markets", in A. N. Refenes (ed.), Neural Networks in the
Capital Markets, John Wiley, Chichester, pp. 176–198.
Pesaran, M. H. and B. Pesaran (1997), "Lessons in Logit and Probit Estimation", in Interactive
Econometric Analysis Working with Microfit 4, Oxford University Press, Oxford, pp. 263–275.
Pindyck, R. S. and D. L. Rubinfeld (1998), Econometric Models and Economic Forecasts, 4th edi-
tion, McGraw-Hill, New York.
Previa (2001), Previa Version 1.5 User's Guide, (www.elseware.fr/previa).
Refenes, A. N. and A. Zaidi (1993), "Managing Exchange Rate Prediction Strategies with Neural
Networks", in P. J. G. Lisboa and M. J. Taylor (eds), Techniques and Applications of Neural
Networks, Ellis Horwood, Hemel Hempstead, pp. 109–116.
Shapiro, A. F. (2000), "A Hitchhiker's Guide to the Techniques of Adaptive Nonlinear Models",
Insurance, Mathematics and Economics, 26, 119–132.
Thomas, R. L. (1997), Modern Econometrics. An Introduction, Addison-Wesley, Harlow.
Tyree, E. W. and J. A. Long (1995), "Forecasting Currency Exchange Rates: Neural
Networks and the Random Walk Model", City University Working Paper, Proceedings
of the Third International Conference on Artificial Intelligence Applications, New York,
(http://citeseer.nj.nec.com/131893.html).
Yao, J., H. Poh and T. Jasic (1996), "Foreign Exchange Rates Forecasting with Neural Networks",
National University of Singapore Working Paper, Proceedings of the International Conference on
Neural Information Processing, Hong Kong, (http://citeseer.nj.com/yao96foreign.html).
Yao, J., Y. Li and C. L. Tan (1997), "Forecasting the Exchange Rates of CHF vs USD Using Neural
Networks", Journal of Computational Intelligence in Finance, 15, 2, 7–13.
Zhang, G., B. E. Patuwo and M. Y. Hu (1998), "Forecasting with Artificial Neural Networks: The
State of The Art", International Journal of Forecasting, 14, 35–62.
2
Using Cointegration to Hedge and Trade
International Equities

A. NEIL BURGESS


ABSTRACT
In this chapter, we examine the application of the econometric concept of cointegration
as a tool for hedging and trading international equities. The concepts are illustrated with
respect to a particular set of data, namely the 50 equities which constituted the STOXX
50 index as of 4 July 2002. The daily closing prices of these equities are investigated
over a period from 14 September 1998 to 3 July 2002 – the longest period over which
continuous data is available across the whole set of stocks in this particular universe. The
use of daily closing prices will introduce some spurious effects due to the non-synchronous
closing times of the markets on which these equities trade. In spite of this, however, the
data are deemed suitable for the purposes of illustrating the tools in question and also of
indicating the potential benefits to be gained from intelligent application of these tools.
We consider cointegration as a framework for modelling the inter-relationships between
equities prices, in a manner which can be seen as a sophisticated form of “relative value”
analysis. Depending on the particular task in hand, cointegration techniques can be used
to identify potential hedges for a given equity position and/or to identify potential trades
which might be taken from a statistical arbitrage perspective.


2.1 INTRODUCTION
In this section we describe the econometric concept of “cointegration”, and explain our
motivation for developing trading tools based upon a cointegration perspective.
Cointegration is essentially an econometric tool for identifying situations where stable
relationships exist between a set of time series. In econometrics, cointegration testing is
typically seen as an end in itself, with the objective of testing an economic hypothesis
regarding the presence of an equilibrium relationship between a set of economic variables.
A possible second stage of cointegration modelling is to estimate the dynamics of the
mechanism by which short-term deviations from the equilibrium are corrected, i.e. to
construct an error-correction model (ECM).
The first aspect of cointegration modelling is interesting from the perspective of "hedging"
assets against each other. The estimated equilibrium relationship will be one in which
the effect of common risk factors is neutralised or at least minimised, allowing low-risk

Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Naïm
© 2003 John Wiley & Sons, Ltd ISBN: 0-470-84885-5

combinations of assets to be created. The second aspect is interesting as a potential source
of statistical arbitrage strategies. Deviations from the long-term “fair price” relationship
can be considered as statistical “mispricings” and error-correction models can be used to
capture any predictable component in the tendency of these mispricings to revert towards
the longer term equilibrium.
Whilst the econometric methods used in cointegration modelling form the basis of our
approach, they involve a number of restrictive assumptions which limit the extent to which
they can be applied in practice. From our somewhat contrasting perspective, the use of
tools from cointegration modelling is seen as a “means to an end”, with the “end” being
the creation of successful trading strategies. In this chapter we explore the application of
cointegration-inspired tools to the task of trading and hedging international equities.
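The two stages just outlined can be illustrated with a self-contained Python sketch on simulated data (our own toy example, not the STOXX 50 data set): two price series sharing a stochastic trend are generated, the equilibrium hedge ratio is estimated by OLS, and an error-correction coefficient on the lagged mispricing is then estimated:

```python
import random

random.seed(42)

# Step 0: simulate a cointegrated pair. x is a random walk; y shares x's
# stochastic trend plus a stationary AR(1) spread (all parameters invented).
n = 2000
x, spread = [100.0], [0.0]
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0, 0.5))
    spread.append(0.9 * spread[-1] + random.gauss(0, 0.5))
y = [2.0 * xi + si for xi, si in zip(x, spread)]

# Step 1: estimate the equilibrium relation y = alpha + beta * x by OLS.
# beta is the hedge ratio; the residual is the statistical "mispricing".
mx, my = sum(x) / n, sum(y) / n
beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        / sum((xi - mx) ** 2 for xi in x))
alpha = my - beta * mx
resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]

# Step 2 (error correction): regress the change in y on the lagged mispricing.
# A negative coefficient means deviations are corrected back towards equilibrium.
dy = [y[t] - y[t - 1] for t in range(1, n)]
z = resid[:-1]
mz, mdy = sum(z) / len(z), sum(dy) / len(dy)
gamma = (sum((zi - mz) * (d - mdy) for zi, d in zip(z, dy))
         / sum((zi - mz) ** 2 for zi in z))
```

A significantly negative gamma is the hallmark of error correction: deviations from the estimated equilibrium tend to be unwound in subsequent periods, which is what a statistical arbitrage strategy seeks to exploit.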
For both trading and hedging, the cointegration perspective can be viewed as an exten-
