Figure 1.9 (1,40) combination moving average Excel spreadsheet (in-sample)

where Y_t is the dependent variable at time t; Y_{t-1}, Y_{t-2}, ..., Y_{t-p} are the lagged dependent variables; φ_0, φ_1, ..., φ_p are regression coefficients; µ_t is the residual term; µ_{t-1}, µ_{t-2}, ..., µ_{t-q} are previous values of the residual; w_1, w_2, ..., w_q are weights.

Several ARMA specifications were tried out: for example, ARMA(5,5) and ARMA(10,10) models were produced to test for any "weekly" effects, which can be reviewed in the arma.wf1 EViews workfile. The ARMA(10,10) model was estimated but was unsatisfactory, as several coefficients were not even significant at the 90% confidence interval (equation arma1010). The results are presented in Table 1.5. The model was primarily modified through testing the significance of variables via the likelihood ratio (LR) test for redundant or omitted variables and Ramsey's RESET test for model misspecification.

Once the non-significant terms are removed, all of the coefficients of the restricted ARMA(10,10) model become significant at the 99% confidence interval (equation arma13610). The overall significance of the model is tested using the F-test. The null hypothesis that all coefficients except the constant are not significantly different from zero is rejected at the 99% confidence interval. The results are presented in Table 1.6.

Examination of the autocorrelation function of the error terms reveals that the residuals are random at the 99% confidence interval, and further confirmation is given by the serial correlation LM test. The results are presented in Tables 1.7 and 1.8. The model is also tested for general misspecification via Ramsey's RESET test. The null hypothesis of correct specification cannot be rejected at the 99% confidence interval. The results are presented in Table 1.9.

Applications of Advanced Regression Analysis 15

Table 1.5 ARMA(10,10) EUR/USD returns estimation

Dependent Variable: DR_USEURSP
Method: Least Squares
Sample(adjusted): 12 1459
Included observations: 1448 after adjusting endpoints
Convergence achieved after 20 iterations
White Heteroskedasticity-Consistent Standard Errors & Covariance
Backcast: 2 11

Variable    Coefficient   Std. error   t-Statistic   Prob.
C            -0.000220    0.000140     -1.565764     0.1176
AR(1)        -0.042510    0.049798     -0.853645     0.3934
AR(2)        -0.210934    0.095356     -2.212073     0.0271
AR(3)        -0.359378    0.061740     -5.820806     0.0000
AR(4)        -0.041003    0.079423     -0.516264     0.6058
AR(5)         0.001376    0.067652      0.020338     0.9838
AR(6)         0.132413    0.054071      2.448866     0.0145
AR(7)        -0.238913    0.052594     -4.542616     0.0000
AR(8)         0.182816    0.046878      3.899801     0.0001
AR(9)         0.026431    0.060321      0.438169     0.6613
AR(10)       -0.615601    0.076171     -8.081867     0.0000
MA(1)         0.037787    0.040142      0.941343     0.3467
MA(2)         0.227952    0.095346      2.390785     0.0169
MA(3)         0.341293    0.058345      5.849551     0.0000
MA(4)         0.036997    0.074796      0.494633     0.6209
MA(5)        -0.004544    0.059140     -0.076834     0.9388
MA(6)        -0.140714    0.046739     -3.010598     0.0027
MA(7)         0.253016    0.042340      5.975838     0.0000
MA(8)        -0.206445    0.040077     -5.151153     0.0000
MA(9)        -0.014011    0.048037     -0.291661     0.7706
MA(10)        0.643684    0.074271      8.666665     0.0000

R-squared              0.016351     Mean dependent var.     -0.000225
Adjusted R-squared     0.002565     S.D. dependent var.      0.005363
S.E. of regression     0.005356     Akaike info. criterion  -7.606665
Sum squared resid.     0.040942     Schwarz criterion       -7.530121
Log likelihood      5528.226        F-statistic              1.186064
Durbin-Watson stat.    1.974747     Prob(F-statistic)        0.256910

Inverted AR roots: 0.84+0.31i, 0.84-0.31i, 0.55-0.82i, 0.55+0.82i, 0.07+0.98i, 0.07-0.98i, -0.59-0.78i, -0.59+0.78i, -0.90+0.21i, -0.90-0.21i
Inverted MA roots: 0.85+0.31i, 0.85-0.31i, 0.55-0.82i, 0.55+0.82i, 0.07-0.99i, 0.07+0.99i, -0.59-0.79i, -0.59+0.79i, -0.90+0.20i, -0.90-0.20i

16 Applied Quantitative Methods for Trading and Investment

Table 1.6 Restricted ARMA(10,10) EUR/USD returns estimation

Dependent Variable: DR_USEURSP
Method: Least Squares
Sample(adjusted): 12 1459
Included observations: 1448 after adjusting endpoints
Convergence achieved after 50 iterations
White Heteroskedasticity-Consistent Standard Errors & Covariance
Backcast: 2 11

Variable    Coefficient   Std. error   t-Statistic   Prob.
C            -0.000221    0.000144     -1.531755     0.1258
AR(1)         0.263934    0.049312      5.352331     0.0000
AR(3)        -0.444082    0.040711    -10.90827      0.0000
AR(6)        -0.334221    0.035517     -9.410267     0.0000
AR(10)       -0.636137    0.043255    -14.70664      0.0000
MA(1)        -0.247033    0.046078     -5.361213     0.0000
MA(3)         0.428264    0.030768     13.91921      0.0000
MA(6)         0.353457    0.028224     12.52307      0.0000
MA(10)        0.675965    0.041063     16.46159      0.0000

R-squared              0.015268     Mean dependent var.     -0.000225
Adjusted R-squared     0.009793     S.D. dependent var.      0.005363
S.E. of regression     0.005337     Akaike info. criterion  -7.622139
Sum squared resid.     0.040987     Schwarz criterion       -7.589334
Log likelihood      5527.429        F-statistic              2.788872
Durbin-Watson stat.    2.019754     Prob(F-statistic)        0.004583

Inverted AR roots: 0.89+0.37i, 0.89-0.37i, 0.61+0.78i, 0.61-0.78i, 0.08-0.98i, 0.08+0.98i, -0.53-0.70i, -0.53+0.70i, -0.92+0.31i, -0.92-0.31i
Inverted MA roots: 0.90-0.37i, 0.90+0.37i, 0.61+0.78i, 0.61-0.78i, 0.07+0.99i, 0.07-0.99i, -0.54-0.70i, -0.54+0.70i, -0.93+0.31i, -0.93-0.31i

The selected ARMA model, namely the restricted ARMA(10,10) model, takes the form:

Y_t = -0.0002 + 0.2639 Y_{t-1} - 0.4440 Y_{t-3} - 0.3342 Y_{t-6} - 0.6361 Y_{t-10}
      - 0.2470 µ_{t-1} + 0.4283 µ_{t-3} + 0.3535 µ_{t-6} + 0.6760 µ_{t-10}

The restricted ARMA(10,10) model was retained for out-of-sample estimation. The performance of the strategy is evaluated in terms of traditional forecasting accuracy and in terms of trading performance. Several other models were produced and their performance evaluated; for example, an alternative restricted ARMA(10,10) model was produced (equation arma16710). The original restricted ARMA(10,10) model was retained because it has significantly better in-sample trading results than the alternative ARMA(10,10) model. The annualised return, Sharpe ratio and correct directional change of the original model were 12.65%, 1.49 and 53.80%, respectively. The corresponding


Table 1.7 Restricted ARMA(10,10) correlogram of residuals

Sample: 12 1459
Included observations: 1448
Q-statistic probabilities adjusted for 8 ARMA term(s)

Lag   Autocorrelation   Partial correlation   Q-Stat.   Prob.
 1        -0.010              -0.010           0.1509
 2        -0.004              -0.004           0.1777
 3         0.004               0.004           0.1973
 4        -0.001              -0.001           0.1990
 5         0.000               0.000           0.1991
 6        -0.019              -0.019           0.7099
 7        -0.004              -0.004           0.7284
 8        -0.015              -0.015           1.0573
 9         0.000               0.000           1.0573    0.304
10         0.009               0.009           1.1824    0.554
11         0.031               0.032           2.6122    0.455
12        -0.024              -0.024           3.4600    0.484
13         0.019               0.018           3.9761    0.553
14        -0.028              -0.028           5.0897    0.532
15         0.008               0.008           5.1808    0.638

values for the alternative model were 9.47%, 1.11 and 52.35%. The evaluation can be reviewed in Sheet 2 of the is_arma13610.xls and is_arma16710.xls Excel spreadsheets, and is also presented in Figures 1.10 and 1.11, respectively. Ultimately, we chose the model that satisfied the usual statistical tests and that also recorded the best in-sample trading performance.
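The three trading measures quoted above (annualised return, Sharpe ratio and correct directional change) can be computed from a forecast series and the realised returns. A minimal sketch, with our own function and variable names and assuming daily data annualised over 252 trading days:

```python
import numpy as np

def trading_stats(forecast, actual, periods_per_year=252):
    # Long (+1) when the predicted return is positive, short (-1) otherwise
    positions = np.where(forecast > 0, 1.0, -1.0)
    strategy = positions * actual                      # daily strategy returns
    ann_return = periods_per_year * strategy.mean()    # annualised return
    sharpe = np.sqrt(periods_per_year) * strategy.mean() / strategy.std()
    # Correct directional change: share of days where the forecast sign
    # matches the realised sign, in per cent
    cdc = 100.0 * np.mean(np.sign(forecast) == np.sign(actual))
    return ann_return, sharpe, cdc

# Toy example with three days of forecast and realised returns
ann, sharpe, cdc = trading_stats(np.array([0.1, -0.2, 0.3]),
                                 np.array([0.05, -0.1, -0.2]))
print(ann, sharpe, cdc)
```

The exact conventions in the chapter's spreadsheets (e.g. transaction costs, compounding) are not reproduced here; this only illustrates the headline measures.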

1.4.4 Logit estimation

The logit model belongs to a group of models termed "classification models": multivariate statistical techniques used to estimate the probability of an upward or downward movement in a variable. As a result, they are well suited to rates of return applications where a recommendation for trading is required. For a full discussion of the procedure refer to Maddala (2001), Pesaran and Pesaran (1997), or Thomas (1997).

The approach assumes the following regression model:

Y_t* = β_0 + β_1 X_{1,t} + β_2 X_{2,t} + · · · + β_p X_{p,t} + µ_t   (1.6)

where Y_t* is the dependent variable at time t; X_{1,t}, X_{2,t}, ..., X_{p,t} are the explanatory variables at time t; β_0, β_1, ..., β_p are the regression coefficients; µ_t is the residual term.

However, Y_t* is not directly observed; what is observed is a dummy variable Y_t defined by:

Y_t = 1 if Y_t* > 0; Y_t = 0 otherwise   (1.7)

Therefore, the model requires a transformation of the explained variable, namely the EUR/USD returns series, into a binary series. The procedure is quite simple: a binary


Table 1.8 Restricted ARMA(10,10) serial correlation LM test

Breusch-Godfrey Serial Correlation LM Test

F-statistic      0.582234    Probability   0.558781
Obs*R-squared    1.172430    Probability   0.556429

Dependent Variable: RESID
Method: Least Squares
Presample missing value lagged residuals set to zero

Variable     Coefficient   Std. error   t-Statistic   Prob.
C             8.33E-07     0.000144      0.005776     0.9954
AR(1)         0.000600     0.040612      0.014773     0.9882
AR(3)         0.019545     0.035886      0.544639     0.5861
AR(6)         0.018085     0.031876      0.567366     0.5706
AR(10)       -0.028997     0.037436     -0.774561     0.4387
MA(1)        -0.000884     0.038411     -0.023012     0.9816
MA(3)        -0.015096     0.026538     -0.568839     0.5696
MA(6)        -0.014584     0.026053     -0.559792     0.5757
MA(10)        0.029482     0.035369      0.833563     0.4047
RESID(-1)    -0.010425     0.031188     -0.334276     0.7382
RESID(-2)    -0.004640     0.026803     -0.173111     0.8626

R-squared              0.000810     Mean dependent var.      1.42E-07
Adjusted R-squared    -0.006144     S.D. dependent var.      0.005322
S.E. of regression     0.005338     Akaike info. criterion  -7.620186
Sum squared resid.     0.040953     Schwarz criterion       -7.580092
Log likelihood      5528.015        F-statistic              0.116447
Durbin-Watson stat.    1.998650     Prob(F-statistic)        0.999652

Table 1.9 Restricted ARMA(10,10) RESET test for model misspecification

Ramsey RESET Test

F-statistic            0.785468    Probability   0.375622
Log likelihood ratio   0.790715    Probability   0.373884

variable equal to one is produced if the return is positive, and zero otherwise. The same transformation, although not necessary, was performed on the explanatory variables for homogeneity reasons.
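The transformation is a one-liner in most environments; a sketch in Python, with illustrative values:

```python
import numpy as np

# Illustrative daily returns (not the actual EUR/USD data)
returns = np.array([0.004, -0.002, 0.0, 0.013, -0.007])
binary = (returns > 0).astype(int)  # 1 for a positive return, 0 otherwise
print(binary)  # [1 0 0 1 0]
```

Note that a zero return maps to zero under this rule, consistent with "positive, and zero otherwise".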

A basic regression technique is used to produce the logit model. The idea is to start with a model containing several variables, including lagged dependent terms, and then to modify the model through a series of tests.

The selected logit model, which we shall name logit1 (equation logit1 of the logit.wf1 EViews workfile), takes the form:


Figure 1.10 Restricted ARMA(10,10) model Excel spreadsheet (in-sample)

Y_t* = 0.2492 - 0.3613 X_{1,t} - 0.2872 X_{2,t} + 0.2862 X_{3,t} + 0.2525 X_{4,t}
       - 0.3692 X_{5,t} - 0.3937 X_{6,t} + µ_t

where X_{1,t}, ..., X_{6,t} are the JP yc(-2), UK yc(-9), JAPDOWA(-1), ITMIB30(-19), JAPAYE$(-10), and OILBREN(-1) binary explanatory variables, respectively.9

All of the coefficients in the model are significant at the 98% confidence interval. The overall significance of the model is tested using the LR test. The null hypothesis that all coefficients except the constant are not significantly different from zero is rejected at the 99% confidence interval. The results are presented in Table 1.10.

To justify the use of Japanese variables, which seems difficult from an economic perspective, the joint overall significance of this subset of variables is tested using the LR test for redundant variables. The null hypothesis that these coefficients are not jointly significantly different from zero is rejected at the 99% confidence interval. The results are presented in Table 1.11. In addition, a model that did not include the Japanese variables, but was otherwise identical to logit1, was produced and its trading performance evaluated; we shall name it nojap (equation nojap of the logit.wf1 EViews workfile). The Sharpe ratio, average gain/loss ratio and correct directional change of the nojap model were 1.34, 1.01 and 54.38%, respectively. The corresponding values for the logit1 model were 2.26, 1.01 and 58.13%. The evaluation can be reviewed in Sheet 2 of the is_logit1.xls and is_nojap.xls Excel spreadsheets, and is also presented in Figures 1.12 and 1.13, respectively.

9 Datastream mnemonics as mentioned in Table 1.1; yield curves and lags in brackets are used to save space.


Figure 1.11 Alternative restricted ARMA(10,10) model Excel spreadsheet (in-sample)

The logit1 model was retained for out-of-sample estimation. As, in practice, the estimation of the model is based upon the cumulative distribution of the logistic function for the error term, the forecasts produced range between zero and one, requiring transformation into a binary series. Again, the procedure is quite simple: a binary variable equal to one is produced if the forecast is greater than 0.5, and zero otherwise.
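The mapping from logit forecasts to a binary signal can be sketched as follows. The probability values are illustrative; the 0.5 cut-off is the one used in the text, while the final long/short mapping is our own illustration of how such a signal might be traded:

```python
import numpy as np

probs = np.array([0.62, 0.48, 0.51, 0.30])  # illustrative logit forecasts in (0, 1)
signal = (probs > 0.5).astype(int)          # 1 = predicted up-move, 0 = predicted down-move
position = np.where(signal == 1, 1, -1)     # e.g. long on 1, short on 0
print(signal, position)                     # [1 0 1 0] [ 1 -1  1 -1]
```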

The performance of the strategy is evaluated in terms of forecast accuracy via the

correct directional change measure and in terms of trading performance. Several other

adequate models were produced and their performance evaluated. None performed better

in-sample, therefore the logit1 model was retained.

1.5 NEURAL NETWORK MODELS: THEORY AND METHODOLOGY

Neural networks are “data-driven self-adaptive methods in that there are few a priori

assumptions about the models under study” (Zhang et al., 1998: 35). As a result, they are

well suited to problems where economic theory is of little use. In addition, neural networks

are universal approximators capable of approximating any continuous function (Hornik

et al., 1989).

Many researchers are confronted with problems where important nonlinearities exist between the independent variables and the dependent variable. Often, in such circumstances, traditional forecasting methods lack explanatory power. Recently, nonlinear models have attempted to cover this shortfall. In particular, NNR models have been applied with increasing success to financial markets, which often contain nonlinearities (Dunis and Jalilov, 2002).


Table 1.10 Logit1 EUR/USD returns estimation

Dependent Variable: BDR_USEURSP
Method: ML - Binary Logit
Sample(adjusted): 20 1459
Included observations: 1440 after adjusting endpoints
Convergence achieved after 3 iterations
Covariance matrix computed using second derivatives

Variable              Coefficient   Std. error   z-Statistic   Prob.
C                      0.249231     0.140579      1.772894     0.0762
BDR_JP_YC(-2)         -0.361289     0.108911     -3.317273     0.0009
BDR_UK_YC(-9)         -0.287220     0.108397     -2.649696     0.0081
BDR_JAPDOWA(-1)        0.286214     0.108687      2.633369     0.0085
BDR_ITMIB31(-19)       0.252454     0.108056      2.336325     0.0195
BDR_JAPAYE$(-10)      -0.369227     0.108341     -3.408025     0.0007
BDR_OILBREN(-1)       -0.393689     0.108476     -3.629261     0.0003

Mean dependent var.     0.457639    S.D. dependent var.       0.498375
S.E. of regression      0.490514    Akaike info. criterion    1.353305
Sum squared resid.    344.7857      Schwarz criterion         1.378935
Log likelihood       -967.3795     Hannan-Quinn criterion     1.362872
Restr. log likelihood -992.9577    Avg. log likelihood       -0.671791
LR statistic (6 df)    51.15635     McFadden R-squared        0.025760
Prob(LR statistic)      2.76E-09

Obs. with dep = 0       781         Total obs.                1440
Obs. with dep = 1       659

Theoretically, the advantage of NNR models over traditional forecasting methods is that, as is often the case, the model best adapted to a particular problem cannot be identified. It is then better to resort to a method that is a generalisation of many models than to rely on an a priori model (Dunis and Huang, 2002).

However, NNR models have been criticised and their widespread success has been hindered because of their "black-box" nature, excessive training times, danger of overfitting, and the large number of "parameters" required for training. As a result, deciding on the appropriate network involves much trial and error.

For a full discussion on neural networks, please refer to Haykin (1999), Kaastra and

Boyd (1996), Kingdon (1997), or Zhang et al. (1998). Notwithstanding, we provide below

a brief description of NNR models and procedures.

1.5.1 Neural network models

The will to understand the functioning of the brain is the basis for the study of neural

networks. Mathematical modelling started in the 1940s with the work of McCulloch and

Pitts, whose research was based on the study of networks composed of a number of simple

interconnected processing elements called neurons or nodes. If the description is correct,


Table 1.11 Logit1 estimation redundant variables LR test

Redundant Variables: BDR_JP_YC(-2), BDR_JAPDOWA(-1), BDR_JAPAYE$(-10)

F-statistic            9.722023    Probability   0.000002
Log likelihood ratio  28.52168     Probability   0.000003

Test Equation:
Dependent Variable: BDR_USEURSP
Method: ML - Binary Logit
Sample: 20 1459
Included observations: 1440
Convergence achieved after 3 iterations
Covariance matrix computed using second derivatives

Variable              Coefficient   Std. error   z-Statistic   Prob.
C                     -0.013577     0.105280     -0.128959     0.8974
BDR_UK_YC(-9)         -0.247254     0.106979     -2.311245     0.0208
BDR_ITMIB31(-19)       0.254096     0.106725      2.380861     0.0173
BDR_OILBREN(-1)       -0.345654     0.106781     -3.237047     0.0012

Mean dependent var.     0.457639    S.D. dependent var.       0.498375
S.E. of regression      0.494963    Akaike info. criterion    1.368945
Sum squared resid.    351.8032      Schwarz criterion         1.383590
Log likelihood       -981.6403     Hannan-Quinn criterion     1.374412
Restr. log likelihood -992.9577    Avg. log likelihood       -0.681695
LR statistic (3 df)    22.63467     McFadden R-squared        0.011398
Prob(LR statistic)      4.81E-05

Obs. with dep = 0       781         Total obs.                1440
Obs. with dep = 1       659

they can be turned into models mimicking some of the brain's functions, possibly with the ability to learn from examples and then to generalise on unseen examples.

A neural network is typically organised into several layers of elementary processing units or nodes. The first layer is the input layer, with the number of nodes corresponding to the number of variables, and the last layer is the output layer, with the number of nodes corresponding to the forecasting horizon for a forecasting problem.10 The input and output layers can be separated by one or more hidden layers, with each layer containing one or more hidden nodes.11 The nodes in adjacent layers are fully connected. Each neuron receives information from the preceding layer and transmits to the following layer only.12

The neuron performs a weighted summation of its inputs; if the sum passes a threshold

the neuron transmits, otherwise it remains inactive. In addition, a bias neuron may be

connected to each neuron in the hidden and output layers. The bias has a value of positive

10 Linear regression models may be viewed analogously to neural networks with no hidden layers (Kaastra and Boyd, 1996).
11 Networks with hidden layers are multilayer networks; a multilayer perceptron network is used for this chapter.
12 If the flow of information through the network is from the input to the output, it is known as "feedforward".


Figure 1.12 Logit1 estimation Excel spreadsheet (in-sample)

Figure 1.13 Nojap estimation Excel spreadsheet (in-sample)


[Figure 1.14 shows a single-output fully connected NNR model: five inputs x_t[i] (i = 1, 2, ..., 5) feed two hidden nodes h_t[j] (j = 1, 2), whose outputs feed a single output node; ỹ_t denotes the NNR model output and y_t the actual value.]

Figure 1.14 A single output fully connected NNR model

one and is analogous to the intercept in traditional regression models. An example of

a fully connected NNR model with one hidden layer and two nodes is presented in

Figure 1.14.

The vector A = (x[1], x[2], ..., x[n]) represents the input to the NNR model, where x_t[i] is the level of activity of the ith input. Associated with the input vector is a series of weight vectors W_j = (w_1j, w_2j, ..., w_nj), so that w_ij represents the strength of the connection between the input x_t[i] and the processing unit b_j. There may also be an input bias θ_j modulated by the weight w_0j associated with the inputs. The total input of the node b_j is the dot product between vectors A and W_j, less the weighted bias. It is then passed through a nonlinear activation function to produce the output value of processing unit b_j:

b_j = f( Σ_{i=1}^{n} x[i] w_ij - w_0j θ_j ) = f(X_j)   (1.8)

Typically, the activation function takes the form of the logistic function, which introduces

a degree of nonlinearity to the model and prevents outputs from reaching very large

values that can “paralyse” NNR models and inhibit training (Kaastra and Boyd, 1996;

Zhang et al., 1998). Here we use the logistic function:

f(X_j) = 1 / (1 + e^(-X_j))   (1.9)
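Equations (1.8) and (1.9) together define a single node's computation; a minimal sketch, with our own function names:

```python
import numpy as np

def logistic(x):
    # Logistic activation, equation (1.9)
    return 1.0 / (1.0 + np.exp(-x))

def node_output(inputs, weights, weighted_bias):
    # Weighted summation of the inputs less the weighted bias,
    # equation (1.8), passed through the logistic activation
    x_j = np.dot(inputs, weights) - weighted_bias
    return logistic(x_j)

# Symmetric inputs and weights cancel, so the node sits at the
# midpoint of the logistic function
print(node_output(np.array([1.0, 1.0]), np.array([0.5, -0.5]), 0.0))  # 0.5
```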

The modelling process begins by assigning random values to the weights. The output value of the processing unit is passed on to the output layer. If the output is optimal, the process is halted; if not, the weights are adjusted and the process continues until an optimal solution is found. The output error, namely the difference between the actual value and the NNR model output, is the optimisation criterion. Commonly, the criterion


is the root-mean-squared error (RMSE). The RMSE is systematically minimised through the adjustment of the weights. Basically, training is the process of determining the optimal network weights, as they represent the knowledge learned by the network. Since inadequacies in the output are fed back through the network to adjust the network weights, the NNR model is trained by backpropagation13 (Shapiro, 2000).

A common practice is to divide the time series into three sets called the training, test and validation (out-of-sample) sets, and to partition them as roughly 2/3, 1/6 and 1/6, respectively.

The testing set is used to evaluate the generalisation ability of the network. The technique

consists of tracking the error on the training and test sets. Typically, the error on the

training set continually decreases, however the test set error starts by decreasing and

then begins to increase. From this point the network has stopped learning the similarities

between the training and test sets, and has started to learn meaningless differences, namely

the noise within the training data. For good generalisation ability, training should stop

when the test set error reaches its lowest point. The stopping rule reduces the likelihood

of overfitting, i.e. that the network will become overtrained (Dunis and Huang, 2002;

Mehta, 1995).
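The stopping rule described above amounts to tracking the test-set error across training sweeps and noting where it bottoms out. A sketch under our own naming, where `train_step` and `test_error` are hypothetical callbacks supplied by the training environment:

```python
def track_early_stopping(train_step, test_error, max_sweeps=1500):
    """Run training sweeps while recording the test-set error; the sweep
    with the lowest test error marks where training should stop."""
    best_err, best_sweep = float("inf"), 0
    for sweep in range(1, max_sweeps + 1):
        train_step()
        err = test_error()
        if err < best_err:
            best_err, best_sweep = err, sweep
    return best_sweep, best_err

# Toy run: the test error falls, bottoms out at sweep 3, then rises again
errors = iter([5.0, 4.0, 3.0, 3.5, 4.5])
sweep, err = track_early_stopping(lambda: None, lambda: next(errors), max_sweeps=5)
print(sweep, err)  # 3 3.0
```

In practice one would also save the network weights at the best sweep rather than merely record it; that detail is omitted here.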

An evaluation of the performance of the trained network is made on new examples not

used in network selection, namely the validation set. Crucially, the validation set should

never be used to discriminate between networks, as any set that is used to choose the

best network is, by definition, a test set. In addition, good generalisation ability requires that the training and test sets are representative of the population, as inappropriate selection will affect the network's generalisation ability and forecast performance (Kaastra and Boyd,

1996; Zhang et al., 1998).

1.5.2 Issues in neural network modelling

Despite the satisfactory features of NNR models, the process of building them should not

be taken lightly. There are many issues that can affect the network's performance and

should be considered carefully.

The issue of finding the most parsimonious model is always a problem for statistical methods and is particularly important for NNR models because of the problem of overfitting. Parsimonious models not only have recognition ability but also the more important generalisation ability. Overfitting and generalisation are always going to be a problem for real-world situations, and this is particularly true for financial applications where time series may well be quasi-random, or at least contain noise.

One of the most commonly used heuristics to ensure good generalisation is the application of some form of Occam's Razor. The principle states that "unnecessary complex models should not be preferred to simpler ones. However . . . more complex models always fit the data better" (Kingdon, 1997: 49). The two objectives are, of course, contradictory. The solution is to find a model with the smallest possible complexity, and yet which can still describe the data set (Haykin, 1999; Kingdon, 1997).

A reasonable strategy in designing NNR models is to start with one layer containing a

few hidden nodes, and increase the complexity while monitoring the generalisation ability.

The issue of determining the optimal number of layers and hidden nodes is a crucial factor

13 Backpropagation networks are the most common multilayer network and are the most used type in financial time series forecasting (Kaastra and Boyd, 1996). We use them exclusively here.


for good network design, as the hidden nodes provide the ability to generalise. However,

in most situations there is no way to determine the best number of hidden nodes without

training several networks. Several rules of thumb have been proposed to aid the process,

however none work well for all applications. Notwithstanding, simplicity must be the

aim (Mehta, 1995).

Since NNR models are pattern matchers, the representation of data is critical for a successful network design. The raw data for the input and output variables are rarely fed into the network; they are generally scaled between the upper and lower bounds of the activation function. For the logistic function the range is [0,1], avoiding the function's saturation zones. Practically, as here, a normalisation to [0.2,0.8] is often used with the logistic function, as its limits are only reached for infinite input values (Zhang et al., 1998).
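A [0.2, 0.8] normalisation of this kind is a simple linear rescaling; a sketch with our own function name:

```python
import numpy as np

def scale(series, lo=0.2, hi=0.8):
    # Linearly map the series into [lo, hi], keeping clear of the
    # logistic function's saturation zones near 0 and 1
    mn, mx = series.min(), series.max()
    return lo + (hi - lo) * (series - mn) / (mx - mn)

print(scale(np.array([0.0, 5.0, 10.0])))  # [0.2 0.5 0.8]
```

For out-of-sample use, the minimum and maximum would be taken from the training data and reused, rather than recomputed on new data.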

Crucial for backpropagation learning is the learning rate of the network as it determines

the size of the weight changes. Smaller learning rates slow the learning process, while

larger rates cause the error function to change wildly without continuously improving.

To improve the process a momentum parameter is used which allows for larger learning

rates. The parameter determines how past weight changes affect current weight changes,

by making the next weight change in approximately the same direction as the previous

one14 (Kaastra and Boyd, 1996; Zhang et al., 1998).
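The update with a momentum term can be written as Δw(t) = −η·∂E/∂w + α·Δw(t−1), where η is the learning rate and α the momentum. A sketch with our own names; note the defaults match the learning rate of 0.1 and zero momentum used in this research (footnote 14):

```python
def momentum_update(w, grad, prev_delta, lr=0.1, momentum=0.0):
    # New change = gradient step plus a fraction of the previous change,
    # so successive updates point in roughly the same direction
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta

w, d = momentum_update(1.0, grad=0.5, prev_delta=0.2, lr=0.1, momentum=0.9)
print(w, d)  # w ≈ 1.13, delta ≈ 0.13
```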

1.5.3 Neural network modelling procedure

Conforming to standard heuristics, the training, test and validation sets were partitioned as approximately 2/3, 1/6 and 1/6, respectively. The training set runs from 17 October 1994

to 8 April 1999 (1169 observations), the test set runs from 9 April 1999 to 18 May

2000 (290 observations), and the validation set runs from 19 May 2000 to 3 July 2001

(290 observations), reserved for out-of-sample forecasting and evaluation, identical to the

out-of-sample period for the benchmark models.

To start, traditional linear cross-correlation analysis helped establish the existence of

a relationship between EUR/USD returns and potential explanatory variables. Although

NNR models attempt to map nonlinearities, linear cross-correlation analysis can give

some indication of which variables to include in a model, or at least a starting point to

the analysis (Diekmann and Gutjahr, 1998; Dunis and Huang, 2002).

The analysis was performed for all potential explanatory variables. Lagged terms

that were most significant as determined via the cross-correlation analysis are presented

in Table 1.12.

The lagged terms SPCOMP(-1) and US yc(-1) could not be used because of time-zone differences between London and the USA, as discussed at the beginning of Section 1.3. As an initial substitute, SPCOMP(-2) and US yc(-2) were used. In addition, various lagged terms of the EUR/USD returns were included as explanatory variables.

Variable selection was achieved via a forward stepwise NNR procedure, namely potential explanatory variables were progressively added to the network. If adding a new variable improved the level of explained variance (EV) over the previous "best" network, the pool of explanatory variables was updated.15 Since the aim of the model-building

14 The problem of convergence did not occur within this research; as a result, a learning rate of 0.1 and momentum of zero were used exclusively.
15 EV is an approximation of the coefficient of determination, R2, in traditional regression techniques.


Table 1.12 Most significant lag of each potential explanatory variable (in returns)

Variable    Best lag

DAXINDX 10

DJES50I 10

FRCAC40 10

FTSE100 5

GOLDBLN 19

ITMIB 9

JAPAYE$ 10

OILBREN 1

JAPDOWA 15

SPCOMP 1

USDOLLR 12

BD yc 19

EC yc 2

FR yc 9

IT yc 2

JP yc 6

UK yc 19

US yc 1

NYFECRB 20

procedure is to build a model with good generalisation ability, a model that has a higher

EV level has a better ability. In addition, a good measure of this ability is to compare

the EV level of the test and validation sets: if the test set and validation set levels are

similar, the model has been built to generalise well.
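The forward stepwise loop can be sketched generically; `fit_ev` below is a hypothetical callback that trains a network on a given variable set and returns its test-set EV, and the toy EV table is purely illustrative:

```python
def forward_stepwise(candidates, fit_ev):
    """Greedily add the variable that most improves EV; stop when
    no single addition helps."""
    selected, best_ev = [], float("-inf")
    improved = True
    while improved:
        improved = False
        for var in [c for c in candidates if c not in selected]:
            ev = fit_ev(selected + [var])
            if ev > best_ev:
                best_ev, best_var, improved = ev, var, True
        if improved:
            selected.append(best_var)
    return selected, best_ev

# Toy EV surface: OILBREN helps most alone, JAPDOWA adds on top,
# GOLDBLN adds nothing further
ev_table = {
    ("GOLDBLN",): 0.1, ("JAPDOWA",): 1.0, ("OILBREN",): 1.5,
    ("OILBREN", "GOLDBLN"): 2.0, ("OILBREN", "JAPDOWA"): 2.2,
    ("OILBREN", "JAPDOWA", "GOLDBLN"): 2.1,
}
sel, ev = forward_stepwise(["GOLDBLN", "JAPDOWA", "OILBREN"],
                           lambda vs: ev_table[tuple(vs)])
print(sel, ev)  # ['OILBREN', 'JAPDOWA'] 2.2
```

As the text goes on to note, a purely greedy search can miss variables that only matter in combination, which is why the chapter also alternates variables in and out of the model.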

The decision to use explained variance is because the EUR/USD returns series is a stationary series, and stationarity remains important if NNR models are assessed on the level of explained variance (Dunis and Huang, 2002). The EV levels for the training, test and validation sets of the selected NNR model, which we shall name nnr1 (nnr1.prv Previa file), are presented in Table 1.13.

An EV level equal to, or greater than, 80% was used as the NNR learning termination

criterion. In addition, if the NNR model did not reach this level within 1500 learning

sweeps, again the learning terminates. The criteria selected are reasonable for daily data

and were used exclusively here.

If after several attempts there was failure to improve on the previous “best” model,

variables in the model were alternated in an attempt to find a better combination. This

Table 1.13 nnr1 model EV for the training, test and validation sets

Training set   Test set   Validation set
3.4%           2.3%       2.2%


procedure recognises the likelihood that some variables may only be relevant predictors

when in combination with certain other variables.

Once a tentative model is selected, post-training weights analysis helps establish the

importance of the explanatory variables, as there are no standard statistical tests for NNR

models. The idea is to find a measure of the contribution a given weight has to the overall output of the network, in essence allowing detection of insignificant variables.

Such analysis includes an examination of a Hinton graph, which represents graphically

the weight matrix within the network. The principle is to include in the network variables

that are strongly signi¬cant. In addition, a small bias weight is preferred (Diekmann and

Gutjahr, 1998; Kingdon, 1997; Previa, 2001). The input to a hidden layer Hinton graph

of the nnr1 model produced by Previa is presented in Figure 1.15. The graph suggests

that the explanatory variables of the selected model are strongly signi¬cant, both positive

(green) and negative (black), and that there is a small bias weight. In addition, the input

to hidden layer weight matrix of the nnr1 model produced by Previa is presented in

Table 1.14.

The nnr1 model contained the returns of the explanatory variables presented in Table 1.15, having one hidden layer containing five hidden nodes.

Again, to justify the use of the Japanese variables, a further model that did not include these variables, but was otherwise identical to nnr1, was produced and its performance evaluated; we shall name it nojap (nojap.prv Previa file). The EV levels of the training

Figure 1.15 Hinton graph of the nnr1 EUR/USD returns model


Table 1.14 Input to hidden layer weight matrix of the nnr1 EUR/USD returns model

Inputs: GOLDBLN(-19), JAPAYE$(-10), JAPDOWA(-15), OILBREN(-1), USDOLLR(-12), FR yc(-2), IT yc(-6), JP yc(-9), JAPAYE$(-1), JAPDOWA(-1), Bias

C[1,0]: -0.2120, -0.4336, -0.4579, -0.2621, -0.3911, 0.2316, 0.2408, 0.4295, 0.4067, 0.4403; bias -0.0824
C[1,1]: -0.1752, -0.3589, -0.5474, -0.3663, -0.4623, 0.4016, 0.2438, 0.2786, 0.2757, 0.4831; bias -0.0225
C[1,2]: -0.3037, -0.4462, -0.5139, -0.2506, -0.3491, 0.2490, 0.2900, 0.3634, 0.2737, 0.4132; bias -0.0088
C[1,3]: -0.3588, -0.4089, -0.5446, -0.2730, -0.4531, 0.3382, 0.2555, 0.4661, 0.4153, 0.5245; bias 0.0373
C[1,4]: -0.3283, -0.4086, -0.6108, -0.2362, -0.4828, 0.3338, 0.3088, 0.4192, 0.4254, 0.4779; bias -0.0447

Table 1.15  nnr1 model explanatory variables (in returns)

    Variable    Lag
    GOLDBLN      19
    JAPAYE$      10
    JAPDOWA      15
    OILBREN       1
    USDOLLR      12
    FR yc         2
    IT yc         6
    JP yc         9
    JAPAYE$       1
    JAPDOWA       1

and test sets of the nojap model were 1.4 and 0.6 respectively, which are much lower than those of the nnr1 model.

The nnr1 model was retained for out-of-sample estimation. The performance of the

strategy is evaluated in terms of traditional forecasting accuracy and in terms of trading

performance.

Several other adequate models were produced and their performance evaluated, including RNN models.16 In essence, the only difference from NNR models is the addition of a loop back from a hidden or the output layer to the input layer. The loop back is then used as an input in the next period. There is no theoretical or empirical answer to whether the hidden layer or the output should be looped back. However, the looping back of either allows RNN models to keep the memory of the past,17 a useful property in forecasting applications. This feature comes at a cost, as RNN models require more connections, raising the issue of complexity. Since simplicity is the aim, a less complex model that can still describe the data set is preferred.
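The loop-back idea can be illustrated with a minimal Elman-style recurrence. This is a sketch of the general mechanism only, with our own function and weight names; it is not the networks estimated with Previa in this chapter.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_in, W_rec, w_out):
    """One step of a simple recurrent network: the previous hidden state
    h_prev is looped back alongside the current inputs x_t, which is what
    gives the model its memory of the past."""
    h_t = np.tanh(W_in @ x_t + W_rec @ h_prev)  # hidden layer with feedback
    y_t = float(w_out @ h_t)                    # forecast for the next period
    return y_t, h_t

# Shapes for illustration: 3 inputs, 5 hidden nodes
rng = np.random.default_rng(0)
W_in, W_rec, w_out = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=5)
y_t, h_t = rnn_step(np.zeros(3), np.zeros(5), W_in, W_rec, w_out)
```

At each subsequent period, `h_t` is fed back as `h_prev`, so the forecast depends on the whole history of inputs, not just the current ones; this is the extra connectivity, and hence extra complexity, referred to above.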

The statistical forecasting accuracy results of the nnr1 model and the RNN model, which we shall name rnn1 (rnn1.prv Previa file), were only marginally different, namely the mean absolute percentage error (MAPE) differs by 0.09%. However, in terms of

16 For a discussion on recurrent neural network models refer to Dunis and Huang (2002).

17 The looping back of the output layer is an error feedback mechanism, implying the use of a nonlinear error-correction model (Dunis and Huang, 2002).


Figure 1.16 nnr1 model Excel spreadsheet (in-sample)

Figure 1.17 rnn1 model Excel spreadsheet (in-sample)


trading performance there is little to separate the nnr1 and rnn1 models. The evaluation

can be reviewed in Sheet 2 of the is nnr1.xls and is rnn1.xls Excel spreadsheets, and is

also presented in Figures 1.16 and 1.17, respectively.

The decision to retain the nnr1 model over the rnn1 model is because the rnn1 model is

more complex and yet does not possess any decisive added value over the simpler model.

1.6 FORECASTING ACCURACY AND TRADING SIMULATION

To compare the performance of the strategies, it is necessary to evaluate them on previously unseen data. This situation is likely to be the closest to a true forecasting or trading situation. To achieve this, all models retained an identical out-of-sample period, allowing a direct comparison of their forecasting accuracy and trading performance.

1.6.1 Out-of-sample forecasting accuracy measures

Several criteria are used to compare the forecasting ability of the benchmark and NNR models, including mean absolute error (MAE), RMSE,18 MAPE, and Theil's inequality coefficient (Theil-U).19 For a full discussion of these measures, refer to Hanke and Reitsch (1998) and Pindyck and Rubinfeld (1998). We also include correct directional change (CDC), which measures the capacity of a model to correctly predict the subsequent actual change of a forecast variable, an important issue in a trading strategy that relies on the direction of a forecast rather than its level. The statistical performance measures used to analyse the forecasting techniques are presented in Table 1.16.

1.6.2 Out-of-sample trading performance measures

Statistical performance measures are often inappropriate for financial applications. Typically, modelling techniques are optimised using a mathematical criterion, but ultimately the results are analysed on a financial criterion on which they were not optimised. In other words, the forecast error may have been minimised during model estimation, but the evaluation of the true merit should be based on the performance of a trading strategy. Without actual trading, the best means of evaluating performance is via a simulated trading strategy. The procedure to create the buy and sell signals is quite simple: a EUR/USD buy signal is produced if the forecast is positive, and a sell otherwise.20
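This signal rule can be sketched in a few lines. The function and data below are our own illustration, not taken from the chapter's spreadsheets:

```python
import numpy as np

def trading_returns(forecast, actual):
    """Map forecasts to long (+1) / short (-1) EUR/USD positions and
    compute the resulting daily strategy returns."""
    # Buy if the forecast return is positive, sell otherwise
    signal = np.where(np.asarray(forecast, float) > 0, 1.0, -1.0)
    return signal * np.asarray(actual, float)

# Toy example: the first two directional calls are correct, the third is wrong
strat = trading_returns([0.01, -0.02, 0.005], [0.004, -0.003, -0.002])
```

A correct directional call earns the absolute realised return; a wrong call loses it, regardless of how close the forecast level was.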

For many traders and analysts market direction is more important than the value of the forecast itself, as in financial markets money can be made simply by knowing the direction the series will move. In essence, "low forecast errors and trading profits are not synonymous since a single large trade forecasted incorrectly . . . could have accounted for most of the trading system's profits" (Kaastra and Boyd, 1996: 229).

The trading performance measures used to analyse the forecasting techniques are presented in Tables 1.17 and 1.18. Most measures are self-explanatory and are commonly used in the fund management industry. Some of the more important measures include the Sharpe ratio, maximum drawdown and average gain/loss ratio. The Sharpe ratio is a

18 The MAE and RMSE statistics are scale-dependent measures but allow a comparison between the actual and forecast values; the lower the values, the better the forecasting accuracy.

19 When it is more important to evaluate the forecast errors independently of the scale of the variables, the MAPE and Theil-U are used. They are constructed to lie within [0,1], zero indicating a perfect fit.

20 A buy signal is to buy euros at the current price or continue holding euros, while a sell signal is to sell euros at the current price or continue holding US dollars.


Table 1.16  Statistical performance measures

Mean absolute error (1.10):
    MAE = (1/T) Σ_{t=1}^{T} |y_t − ỹ_t|

Mean absolute percentage error (1.11):
    MAPE = (100/T) Σ_{t=1}^{T} |(y_t − ỹ_t)/y_t|

Root-mean-squared error (1.12):
    RMSE = √[(1/T) Σ_{t=1}^{T} (y_t − ỹ_t)²]

Theil's inequality coefficient (1.13):
    U = √[(1/T) Σ_{t=1}^{T} (y_t − ỹ_t)²] / {√[(1/T) Σ_{t=1}^{T} ỹ_t²] + √[(1/T) Σ_{t=1}^{T} y_t²]}

Correct directional change (1.14):
    CDC = (100/N) Σ_{t=1}^{N} D_t, where D_t = 1 if y_t · ỹ_t > 0, else D_t = 0

where y_t is the actual change at time t, ỹ_t is the forecast change, and t = 1 to t = T for the forecast period.
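The measures in Table 1.16 can be computed in a few lines. The sketch below is our own illustration of equations (1.10)–(1.14); the function name is an assumption, not part of the chapter's toolset.

```python
import numpy as np

def forecast_accuracy(y, y_hat):
    """Statistical accuracy measures of equations (1.10)-(1.14):
    y are the actual changes, y_hat the forecast changes."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    e = y - y_hat
    mae = np.mean(np.abs(e))                                   # (1.10)
    mape = 100.0 * np.mean(np.abs(e / y))                      # (1.11)
    rmse = np.sqrt(np.mean(e ** 2))                            # (1.12)
    theil_u = rmse / (np.sqrt(np.mean(y_hat ** 2))
                      + np.sqrt(np.mean(y ** 2)))              # (1.13)
    cdc = 100.0 * np.mean(y * y_hat > 0)                       # (1.14)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse,
            "Theil-U": theil_u, "CDC": cdc}

# A perfect forecast gives zero error measures and 100% directional accuracy
m = forecast_accuracy([0.01, -0.02, 0.03], [0.01, -0.02, 0.03])
```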

risk-adjusted measure of return, with higher ratios preferred to lower ones; the maximum drawdown is a measure of downside risk; and the average gain/loss ratio is a measure of overall gain, a value above one being preferred (Dunis and Jalilov, 2002; Fernandez-Rodriguez et al., 2000).

The application of these measures may be a better standard for determining the quality of the forecasts. After all, the financial gain from a given strategy depends on trading performance, not on forecast accuracy.

1.6.3 Out-of-sample forecasting accuracy results

The forecasting accuracy statistics do not provide very conclusive results. Each of the models evaluated, except the logit model, is nominated "best" at least once. Interestingly, the naïve model has the lowest Theil-U statistic at 0.6901; if this model is believed to be the "best" model, there is likely to be no added value in using more complicated forecasting techniques. The ARMA model has the lowest MAPE statistic at 101.51%, and equals the MAE of the NNR model at 0.0056. The NNR model has the lowest RMSE statistic; however, the value is only marginally less than that of the ARMA model. The MACD model has the highest CDC measure, predicting daily changes accurately 60.00% of the time. It is difficult to select a "best" performer from these results; however, a majority decision rule


Table 1.17  Trading simulation performance measures

Annualised return (1.15):
    R^A = 252 × (1/N) Σ_{t=1}^{N} R_t

Cumulative return (1.16):
    R^C = Σ_{t=1}^{N} R_t

Annualised volatility (1.17):
    σ^A = √252 × √[(1/(N − 1)) Σ_{t=1}^{N} (R_t − R̄)²]

Sharpe ratio (1.18):
    SR = R^A / σ^A

Maximum daily profit (1.19): maximum value of R_t over the period

Maximum daily loss (1.20): minimum value of R_t over the period

Maximum drawdown (1.21): maximum negative value of Σ(R_t) over the period,
    MD = min_{t=1,...,N} [R^C_t − max_{i=1,...,t} R^C_i]

% Winning trades (1.22):
    WT = 100 × (Σ_{t=1}^{N} F_t)/N_T, where F_t = 1 if transaction profit_t > 0

% Losing trades (1.23):
    LT = 100 × (Σ_{t=1}^{N} G_t)/N_T, where G_t = 1 if transaction profit_t < 0

Number of up periods (1.24): N_up = number of R_t > 0

Number of down periods (1.25): N_down = number of R_t < 0

Number of transactions (1.26):
    N_T = Σ_{t=1}^{N} L_t, where L_t = 1 if trading signal_t ≠ trading signal_{t−1}

Total trading days (1.27): number of all R_t's

Avg. gain in up periods (1.28): AG = (sum of all R_t > 0)/N_up

Avg. loss in down periods (1.29): AL = (sum of all R_t < 0)/N_down

Avg. gain/loss ratio (1.30): GL = AG/AL

Probability of 10% loss (1.31):
    PoL = [(1 − P)/P]^(MaxRisk/A),
    where P = 0.5 × [1 + ((WT × AG) + (LT × AL))/A], A = √[(WT × AG²) + (LT × AL²)],
    and MaxRisk is the risk level defined by the user (10% in this research)

Profits T-statistics (1.32):
    T-statistic = √N × (R^A/σ^A)

Source: Dunis and Jalilov (2002).
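A subset of these measures is straightforward to compute from a series of daily strategy returns. The sketch below illustrates equations (1.15), (1.17), (1.18), (1.21) and (1.30); it is our own illustration on toy numbers, not the spreadsheet calculations used in the chapter.

```python
import numpy as np

def trading_performance(R):
    """Selected Table 1.17 measures from daily strategy returns R:
    annualised return and volatility, Sharpe ratio, maximum drawdown
    and average gain/loss ratio."""
    R = np.asarray(R, float)
    ann_ret = 252.0 * R.mean()                         # (1.15)
    ann_vol = np.sqrt(252.0) * R.std(ddof=1)           # (1.17)
    cum = np.cumsum(R)                                 # cumulative return path
    max_dd = np.min(cum - np.maximum.accumulate(cum))  # (1.21) worst peak-to-trough fall
    gl = R[R > 0].mean() / abs(R[R < 0].mean())        # (1.30) avg gain / avg loss
    return {"AnnRet": ann_ret, "AnnVol": ann_vol,
            "Sharpe": ann_ret / ann_vol, "MaxDD": max_dd, "GainLoss": gl}

perf = trading_performance([0.01, -0.005, 0.02, -0.01])
```

Note that the drawdown is computed on the cumulative return path, so a strategy can have a large drawdown even when each individual daily loss is small.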


Table 1.18  Trading simulation performance measures

Number of periods daily returns rise (1.33):
    NPR = Σ_{t=1}^{N} Q_t, where Q_t = 1 if y_t > 0, else Q_t = 0

Number of periods daily returns fall (1.34):
    NPF = Σ_{t=1}^{N} S_t, where S_t = 1 if y_t < 0, else S_t = 0

Number of winning up periods (1.35):
    NWU = Σ_{t=1}^{N} B_t, where B_t = 1 if R_t > 0 and y_t > 0, else B_t = 0

Number of winning down periods (1.36):
    NWD = Σ_{t=1}^{N} E_t, where E_t = 1 if R_t > 0 and y_t < 0, else E_t = 0

Winning up periods (%) (1.37): WUP = 100 × (NWU/NPR)

Winning down periods (%) (1.38): WDP = 100 × (NWD/NPF)

Table 1.19  Forecasting accuracy results21

                                    Naïve     MACD     ARMA      Logit    NNR
Mean absolute error                 0.0080    –        0.0056    –        0.0056
Mean absolute percentage error      317.31%   –        101.51%   –        107.38%
Root-mean-squared error             0.0102    –        0.0074    –        0.0073
Theil's inequality coefficient      0.6901    –        0.9045    –        0.8788
Correct directional change          55.86%    60.00%   56.55%    53.79%   57.24%

might select the NNR model as the overall “best” model because it is nominated “best”

twice and also “second best” by the other three statistics. A comparison of the forecasting

accuracy results is presented in Table 1.19.

1.6.4 Out-of-sample trading performance results

A comparison of the trading performance results is presented in Table 1.20 and Figure 1.18. The results of the NNR model are quite impressive. It generally outperforms the benchmark strategies, both in terms of overall profitability, with an annualised return of 29.68% and a cumulative return of 34.16%, and in terms of risk-adjusted performance, with a Sharpe ratio of 2.57. The logit model has the lowest downside risk as measured by maximum drawdown at −5.79%, and the MACD model has the lowest downside risk

21 As the MACD model is not based on forecasting the next period and binary variables are used in the logit model, statistical accuracy comparisons with these models were not always possible.


Table 1.20  Trading performance results

                                        Naïve     MACD      ARMA      Logit     NNR
Annualised return                       21.34%    11.34%    12.91%    21.05%    29.68%
Cumulative return                       24.56%    13.05%    14.85%    24.22%    34.16%
Annualised volatility                   11.64%    11.69%    11.69%    11.64%    11.56%
Sharpe ratio                            1.83      0.97      1.10      1.81      2.57
Maximum daily profit                    3.38%     1.84%     3.38%     1.88%     3.38%
Maximum daily loss                      −2.10%    −3.23%    −2.10%    −3.38%    −1.82%
Maximum drawdown                        −9.06%    −7.75%    −10.10%   −5.79%    −9.12%
% Winning trades                        37.01%    24.00%    52.71%    49.65%    52.94%
% Losing trades                         62.99%    76.00%    47.29%    50.35%    47.06%
Number of up periods                    162       149       164       156       166
Number of down periods                  126       138       124       132       122
Number of transactions                  127       25        129       141       136
Total trading days                      290       290       290       290       290
Avg. gain in up periods                 0.58%     0.60%     0.55%     0.61%     0.60%
Avg. loss in down periods               −0.56%    −0.55%    −0.61%    −0.53%    −0.54%
Avg. gain/loss ratio                    1.05      1.08      0.91      1.14      1.12
Probability of 10% loss                 0.70%     0.02%     5.70%     0.76%     0.09%
Profits T-statistics                    31.23     16.51     18.81     30.79     43.71
Number of periods daily returns rise    128       128       128       128       128
Number of periods daily returns fall    162       162       162       162       162
Number of winning up periods            65        45        56        49        52
Number of winning down periods          97        104       108       106       114
% Winning up periods                    50.78%    35.16%    43.75%    38.28%    40.63%
% Winning down periods                  59.88%    64.20%    66.67%    66.05%    70.37%

Figure 1.18  Cumulated profit graph (naïve, MACD, ARMA, logit and NNR models), 19 May 2000 to 3 July 2001

as measured by the probability of a 10% loss at 0.02%; however, this is only marginally less than the NNR model at 0.09%.

The NNR model predicted the highest number of winning down periods at 114, while the naïve model forecast the highest number of winning up periods at 65. Interestingly, all models were more successful at forecasting a fall in the EUR/USD returns series, as indicated by a greater percentage of winning down periods to winning up periods.


The logit model has the highest number of transactions at 141, while the NNR model

has the second highest at 136. The MACD strategy has the lowest number of transactions

at 25. In essence, the MACD strategy has longer “holding” periods compared to the

other models, suggesting that the MACD strategy is not compared “like with like” to the

other models.

More than with statistical performance measures, financial criteria clearly single out the NNR model as the one with the most consistent performance. Therefore it is considered the "best" model for this particular application.

1.6.5 Transaction costs

So far, our results have been presented without accounting for transaction costs during the trading simulation. However, it is not realistic to assess the success or otherwise of a trading system unless transaction costs are taken into account. Between market makers, a cost of 3 pips (0.0003 EUR/USD) per trade (one way) for a tradable amount, typically USD 5–10 million, would be normal. The procedure to approximate the transaction costs for the NNR model is quite simple.

A cost of 3 pips per trade and an average out-of-sample EUR/USD rate of 0.8971 produce an average cost of 0.033% per trade:

    0.0003/0.8971 = 0.033%

The NNR model made 136 transactions. Since the EUR/USD time series is a series of bid rates and because, apart from the first trade, each signal implies two transactions, one to close the existing position and a second one to enter the new position indicated by the model signal, the approximate out-of-sample transaction costs for the NNR model trading strategy are about 4.55%:

    136 × 0.033% = 4.55%
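The arithmetic above can be reproduced directly; the variable names below are our own:

```python
# Back-of-the-envelope transaction cost estimate for the NNR strategy
pip_cost = 0.0003        # 3 pips per trade (one way)
avg_rate = 0.8971        # average out-of-sample EUR/USD rate
cost_per_trade = pip_cost / avg_rate           # ~0.033% of the traded amount
n_transactions = 136
total_cost = n_transactions * cost_per_trade   # ~4.55% over the period
```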

Therefore, even accounting for transaction costs, the extra returns achieved with the

NNR model still make this strategy the most attractive one despite its relatively high

trading frequency.

1.7 CONCLUDING REMARKS

This chapter has evaluated the use of different regression models in forecasting and trading the EUR/USD exchange rate. The performance was measured statistically and financially via a trading simulation taking into account the impact of transaction costs on models with higher trading frequencies. The logic behind the trading simulation is that, if profit from a trading simulation were compared solely on the basis of statistical measures, the optimum model from a financial perspective would rarely be chosen.

The NNR model was benchmarked against more traditional regression-based and other

benchmark forecasting techniques to determine any added value to the forecasting process.

Having constructed a synthetic EUR/USD series for the period up to 4 January 1999, the

models were developed using the same in-sample data, 17 October 1994 to 18 May 2000,

leaving the remaining period, 19 May 2000 to 3 July 2001, for out-of-sample forecasting.


Forecasting techniques rely on the weaknesses of the efficient market hypothesis, acknowledging the existence of market inefficiencies, with markets displaying even weak signs of predictability. However, FX markets are relatively efficient, reducing the scope for a profitable strategy. Consequently, the FX managed futures industry average Sharpe ratio is only 0.8, although a percentage of winning trades greater than 60% is often required to run a profitable FX trading desk (Grabbe, 1996 as cited in Bellgard and Goldschmidt, 1999: 10). In this respect, it is worth noting that only one of our models reached a 60% winning trades accuracy, namely the MACD model at 60.00%. Nevertheless, all of the models examined in this chapter achieved an out-of-sample Sharpe ratio higher than 0.8, the highest of which was again the NNR model at 2.57. This seems to confirm that the use of quantitative trading is more appropriate in a fund management than in a treasury type of context.

Forecasting techniques are dependent on the quality and nature of the data used. If the solution to a problem is not within the data, then no technique can extract it. In addition, sufficient information should be contained within the in-sample period to be representative of all cases within the out-of-sample period. For example, a downward trending series typically has more falls represented in the data than rises. The EUR/USD is such a series within the in-sample period. Consequently, the forecasting techniques used are estimated using more negative values than positive values. The probable implication is that the models are more likely to successfully forecast a fall in the EUR/USD, as indicated by our results, with all models forecasting a higher percentage of winning down periods than winning up periods. However, the naïve model does not learn to generalise per se, and as a result has the smallest difference between the number of winning up and winning down periods.

Overall our results confirm the credibility and potential of regression models and particularly NNR models as a forecasting technique. However, while NNR models offer a promising alternative to more traditional techniques, they suffer from a number of limitations. They are not a panacea. One of the major disadvantages is the inability to explain their reasoning, which has led some to consider that "neural nets are truly black boxes. Once you have trained a neural net and are generating predictions, you still do not know why the decisions are being made and can't find out by just looking at the net. It is not unlike attempting to capture the structure of knowledge by dissecting the human brain" (Fishman et al., 1991 as cited in El-Shazly and El-Shazly, 1997: 355). In essence, the neural network learning procedure is not very transparent, requiring a lot of understanding. In addition, statistical inference techniques such as significance testing cannot always be applied, resulting in a reliance on a heuristic approach. The complexity of NNR models suggests that they are capable of superior forecasts, as shown in this chapter; however, this is not always the case. They are essentially nonlinear techniques and may be less capable in linear applications than traditional forecasting techniques (Balkin and Ord, 2000; Campbell et al., 1997; Lisboa and Vellido, 2000; Refenes and Zaidi, 1993).

Although the results support the success of neural network models in financial applications, there is room for increased success. Such a possibility lies with optimising the neural network model on a financial criterion, and not a mathematical criterion. As the profitability of a trading strategy relies on correctly forecasting the direction of change, namely CDC, optimising the neural network model on such a measure could improve trading performance. However, backpropagation networks optimise by minimising a differentiable function such as squared error; they cannot minimise a function based on loss or, conversely, maximise a function based on profit. Notwithstanding, there is scope to explore this idea further, provided the neural network software has the ability to select such an optimisation criterion.

Future work might also include the addition of hourly data as a possible explanatory variable. Alternatively, the use of first differences instead of rates of return series may be investigated, as first differences are perhaps the most effective way to generate data sets for neural network learning (Mehta, 1995).

Further investigation into RNN models is possible, or into combining forecasts. Many researchers agree that individual forecasting methods are misspecified in some manner, suggesting that combining multiple forecasts leads to increased forecast accuracy (Dunis and Huang, 2002). However, initial investigations proved unsuccessful, with the NNR model remaining the "best" model. Two simple model combinations were examined: a simple averaging of the naïve, ARMA and NNR model forecasts, and a regression-type combined forecast using the naïve, ARMA and NNR models.22 The regression-type combined forecast follows the Granger and Ramanathan procedure (gr.wf1 EViews workfile). The evaluation can be reviewed in Sheet 2 of the oos gr.xls Excel spreadsheet, and is also presented in Figure 1.19. The lack of success using the combination models was undoubtedly because the performance of the benchmark models was so much weaker than that of the NNR model. It is unlikely that combining relatively "poor" models with an otherwise "good" one will outperform the "good" model alone.
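The Granger and Ramanathan procedure amounts to an unrestricted ordinary least squares regression of the actual series on the candidate forecasts. The following is a minimal sketch on synthetic numbers, not the EViews workfile computation:

```python
import numpy as np

def granger_ramanathan(y, forecasts):
    """Regression-type forecast combination: regress the actual series y
    on the individual forecasts plus a constant, and use the fitted
    values as the combined forecast (unrestricted weights)."""
    X = np.column_stack([np.ones(len(y))]
                        + [np.asarray(f, float) for f in forecasts])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, float), rcond=None)
    return X @ beta, beta

# Toy check: one forecast is exactly half the actuals, so the regression
# should recover a weight of 2 on it and fit the series perfectly
y = np.array([1.0, 2.0, 3.0, 4.0])
f1 = y / 2.0
f2 = np.array([0.3, -0.1, 0.2, 0.0])
combined, beta = granger_ramanathan(y, [f1, f2])
```

A simple equal-weight average, the other combination examined above, is just `np.mean([f1, f2], axis=0)`; the regression version differs in that the weights need not sum to one and a constant absorbs any bias.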

The main conclusion that can be drawn from this chapter is that there are indeed nonlinearities present within financial markets and that a neural network model can be

Figure 1.19 Regression-type combined forecast Excel spreadsheet (out-of-sample)

22 For a full discussion on the procedures, refer to Clemen (1989), Granger and Ramanathan (1984), and Hashem (1997).


trained to recognise them. However, despite the limitations and potential improvements

mentioned above, our results strongly suggest that regression models and particularly

NNR models can add value to the forecasting process. For the EUR/USD exchange rate

and the period considered, NNR models clearly outperform the more traditional modelling

techniques analysed in this chapter.

REFERENCES

Balkin, S. D. and J. K. Ord (2000), "Automatic Neural Network Modelling for Univariate Time Series", International Journal of Forecasting, 16, 509–515.
Bellgard, C. and P. Goldschmidt (1999), "Forecasting Across Frequencies: Linearity and Non-Linearity", University of Western Australia Research Paper, Proceedings of the International Conference on Advanced Technology, Australia (www.imm.ecel.uwa.edu.au/~cbellgar/).
Box, G. E. P., G. M. Jenkins and G. C. Reinsel (1994), Time Series Analysis: Forecasting and Control, Prentice Hall, Englewood Cliffs, NJ.
Campbell, J. Y., A. W. Lo and A. C. MacKinlay (1997), "Nonlinearities in Financial Data", in The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ, pp. 512–524.
Carney, J. C. and P. Cunningham (1996), "Neural Networks and Currency Exchange Rate Prediction", Trinity College Working Paper, Foresight Business Journal web page (www.maths.tcd.ie/pub/fbj/forex4.html).
Clemen, R. T. (1989), "Combining Forecasts: A Review and Annotated Bibliography", International Journal of Forecasting, 5, 559–583.

Diekmann, A. and S. Gutjahr (1998), "Prediction of the Euro–Dollar Future Using Neural Networks – A Case Study for Financial Time Series Prediction", University of Karlsruhe Working Paper, Proceedings of the International Symposium on Intelligent Data Engineering and Learning (IDEAL'98), Hong Kong (http://citeseer.nj.nec.com/diekmann98prediction.html).
Dunis, C. and X. Huang (2002), "Forecasting and Trading Currency Volatility: An Application of Recurrent Neural Regression and Model Combination", The Journal of Forecasting, 21, 317–354.
Dunis, C. and J. Jalilov (2002), "Neural Network Regression and Alternative Forecasting Techniques for Predicting Financial Variables", Neural Network World, 2, 113–139.
El-Shazly, M. R. and H. E. El-Shazly (1997), "Comparing the Forecasting Performance of Neural Networks and Forward Exchange Rates", Journal of Multinational Financial Management, 7, 345–356.
Fernandez-Rodriguez, F., C. Gonzalez-Martel and S. Sosvilla-Rivero (2000), "On the Profitability of Technical Trading Rules Based on Artificial Neural Networks: Evidence from the Madrid Stock Market", Economics Letters, 69, 89–94.
Fishman, M. B., D. S. Barr and W. J. Loick (1991), "Using Neural Nets in Market Analysis", Technical Analysis of Stocks and Commodities, 9, 4, 135–138.
Gençay, R. (1999), "Linear, Non-linear and Essential Foreign Exchange Rate Prediction with Simple Technical Trading Rules", Journal of International Economics, 47, 91–107.
Gouriéroux, C. and A. Monfort (1995), Time Series and Dynamic Models, translated and edited by G. Gallo, Cambridge University Press, Cambridge.
Grabbe, J. O. (1996), International Financial Markets, 3rd edition, Prentice Hall, Englewood Cliffs, NJ.
Granger, C. W. J. and R. Ramanathan (1984), "Improved Methods of Combining Forecasts", Journal of Forecasting, 3, 197–204.
Hanke, J. E. and A. G. Reitsch (1998), Business Forecasting, 6th edition, Prentice Hall, Englewood Cliffs, NJ.
Hashem, S. (1997), "Optimal Linear Combinations of Neural Networks", Neural Networks, 10, 4, 599–614 (www.emsl.pnl.gov:2080/people/bionames/hashem s.html).
Haykin, S. (1999), Neural Networks: A Comprehensive Foundation, 2nd edition, Prentice Hall, Englewood Cliffs, NJ.
Hornik, K., M. Stinchcombe and H. White (1989), "Multilayer Feedforward Networks Are Universal Approximators", Neural Networks, 2, 359–366.
Kaastra, I. and M. Boyd (1996), "Designing a Neural Network for Forecasting Financial and Economic Time Series", Neurocomputing, 10, 215–236.

Kingdon, J. (1997), Intelligent Systems and Financial Forecasting, Springer, London.
Lisboa, P. J. G. and A. Vellido (2000), "Business Applications of Neural Networks", in P. J. G. Lisboa, B. Edisbury and A. Vellido (eds), Business Applications of Neural Networks: The State-of-the-Art of Real-World Applications, World Scientific, Singapore, pp. vii–xxii.
Maddala, G. S. (2001), Introduction to Econometrics, 3rd edition, Prentice Hall, Englewood Cliffs, NJ.
Mehta, M. (1995), "Foreign Exchange Markets", in A. N. Refenes (ed.), Neural Networks in the Capital Markets, John Wiley, Chichester, pp. 176–198.
Pesaran, M. H. and B. Pesaran (1997), "Lessons in Logit and Probit Estimation", in Interactive Econometric Analysis Working with Microfit 4, Oxford University Press, Oxford, pp. 263–275.
Pindyck, R. S. and D. L. Rubinfeld (1998), Econometric Models and Economic Forecasts, 4th edition, McGraw-Hill, New York.
Previa (2001), Previa Version 1.5 User's Guide (www.elseware.fr/previa).
Refenes, A. N. and A. Zaidi (1993), "Managing Exchange Rate Prediction Strategies with Neural Networks", in P. J. G. Lisboa and M. J. Taylor (eds), Techniques and Applications of Neural Networks, Ellis Horwood, Hemel Hempstead, pp. 109–116.
Shapiro, A. F. (2000), "A Hitchhiker's Guide to the Techniques of Adaptive Nonlinear Models", Insurance, Mathematics and Economics, 26, 119–132.
Thomas, R. L. (1997), Modern Econometrics. An Introduction, Addison-Wesley, Harlow.
Tyree, E. W. and J. A. Long (1995), "Forecasting Currency Exchange Rates: Neural Networks and the Random Walk Model", City University Working Paper, Proceedings of the Third International Conference on Artificial Intelligence Applications, New York (http://citeseer.nj.nec.com/131893.html).
Yao, J., H. Poh and T. Jasic (1996), "Foreign Exchange Rates Forecasting with Neural Networks", National University of Singapore Working Paper, Proceedings of the International Conference on Neural Information Processing, Hong Kong (http://citeseer.nj.com/yao96foreign.html).
Yao, J., Y. Li and C. L. Tan (1997), "Forecasting the Exchange Rates of CHF vs USD Using Neural Networks", Journal of Computational Intelligence in Finance, 15, 2, 7–13.
Zhang, G., B. E. Patuwo and M. Y. Hu (1998), "Forecasting with Artificial Neural Networks: The State of The Art", International Journal of Forecasting, 14, 35–62.

2

Using Cointegration to Hedge and Trade International Equities

A. NEIL BURGESS

ABSTRACT

In this chapter, we examine the application of the econometric concept of cointegration as a tool for hedging and trading international equities. The concepts are illustrated with respect to a particular set of data, namely the 50 equities which constituted the STOXX 50 index as of 4 July 2002. The daily closing prices of these equities are investigated over a period from 14 September 1998 to 3 July 2002 – the longest period over which continuous data is available across the whole set of stocks in this particular universe. The use of daily closing prices will introduce some spurious effects due to the non-synchronous closing times of the markets on which these equities trade. In spite of this, however, the data are deemed suitable for the purposes of illustrating the tools in question and also of indicating the potential benefits to be gained from intelligent application of these tools. We consider cointegration as a framework for modelling the inter-relationships between equity prices, in a manner which can be seen as a sophisticated form of "relative value" analysis. Depending on the particular task in hand, cointegration techniques can be used to identify potential hedges for a given equity position and/or to identify potential trades which might be taken from a statistical arbitrage perspective.

2.1 INTRODUCTION

In this section we describe the econometric concept of “cointegration”, and explain our

motivation for developing trading tools based upon a cointegration perspective.

Cointegration is essentially an econometric tool for identifying situations where stable

relationships exist between a set of time series. In econometrics, cointegration testing is

typically seen as an end in itself, with the objective of testing an economic hypothesis

regarding the presence of an equilibrium relationship between a set of economic variables.

A possible second stage of cointegration modelling is to estimate the dynamics of the

mechanism by which short-term deviations from the equilibrium are corrected, i.e. to

construct an error-correction model (ECM).
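As a rough illustration of this two-stage idea, the sketch below runs an Engle–Granger-style two-step estimation on a synthetic cointegrated pair. All names and data are our own assumptions for illustration; this is not the methodology as applied later in the chapter.

```python
import numpy as np

def engle_granger_sketch(y, x):
    """Two-step sketch: (1) OLS of y on x estimates the equilibrium
    y_t = a + b*x_t; (2) regressing dy_t on the lagged residual z_{t-1}
    estimates the speed of error correction (gamma < 0 means deviations
    are corrected back towards equilibrium)."""
    X = np.column_stack([np.ones(len(x)), x])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    z = y - (a + b * x)                          # deviation from equilibrium
    dy = np.diff(y)
    Z = np.column_stack([np.ones(len(dy)), z[:-1]])
    (c, gamma), *_ = np.linalg.lstsq(Z, dy, rcond=None)
    return b, gamma

# Synthetic cointegrated pair: y tracks 2*x plus a mean-reverting spread
rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=1000))             # a random-walk "price"
spread = np.zeros(1000)
for t in range(1, 1000):
    spread[t] = 0.5 * spread[t - 1] + rng.normal()
y = 2.0 * x + spread
b_hat, gamma_hat = engle_granger_sketch(y, x)
```

Here the estimated hedge ratio `b_hat` is close to the true value of 2, and the negative `gamma_hat` captures the short-run correction of deviations back towards the equilibrium, the mechanism an ECM formalises.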

The first aspect of cointegration modelling is interesting from the perspective of "hedging" assets against each other. The estimated equilibrium relationship will be one in which the effect of common risk factors is neutralised or at least minimised, allowing low-risk

Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Naïm. © 2003 John Wiley & Sons, Ltd. ISBN: 0-470-84885-5


combinations of assets to be created. The second aspect is interesting as a potential source

of statistical arbitrage strategies. Deviations from the long-term “fair price” relationship

can be considered as statistical “mispricings” and error-correction models can be used to

capture any predictable component in the tendency of these mispricings to revert towards

the longer term equilibrium.

Whilst the econometric methods used in cointegration modelling form the basis of our

approach, they involve a number of restrictive assumptions which limit the extent to which

they can be applied in practice. From our somewhat contrasting perspective, the use of

tools from cointegration modelling is seen as a “means to an end”, with the “end” being

the creation of successful trading strategies. In this chapter we explore the application of

cointegration-inspired tools to the task of trading and hedging international equities.

For both trading and hedging, the cointegration perspective can be viewed as an exten-