cointegration can be thought of as a principled extension of the relative value strategies,

such as “pairs trading”, which are in common use by market practitioners. In the case

of hedging, the use of a cointegration approach can be viewed as extending factor-model

hedging to include situations where the underlying risk factors are not measurable directly,

but are instead manifested implicitly through their effect on asset prices.

The structure of the rest of the chapter is as follows. In Section 2.2 we provide a

more detailed description of the econometric basis of our approach and illustrate the

way in which cointegration models are constructed and how variance ratio tests can be

used as a means of identifying potentially predictable components in asset price dynam-

ics. In Section 2.3 we explain how cointegration can be used to perform implicit factor

hedging. In Section 2.4 we explain how cointegration can be used to construct sophisti-

cated relative-value models as a potential basis for statistical arbitrage trading strategies.

In Section 2.5 we present a controlled simulation in which we show how cointegration

methods can be used to “reverse engineer” certain aspects of the underlying dynamics

of a set of time series. In Section 2.6 we describe the application of cointegration tech-

niques to a particular set of asset prices, namely the daily closing prices of the 50 equities

which constituted the STOXX 50 index as of 4 July 2002; a detailed description of the

methodology is provided along with a discussion of the accompanying spreadsheet which

contains the analysis itself. Finally, Section 2.7 contains a brief discussion of further

practical issues together with a concluding summary of the chapter.

2.2 TIME SERIES MODELLING AND COINTEGRATION

In this section we review alternative methods for representing and modelling time series.

Whilst often overlooked, the choice of problem representation can play a decisive role in

determining the success or failure of any subsequent modelling or forecasting procedure.

In particular, the representation will determine the extent to which the statistical properties

of the data are stable over time, or “stationary”.

Stable statistical properties are important because most types of model are more suited

to tasks of interpolation (queries within the range of past data) rather than extrapolation

(queries outside the range of known data). Where the statistical properties of a system

are “nonstationary”, i.e. changing over time, future queries may lie in regions outside the

known data range, resulting in a degradation in the performance of any associated model.

The most common solution to the problems posed by nonstationarity is to attempt to

identify a representation of the data which minimises these effects. Figure 2.1 illustrates

Cointegration to Hedge and Trade International Equities 43

Figure 2.1 Time series with different characteristics, particularly with regard to stationarity: (top

left) stationary time series; (top right) trend-stationary time series; (bottom left) integrated time

series; (bottom right) cointegrated time series. Each panel plots value (yt) against time (t).

different classes of time series from the viewpoint of the transformations that are required

to achieve stationarity.

A naturally stationary series, such as that shown in the top-left chart, is one which has a

stable range of values over time. Such a series can be directly included in a model, either

as a dependent or independent variable, without creating any undue risk of extrapolation.

The top-right chart shows an example of a “trend-stationary” variable; it is stationary

around a known trend which is a deterministic function of time. A stationary representation

of such a variable can be obtained by “de-trending” the variable relative to the underlying

trend. Some economic time series fall into this category.

Series such as that in the bottom-left chart are known as “difference stationary” because

the period-to-period differences in the series are stationary although the series itself is not.

Turning this around, such series can also be viewed as “integrated series”, which represent

the integration (sum) of a stationary time series. Artificial random-walk series and most

asset prices fall into this category, i.e. prices are nonstationary but price differences,

returns, are stationary.

The two series in the bottom-right chart represent a so-called cointegrated set of vari-

ables. Whilst the individual series are nonstationary we can construct a combined series

(in this case the difference between the two) which is stationary. As we shall demonstrate

below, some sets of asset prices exhibit cointegration to a greater or lesser degree, leading

to interesting and valuable opportunities for both trading and hedging the assets within

the set. Another way of looking at cointegration is that we are “de-trending” the series

against each other, rather than against time.

The class into which a time series or set of time series fall, whether stationary, inte-

grated, or cointegrated, has important implications both for the modelling approach which

should be adopted and the nature of any potentially predictable components that the time

series may contain. Details of a wide range of statistical tests, for identifying both the type

of time series (stationary, nonstationary, cointegrated) and the presence of any potentially

predictable component in the time series dynamics, are provided in Burgess (1999). In

this chapter we will concentrate on two main tests: regression-based tests for the presence

of cointegration, and variance ratio tests for the presence of potential predictability.

44 Applied Quantitative Methods for Trading and Investment

The most popular method of testing for cointegration is that introduced by Granger

(1983) and is based upon the concept of a “cointegrating regression”. In this approach a

particular time series (the “target series”) y0,t is regressed upon the remainder of the set

of time series (the “cointegrating series”) y1,t , . . . , yn,t :

$$y_{0,t} = \alpha + \beta_1 y_{1,t} + \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t} + d_t \tag{2.1}$$

If the series are cointegrated then statistical tests will indicate that dt is stationary and the

parameter vector α = (1, −α, −β1, −β2, . . . , −βn) is referred to as the cointegrating vector.

Two standard tests recommended by Engle and Granger (1987) are the Dickey–Fuller

(DF) and the Cointegrating Regression Durbin–Watson (CRDW). The Dickey–Fuller test

is described later in this chapter, as part of the controlled simulation in Section 2.5. An

extensive review of approaches to constructing and testing for cointegrating relationships

is contained in Burgess (1999).
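
The cointegrating regression of equation (2.1) can be sketched for the simplest case of one target series and one cointegrating series. This is a minimal illustration rather than the procedure used in the chapter: the function name `cointegrating_regression` and the toy data are hypothetical, and only the Python standard library is assumed.

```python
def cointegrating_regression(y0, y1):
    """OLS of the target series y0 on a constant and one cointegrating series y1.

    Returns (alpha, beta, residuals); the residuals play the role of d_t in (2.1).
    """
    n = len(y0)
    mean0 = sum(y0) / n
    mean1 = sum(y1) / n
    sxy = sum((a - mean1) * (b - mean0) for a, b in zip(y1, y0))
    sxx = sum((a - mean1) ** 2 for a in y1)
    beta = sxy / sxx
    alpha = mean0 - beta * mean1
    residuals = [b - alpha - beta * a for a, b in zip(y1, y0)]
    return alpha, beta, residuals


# Hypothetical toy data: y0 is a linear function of y1 plus a small wobble.
y1 = [10.0, 11.0, 13.0, 12.0, 15.0, 17.0, 16.0, 19.0]
wobble = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
y0 = [2.0 + 3.0 * a + w for a, w in zip(y1, wobble)]

alpha, beta, d = cointegrating_regression(y0, y1)

# Cointegrating vector in the chapter's convention: (1, -alpha, -beta_1).
cointegrating_vector = (1.0, -alpha, -beta)
```

Whether the residual series `d` is stationary still has to be tested separately, for example with the Dickey–Fuller statistic described in Section 2.5.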

Variance ratio tests are a powerful way of testing for potential predictability in time

series dynamics. They are derived from a property of unpredictable series where the

variance of the differences in the series grows linearly with the length of the period over

which they are measured. A simple intuition for this property is presented in Figure 2.2.

In the limiting case where all steps are in the same direction the variance of the series will

grow as a function of time squared; at the other extreme of pure reversion the variance of the

series will be independent of time (and close to zero). A random diffusion will be a weighted

combination of both behaviours and will exhibit variance which grows linearly with time.

This effect has been used as the basis of statistical tests for deviations from random-walk

behaviour by a number of authors starting with Lo and MacKinlay (1988) and Cochrane

(1988). The motivation for testing for deviations from random-walk behaviour is that

they suggest the presence of a potentially predictable component in the dynamics of a

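time series. The τ-period variance ratio is simply the normalised ratio of the variance of

τ-period differences to the variance of single-period differences:

$$\mathrm{VR}(\tau) = \frac{\sum_t \left(\Delta^{\tau} y_t - \overline{\Delta^{\tau} y}\right)^2}{\tau \sum_t \left(\Delta y_t - \overline{\Delta y}\right)^2} \tag{2.2}$$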

Figure 2.2 The relationship between variance and time for a simple diffusion process. For a step

of size r the 1-period variance is r^2 in every case; the 2-period variance is (r + r)^2 = 4r^2 for a

perfect trend, (r − r)^2 = 0 for perfect reversion, and 2r^2 for a random series (a 50/50 mix of

trend and reversion).


Figure 2.3 Example time series with different characteristics (left) and their variance ratio

functions (right). Both panels show a trending, a random walk and a mean-reverting series; the

left panel plots value against time and the right panel plots variance ratio against period.

By viewing the variance ratio statistics for different periods collectively, we form the

variance ratio function (VRF) of the time series (Burgess, 1999). A positive gradient to

the VRF indicates positive autocorrelation in the time series dynamics and hence trending

behaviour; conversely a negative gradient to the VRF indicates negative autocorrela-

tion and mean-reverting or cyclical behaviour. Figure 2.3 shows examples of time series

with different characteristics, together with their associated VRFs. Further examples are

contained in Burgess (1999).

For the random walk series, the variance grows linearly with the period τ and hence

the VRF remains close to one. For a trending series the variance grows at a greater than

linear rate and so the VRF rises as the period over which the differences are calculated

increases. Finally, for the mean-reverting series the converse is true: the variance grows

sublinearly and hence the VRF falls below one.
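
The variance ratio function can be sketched in a few lines of standard-library Python. The sketch assumes a slightly different normalisation from equation (2.2): each sum of squares is divided by its own number of observations, so that VR(1) is exactly one. The function names and the three test series are hypothetical.

```python
import random


def variance(values):
    """Population variance (mean squared deviation from the sample mean)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_ratio(y, tau):
    """Variance of tau-period differences over tau times the variance of 1-period differences."""
    one_period = [y[t] - y[t - 1] for t in range(1, len(y))]
    tau_period = [y[t] - y[t - tau] for t in range(tau, len(y))]
    return variance(tau_period) / (tau * variance(one_period))


def variance_ratio_function(y, max_tau):
    """Variance ratios for periods 1..max_tau, viewed collectively as the VRF."""
    return [variance_ratio(y, tau) for tau in range(1, max_tau + 1)]


# Hypothetical test series.
rng = random.Random(0)
random_walk = [0.0]
for _ in range(5000):
    random_walk.append(random_walk[-1] + rng.gauss(0.0, 1.0))

zigzag = [float(t % 2) for t in range(101)]           # perfect reversion: 0, 1, 0, 1, ...
runs = [float(min(t, 100 - t)) for t in range(101)]   # 50 up-steps followed by 50 down-steps

vrf_walk = variance_ratio_function(random_walk, 5)    # should stay close to one
vr2_zigzag = variance_ratio(zigzag, 2)                # perfect reversion: zero
vr2_runs = variance_ratio(runs, 2)                    # persistent runs: close to two
```

The zigzag and run series reproduce the two limiting cases of Figure 2.2, while the simulated random walk lies between them.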

2.3 IMPLICIT HEDGING OF UNKNOWN COMMON

RISK FACTORS

The relevance of cointegration to hedging is based upon the recognition that much of the

“risk” or stochastic component in asset returns is caused by variations in factors which have

a common effect on many assets. This viewpoint forms the basis of traditional asset pricing

models such as the CAPM (Capital Asset Pricing Model) of Sharpe (1964) and the APT

(Arbitrage Pricing Theory) of Ross (1976). Essentially these pricing models take the form:

$$\Delta y_{i,t} = \alpha_i + \beta_{i,\mathrm{Mkt}}\,\Delta\mathrm{Mkt}_t + \beta_{i,1}\,\Delta f_{1,t} + \cdots + \beta_{i,n}\,\Delta f_{n,t} + \varepsilon_{i,t} \tag{2.3}$$

This general formulation relates changes in asset prices Δyt to sources of systematic risk

(changes in the market, ΔMktt, and in other economic “risk factors”, Δfj,t) together with

an idiosyncratic asset-specific component εi,t.

The presence of market-wide risk factors creates the possibility of hedging or reducing

risk through the construction of appropriate combinations of assets. Consider a portfolio

consisting of a long (bought) position in an asset y1 and a short (sold) position in an asset

y2 . If the asset price dynamics in each case follow a data-generating process of the form

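shown in equation (2.3), then the combined returns Δy1,t − Δy2,t are given by:

$$\begin{aligned}
\Delta y_{1,t} - \Delta y_{2,t} ={}& (\alpha_1 - \alpha_2) \\
&+ (\beta_{1,\mathrm{Mkt}} - \beta_{2,\mathrm{Mkt}})\,\Delta\mathrm{Mkt}_t + (\beta_{1,1} - \beta_{2,1})\,\Delta f_{1,t} + \cdots + (\beta_{1,n} - \beta_{2,n})\,\Delta f_{n,t} \\
&+ (\varepsilon_{1,t} - \varepsilon_{2,t})
\end{aligned} \tag{2.4}$$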


Figure 2.4 Attribution of price variance across risk factors: whilst the individual assets Y1 and

Y2 are primarily influenced by changes in market-wide risk factors, the price changes of the

“synthetic asset” Y1 − Y2 are largely immunised from such effects. The bar chart shows, for

Asset Y1, Asset Y2 and Synthetic Y1 − Y2, the percentage of variance (0% to 100%) attributed to

βasset,Mkt, βasset,1, βasset,2, Idiosyncratic 1 and Idiosyncratic 2, together with the total variance,

the variance from market factors and the idiosyncratic variance.

If the factor exposures are similar, i.e. β1,j ≈ β2,j , then the proportion of variance which

is caused by market-wide factors will be correspondingly reduced. This effect is illustrated

in Figure 2.4.
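
The reduction in common-factor variance can be illustrated with a small calculation. The sketch below assumes that the factors and the idiosyncratic terms are mutually uncorrelated; the exposures, variances and the function name `common_factor_share` are hypothetical and are not taken from Figure 2.4.

```python
def common_factor_share(betas, factor_variances, idiosyncratic_variance):
    """Fraction of return variance due to common factors, assuming the factors
    and the idiosyncratic term are mutually uncorrelated."""
    common = sum(b * b * v for b, v in zip(betas, factor_variances))
    return common / (common + idiosyncratic_variance)


# Hypothetical exposures to (market, factor 1) and hypothetical variances.
factor_variances = [1.0, 0.5]
betas_1, idio_1 = [1.0, 0.8], 0.2
betas_2, idio_2 = [0.9, 0.7], 0.2

share_1 = common_factor_share(betas_1, factor_variances, idio_1)
share_2 = common_factor_share(betas_2, factor_variances, idio_2)

# Long asset 1, short asset 2: exposures net off as in equation (2.4), while the
# independent idiosyncratic variances add.
spread_betas = [b1 - b2 for b1, b2 in zip(betas_1, betas_2)]
share_spread = common_factor_share(spread_betas, factor_variances, idio_1 + idio_2)
```

With these illustrative numbers each asset derives well over 80% of its variance from the common factors, whereas the long-short combination derives under 5%.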

A common approach to hedging is to assume that we can explicitly identify at least

reasonable approximations to the underlying risk factors fj,t and factor sensitivities βi,j

and then to create portfolios in which the combined exposure to the different risk factors

lies within a desired tolerance. However, in cases where this may not be the optimal

approach, cointegration provides an alternative method of implicitly hedging the common

underlying sources of risk.

More specifically, given an asset universe UA and a particular “target asset”, T ∈ UA, a

cointegrating regression can be used to create a “synthetic asset” SA(T) which is a linear

combination of assets which exhibits the maximum possible long-term correlation with the

target asset T. The coefficients of the linear combination are estimated by regressing the

historical price of T on the historical prices of a set of “constituent” assets C ⊂ UA − T:

$$SA(T)_t = \sum_{C_i \in C} \beta_i C_{i,t} \quad \text{s.t.} \quad \{\beta_i\} = \arg\min \sum_{t=1,\ldots,n} \Bigl( T_t - \sum_{C_i \in C} \beta_i C_{i,t} \Bigr)^2 \tag{2.5}$$

As the aim of the regression is to minimise the squared differences, this is a standard

ordinary least squares (OLS) regression, and the optimal “cointegrating vector” β =

(β1, . . . , βnc)^T of constituent weights can be calculated directly by:

$$\boldsymbol{\beta}_{\mathrm{OLS}} = (\mathbf{C}\mathbf{C}^{\mathsf{T}})^{-1}\mathbf{C}\mathbf{t} \tag{2.6}$$

where C is the nc (= |C|) × n matrix of historical prices of the constituents and t =

(T1, . . . , Tn)^T is the vector of historical prices of the target asset.

The standard properties of the OLS procedure used in regression ensure both that the

synthetic asset will be an unbiased estimator for the target asset, i.e. E[Tt ] = SA(T )t , and

also that the deviation between the two price series will be minimal in a mean-squared-error


sense. The synthetic asset can be considered an optimal statistical hedge for the target

series, given a particular set of constituent assets C.
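
A minimal sketch of the synthetic-asset construction, solving the normal equations of the no-intercept regression in (2.5) with each constituent's price history stored as one row. The helper names, the small Gaussian-elimination solver and the price data are all hypothetical; a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def synthetic_asset_weights(constituents, target):
    """OLS weights without intercept: solve (C C^T) beta = C t, where each row
    of `constituents` is the price history of one constituent asset."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in constituents]
            for ci in constituents]
    rhs = [sum(p * q for p, q in zip(ci, target)) for ci in constituents]
    return solve_linear_system(gram, rhs)


# Hypothetical prices: the target is an exact combination of two constituents.
c1 = [10.0, 11.0, 12.0, 11.5, 13.0, 12.5]
c2 = [20.0, 19.0, 21.0, 22.0, 21.5, 23.0]
target = [0.5 * p + 2.0 * q for p, q in zip(c1, c2)]

weights = synthetic_asset_weights([c1, c2], target)
synthetic = [weights[0] * p + weights[1] * q for p, q in zip(c1, c2)]
```

Because the toy target is an exact combination of the constituents, the recovered weights are 0.5 and 2.0 up to rounding; with real prices the difference between target and synthetic asset is the statistical mispricing.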

From an economic perspective the set of constituent assets C act as proxies for the

unobserved common risk factors. In maximising the correlation between the target asset

and the synthetic asset the construction procedure cannot (by definition) account for

the “asset-specific” components of price dynamics, but must instead indirectly optimise

the sensitivities to common sources of economic risk. The synthetic asset represents a

combination which as closely as possible matches the underlying factor exposures of

the target asset without requiring either the risk factors or the exposures to be identified

explicitly. In Section 2.5, this procedure is illustrated in detail by a controlled experiment

in which the cointegration approach is applied to simulated data with known properties.

2.4 RELATIVE VALUE AND STATISTICAL ARBITRAGE

In the previous section we saw that appropriately constructed combinations of prices can

be largely immunised against market-wide sources of risk. Such combinations of assets

are potentially amenable to statistical arbitrage because they represent opportunities to

exploit predictable components in asset-specific price dynamics in a manner which is

(statistically) independent of changes in the level of the market as a whole, or other market-

wide sources of risk. Furthermore, as the asset-specific component of the dynamics is not

directly observable by market participants it is plausible that regularities in the dynamics

may exist from this perspective which have not yet been “arbitraged away” by market

participants.

To motivate the use of statistical arbitrage strategies, we briefly relate the opportunities

they offer to those of more traditional “riskless” arbitrage strategies. The basic concept

of riskless arbitrage is that where the future cash-flows of an asset can be replicated by

a combination of other assets, the price of forming the replicating portfolio should be

approximately the same as the price of the original asset. Thus the no-arbitrage condition

can be represented in a general form as:

$$\left|\,\mathrm{payoff}\bigl(X_t - SA(X_t)\bigr)\right| < \text{Transaction cost} \tag{2.7}$$

where Xt is an arbitrary asset (or combination of assets), SA(Xt ) is a “synthetic asset”

which is constructed to replicate the payoff of Xt and “transaction cost” represents the net

costs involved in constructing (buying) the synthetic asset and selling the “underlying”

Xt (or vice versa). This general relationship forms the basis of the “no-arbitrage” pricing

approach used in the pricing of financial “derivatives” such as options, forwards and

futures.[1] From this perspective, the price difference Xt − SA(Xt) can be thought of as

the mispricing between the two (sets of) assets.

A specific example of riskless arbitrage is index arbitrage in the UK equities market.

Index arbitrage (see for example Hull (1993)) occurs between the equities constituting a

particular market index, and the associated futures contract on the index itself. Typically

the futures contract Ft will be defined so as to pay a value equal to the level of the index

[1] See Hull (1993) for a good introduction to derivative securities and no-arbitrage relationships.


at some future “expiration date” T. Denoting the current (spot) stock prices as $S_t^i$, the

no-arbitrage relationship, specialising the general case in equation (2.7), is given by:

$$\Bigl| F_t - \sum_i w_i S_t^i\, e^{(r - q_i)(T - t)} \Bigr| < \text{cost} \tag{2.8}$$

where wi is the weight of stock i in determining the market index, r is the risk-free

interest rate, and qi is the dividend rate for stock i. In the context of equation (2.7) the

weighted combination of constituent equities can be considered as the synthetic asset

which replicates the index futures contract.

When the “basis” $F_t - \sum_i w_i S_t^i e^{(r - q_i)(T - t)}$ exceeds the transaction costs of a particular

trader, the arbitrageur can “lock in” a riskless profit by selling the (overpriced) futures

contract Ft and buying the (underpriced) combination of constituent equities. When the

magnitude of the mispricing between the spot and future grows, there are frequently large

corrections in the basis which are caused by index arbitrage activity, as illustrated in

Figure 2.5 for the UK FTSE 100 index.
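
The no-arbitrage check of equation (2.8) can be sketched as follows. The three-stock index, the rates, the transaction cost and the function names are hypothetical, and continuous compounding with constant dividend rates is assumed, as in the equation itself.

```python
import math


def fair_futures_value(weights, spot_prices, dividend_rates, risk_free_rate, time_to_expiry):
    """Cost-of-carry value of the stock basket: sum_i w_i S_i exp((r - q_i)(T - t))."""
    return sum(
        w * s * math.exp((risk_free_rate - q) * time_to_expiry)
        for w, s, q in zip(weights, spot_prices, dividend_rates)
    )


def arbitrage_signal(futures_price, fair_value, transaction_cost):
    """Compare the basis with a trader's transaction cost."""
    basis = futures_price - fair_value
    if basis > transaction_cost:
        return "sell futures, buy stocks"
    if basis < -transaction_cost:
        return "buy futures, sell stocks"
    return "no trade"


# Hypothetical three-stock index, three months (0.25 years) from expiry.
weights = [0.5, 0.3, 0.2]
spots = [100.0, 200.0, 50.0]
dividends = [0.02, 0.03, 0.00]

fair = fair_futures_value(weights, spots, dividends, risk_free_rate=0.05, time_to_expiry=0.25)
signal = arbitrage_signal(futures_price=fair + 2.0, fair_value=fair, transaction_cost=1.5)
```

Here the futures contract trades two index points above its cost-of-carry value, which exceeds the assumed cost of 1.5 points, so the signal is to sell the futures and buy the basket.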

Many complex arbitrage relationships exist and “riskless” arbitrage is an important

subject in its own right. However, such strategies are inherently self-limiting: as competition

amongst arbitrageurs grows, the magnitude and duration of mispricings decrease.

Furthermore, in practice, even arbitrage which is technically “riskless” will still involve a

certain level of risk due to uncertain future dividend rates qi , trading risks, and so on. From

this perspective the true attraction of index arbitrage strategies lies less in the theoretical

price relationship than in a favourable property of the mispricing dynamics, namely a

tendency for the basis risk to “mean revert” or fluctuate around a stable level.

Figure 2.5 Illustration of index arbitrage opportunities in the UK equity market; the data consists

of 3200 prices for the FTSE 100 index (in bold) and the derivative futures contract expiring

Sept. 98; the lower curve shows the so-called “basis”, the deviation from the theoretical fair price

relationship between the two series; the data sample covers the period from 10.40am to 4pm on

15 September 1998; some of the abrupt price shifts will be due to arbitrage activity. Axes: index

level (5140 to 5280) and basis, future − spot (−20 to 100), against time of day.


Building upon this insight, the premise of “statistical arbitrage” is that regularities in

combinations of asset prices can be exploited as the basis of profitable trading strategies,

irrespective of the presence or absence of a theoretical fair price relationship between the

set of assets involved.

Whilst clearly subject to a higher degree of risk than “true” arbitrage strategies, statisti-

cal arbitrage opportunities offer the hope of being both more persistent and more prevalent

in the markets. More persistent because risk-free arbitrage opportunities are rapidly elim-

inated by market activity. More prevalent because in principle they may occur between

any set of assets rather than solely in cases where a suitable “risk-free” hedging strategy

can be implemented.

A simple form of statistical arbitrage is “pairs trading”, which is in common use by a

number of market participants, such as hedge funds, proprietary trading desks and other

“risk arbitrageurs”. Pairs trading is based on a relative value analysis of two asset prices.

The two assets might be selected either on the basis of intuition, economic fundamentals,

long-term correlations or simply past experience. A promising candidate for a pairs strat-

egy might look like the example in Figure 2.6, between HSBC and Standard Chartered.

The pairs in Figure 2.6 show a clear similarity to the riskless arbitrage opportunities

shown in Figure 2.5. In both cases the two prices “move together” in the long term,

with temporary deviations from the long-term correlation which exhibit a strong mean-

reversion pattern. Note however that in the “statistical arbitrage” case the magnitude of

the deviations is greater (around ±10% as opposed to <0.5%) and so is the time period

over which the price corrections occur (days or weeks as opposed to seconds or minutes).

Opportunities for pairs trading in this simple form, however, are dependent upon the

existence of similar pairs of assets and thus are naturally limited. By constructing synthetic

“pairs” in the form of appropriate combinations of two or more assets, cointegration

techniques provide a sophisticated and powerful method to generalise the relative value

approach and create a wider range of potential trading opportunities. Once a cointegrating

Figure 2.6 Illustration of potential statistical arbitrage opportunities in the UK equity market;

the chart shows equity prices for Standard Chartered (STAN) and HSBC, sampled on an hourly

basis from 20 August to 30 September 1998. Note the mean-reverting nature of the deviation.

Axes: prices (0 to 700) and deviation (−100 to 200) against date.


regression has been performed to estimate the “fair price” relationship between a set of

assets, tools such as variance ratio analysis can be used to detect deterministic components

in the mispricing dynamics that could be used as the basis of a “statarb” strategy.

In this and the previous section we have provided a motivation for the use of

cointegration-based techniques for both hedging and trading. In the following section we

supplement this qualitative motivation with some quantitative results obtained from apply-

ing the techniques in a controlled simulation with known time series dynamics.

2.5 ILLUSTRATION OF COINTEGRATION

IN A CONTROLLED SIMULATION

Now that we have described the rationale for applying cointegration-based techniques

in trading, the next sections provide examples of how these techniques can be used in

practice. In Section 2.6 we will explore the application of cointegration techniques to real

asset prices. But before we do that, this section highlights the way in which the techniques

work by means of an artificial example in which the underlying dynamics of the time series

are controlled. Consider the example of a set of three assets, each following a two-factor

version of the data-generating process shown in equation (2.3). In this controlled example

we specify the factor exposures of three assets X, Y and Z as shown in Table 2.1, i.e. price

changes within the set of three assets X, Y and Z are driven by a total of five factors, two

common risk factors f1 and f2 and three asset-specific components ε1, ε2, ε3. Furthermore

let us specify that f1 and f2 follow random-walk processes whilst the dynamics of the

asset-specific factors contain a mean-reverting component. As discussed in Section 2.4,

these dynamics might also be plausibly the case in reality because predictable effects in

market-wide factors would be easily observed and thus “arbitraged away”, whilst small

predictable components in asset-specific dynamics might be less obvious and hence also

more persistent.

Based on the assumptions described above, let us specify the full dynamics of the

resulting time series by the following equations:

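$$\begin{aligned}
\Delta f_{i,t} &= \eta_{i,t}, & i &= 1, 2, & \eta_{i,t} &\sim N(0, 1) \\
\Delta \varepsilon_{j,t} &= -0.1\,\varepsilon_{j,t} + e_{j,t}, & j &= 1, 2, 3, & e_{j,t} &\sim N(0, 0.25) \\
\Delta X_t &= \Delta f_{1,t} + \Delta f_{2,t} + \Delta \varepsilon_{1,t} \\
\Delta Y_t &= \Delta f_{1,t} + 0.5\,\Delta f_{2,t} + \Delta \varepsilon_{2,t} \\
\Delta Z_t &= 0.5\,\Delta f_{1,t} + \Delta f_{2,t} + \Delta \varepsilon_{3,t}
\end{aligned} \tag{2.9}$$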

Table 2.1 Price sensitivity of three assets X, Y and Z to changes in common risk factors f1 and

f2 and asset-specific effects ε1, ε2, ε3

Asset   f1    f2    ε1    ε2    ε3
X       1     1     1     0     0
Y       1     0.5   0     1     0
Z       0.5   1     0     0     1


Figure 2.7 Realisation of three simulated asset price series (X, Y and Z) which are driven by two

underlying common factors in addition to asset-specific components

i.e. the unobserved “factor” dynamics of f1 and f2 are driven by the pure noise terms ηi,t;

the also-unobserved asset-specific dynamics εj,t are a combination of noise terms ej,t with

“error correction” mean-reversion terms with parameter −0.1; the observed asset dynamics

Xt, Yt and Zt are determined by their different exposures to the five underlying factors.

The precise “shapes” of the time series will depend on the sampled innovations ηi,t

and ej,t. A particular realisation of the asset prices generated by the system is shown in

Figure 2.7 and this is used as the basis of the analysis below. Note that the common

factor exposures create a broad similarity between the observed price movements of the

three assets.
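
The data-generating process of equations (2.9) can be simulated along the following lines. This is a sketch under stated assumptions rather than the code behind Figure 2.7: the error correction is applied to the previous level of each asset-specific term, N(0, 0.25) is read as a variance of 0.25 (a standard deviation of 0.5), all series start at zero, and the function name and seed are hypothetical.

```python
import random


def simulate_assets(n_steps, seed=0):
    """Simulate X, Y and Z from two random-walk factors and three mean-reverting
    asset-specific terms, following the structure of equations (2.9)."""
    rng = random.Random(seed)
    f1 = f2 = 0.0
    eps = [0.0, 0.0, 0.0]
    x, y, z, eps_path = [], [], [], []
    for _ in range(n_steps):
        # Common factors: pure random walks with N(0, 1) innovations.
        f1 += rng.gauss(0.0, 1.0)
        f2 += rng.gauss(0.0, 1.0)
        # Asset-specific terms: 10% of the previous level is corrected each step.
        # Assumption: N(0, 0.25) means variance 0.25, i.e. standard deviation 0.5.
        eps = [e - 0.1 * e + rng.gauss(0.0, 0.5) for e in eps]
        # Observed prices combine the factors using the exposures of Table 2.1.
        x.append(f1 + f2 + eps[0])
        y.append(f1 + 0.5 * f2 + eps[1])
        z.append(0.5 * f1 + f2 + eps[2])
        eps_path.append(tuple(eps))
    return x, y, z, eps_path


x, y, z, eps_path = simulate_assets(500)

# With weights of 2/3 on Y and Z the factor exposures cancel, leaving only a
# combination of the asset-specific terms.
true_mispricing = [a - (2.0 / 3.0) * (b + c) for a, b, c in zip(x, y, z)]
```

A different seed gives a different realisation, so the estimated regression weights reported in the text should not be expected to be reproduced exactly.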

As described in Section 2.3, we estimate the underlying fair price relationship from the

observed data by performing a cointegrating regression. In this case, we arbitrarily select

X as the “target series” and regress on the other two “cointegrating series” Y and Z. The

resulting relationship estimated by the regression is given by:

$$X_t = 0.632\,Y_t + 0.703\,Z_t + m_t \tag{2.10}$$

Due to sampling error, the estimated relationship differs slightly from the true underlying

relationship Xt = 2/3 Yt + 2/3 Zt + m*t, which would precisely cancel the factor exposures

and leave a pure combination (m*t) of the asset-specific terms. However, it is clear that the

cointegrating regression has been able to construct a combination which largely neutralises

the common risk factors, and that it has done this without any explicit knowledge of (or

even estimation of) the factor exposures shown in Table 2.1. It is because they bypass

the need to estimate explicit factor exposures that we refer to cointegration techniques as

performing “implicit” hedging of market-wide risk factors.
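
As a worked check, the true weights follow from requiring the exposures of Table 2.1 to cancel for each common factor, and the same arithmetic gives the residual exposures left by the estimated weights. The symbols b_Y and b_Z are introduced here purely for illustration.

```latex
\begin{aligned}
f_1:&\quad 1 \cdot b_Y + 0.5 \cdot b_Z = 1 \\
f_2:&\quad 0.5 \cdot b_Y + 1 \cdot b_Z = 1
\end{aligned}
\qquad\Longrightarrow\qquad b_Y = b_Z = \tfrac{2}{3}

\begin{aligned}
\text{net exposure to } f_1 &= 1 - (0.632 \times 1 + 0.703 \times 0.5) = 1 - 0.9835 = 0.0165 \\
\text{net exposure to } f_2 &= 1 - (0.632 \times 0.5 + 0.703 \times 1) = 1 - 1.0190 = -0.0190
\end{aligned}
```

Both residual exposures are under 2% of the original unit exposure of X to each factor.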

In this example, the asset-specific dynamics have been constructed so as to be mean

reverting, so the error term of the regression can be considered as a statistical “mispricing”

which represents the temporary deviation from the estimated “fair price” relationship

between the three assets. Unlike the nonstationary asset prices X, Y and Z, the estimated

mispricing mt , which is illustrated in Figure 2.8, can clearly be seen to be mean reverting.

The mean-reverting nature of the mispricing time series, compared to the close to

random-walk behaviour of the original time series X, Y and Z, is highlighted by the

variance ratio profiles shown in Figure 2.9. Whilst the variance ratio for all three original

assets remains close to unity in each case, the variance ratio of the mispricing falls substantially

below one as the period over which the differences are calculated increases. This

indicates that the volatility which is present in the short-term dynamics is not reflected in


Figure 2.8 The estimated “mispricing” time series, mt = Xt − (0.632Yt + 0.703Zt)

Figure 2.9 Variance ratio profiles for the time series X, Y and Z and mt = Xt − (0.632Yt

+ 0.703Zt); variance ratio is plotted against period

the long-term volatility, thus providing evidence for a substantial mean-reverting compo-

nent in the mispricing dynamics.

Let us now evaluate the effectiveness of the cointegration procedure at “reverse engineering”

the underlying factor dynamics. In attempting to replicate the “target” time

series X the cointegrating regression procedure creates the “synthetic asset” 0.632Y +

0.703Z which has similar exposures to the common factors f1 and f2. Thus in the mispricing

time series Xt − (0.632Yt + 0.703Zt) the net exposure to the common factors is

close to zero, allowing the mean-reverting asset-specific effects ε1, ε2, ε3 to dominate the

mispricing dynamics. This “statistical hedging” of the common risk factors is quantified

in Table 2.2, which reports the proportion of the variance of each observed time series

which is associated with each of the underlying factors.

This demonstrates that the use of cointegrating regression can immunise against com-

mon underlying factors which are not observed directly but instead proxied by the

observed asset prices. Whilst the variance of changes in the original time series X, Y

and Z is primarily (70–90%) associated with the common risk factors f1 and f2, the

effect of these factors on the mispricing m is minimal (0.2%). Conversely, the relative

effect of the asset-specific factors is greatly magnified, growing from 10–30% in the

original time series to 99.8% in the relative mispricing m.

By magnifying the component of the dynamics which is associated with asset-specific

effects, we would expect to magnify the predictable component which (by construction)

is present in the asset-specific effects but not in the common factors. This effect can be


Table 2.2 Sensitivity of price changes of the original time series X, Y and Z and the “mispricing”

time series Xt − (0.632Yt + 0.703Zt). The table entries show the proportion of the variance of

each time series which is associated with changes in common risk factors f1 and f2 and

asset-specific effects ε1, ε2, ε3

         X        Y        Z        m
f1       46.1%    64.2%    16.8%    0.2%
f2       41.5%    10.1%    67.8%    0.0%
ε1       11.5%    0.2%     0.0%     54.5%
ε2       0.5%     25.1%    0.1%     23.7%
ε3       0.4%     0.4%     15.3%    21.6%
Total    100.0%   100.0%   100.0%   100.0%

quantified by considering the Dickey–Fuller statistics obtained from simple error-correction

models (ECMs) of the time series dynamics:

$$\mathrm{DF}(s_t) = \hat{\beta} / \hat{\sigma}_{\hat{\beta}} \quad \text{from the regression} \quad \Delta s_t = \alpha - \beta s_t + \eta_t, \quad \text{where } \eta_t \text{ is a noise term} \tag{2.11}$$

i.e. we regress changes in the time series (Δst) against the level of the series (st) and test

for a statistically significant error-correcting coefficient β. The details of the estimated

ECMs for our experiment are presented in Table 2.3.
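
The statistic in equation (2.11) can be sketched with a plain OLS regression of changes on levels. The sketch assumes that each change is measured from the level it is regressed on (s at t+1 minus s at t against s at t) and uses the usual OLS standard error; the function name and the two simulated series are hypothetical.

```python
import math
import random


def df_statistic(s):
    """Estimate ds_t = alpha - beta * s_t + noise by OLS and return
    (beta_hat, standard_error, beta_hat / standard_error)."""
    level = s[:-1]
    change = [s[t + 1] - s[t] for t in range(len(s) - 1)]
    n = len(level)
    mean_x = sum(level) / n
    mean_y = sum(change) / n
    sxx = sum((v - mean_x) ** 2 for v in level)
    sxy = sum((v - mean_x) * (w - mean_y) for v, w in zip(level, change))
    slope = sxy / sxx                      # estimate of -beta
    intercept = mean_y - slope * mean_x    # estimate of alpha
    rss = sum((w - intercept - slope * v) ** 2 for v, w in zip(level, change))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    beta_hat = -slope
    return beta_hat, std_error, beta_hat / std_error


# Hypothetical series: one mean-reverting (10% correction per step), one random walk.
rng = random.Random(1)
mean_reverting, walk = [0.0], [0.0]
for _ in range(2000):
    mean_reverting.append(0.9 * mean_reverting[-1] + rng.gauss(0.0, 0.5))
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

beta_mr, se_mr, df_mr = df_statistic(mean_reverting)
beta_rw, se_rw, df_rw = df_statistic(walk)
```

For the mean-reverting series the estimated coefficient should be close to 0.1 with a large DF value, whereas the random walk should give a much smaller statistic.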

The DF statistic approximately follows a t-distribution so, roughly speaking, DF values

greater than two indicate significant evidence for a mean-reverting/error-correction effect.

For the underlying (but unobserved) factors, the low DF statistics for f1 and f2 confirm

the lack of predictable components in these common factors, whilst the high DF values for

ε1, ε2 and ε3 (4.908, 5.644 and 4.454 respectively) confirm the highly significant degree

of mean reversion in the asset-specific effects.

In the observed series, X, Y and Z, the mean-reverting effect is “watered down” by

the unpredictable factor effects, with the result that the corresponding DF statistics are

small (actually slightly negative) and present no evidence of a predictable component.

Table 2.3 Details of simple error-correction models estimated to quantify the mean-reverting component in both the unobserved factors and the observed time series. Values marked with an asterisk correspond to cases where the estimated mean-reversion coefficient β̂ is significant at the 0.1% level. The rows in the table are: estimated reversion parameter β̂; standard error of estimate; associated DF statistic (approximately equivalent to the t-statistic in a standard regression); proportion of variance explained by model (R²)

Factor/asset      f1       f2       µ1       µ2       µ3       X        Y        Z        m
Estimated β̂      0.014   -0.001    0.096*   0.124*   0.079*  -0.001   -0.000   -0.001    0.077*
Std. error        0.008    0.001    0.020    0.022    0.018    0.002    0.003    0.001    0.018
DF(s_t)           1.631   -0.580    4.908    5.644    4.454   -0.452   -0.004   -0.591    4.398
R²                0.5%     0%       4.8%     6.2%     4.0%     0%       0%       0%       3.9%

This picture changes dramatically when we look at the constructed “mispricing” m, which has a high DF statistic of 4.398, almost as high as for the true asset-specific effects.

The actual magnitude (as opposed to statistical significance) of the detected mean-reversion effect is given by the R² values in the table. The results confirm that the predictable component of the dynamics is almost as strongly present in the mispricing time series as in the underlying, but unobserved, asset-specific dynamics themselves. The magnitude of the deterministic component in the mispricing is 3.9%, which is comparable to the 4.8%, 6.2% and 4.0% in the true asset-specific dynamics, whereas it is negligible in the case of the original time series X, Y and Z.

These results from our controlled experiment serve to illustrate the power of the cointegration approach to remove market-wide risk factors and highlight the asset-specific components of price dynamics, which in this case were constructed to contain a mean-reverting effect. However, the qualitative reasoning presented in Sections 2.3 and 2.4, together with quantitative evidence from other sources, suggests that similar results may be obtained for real asset prices. In the following section we apply essentially the same techniques as those used in this controlled experiment to analyse price relationships between real assets, namely the equities which constitute the European-wide STOXX 50 index.

2.6 APPLICATION TO INTERNATIONAL EQUITIES

In this section we describe an application of the cointegration tools and techniques described above to data from those international equities which comprised the STOXX 50 index as of 4 July 2002. We describe this analysis with reference to the accompanying Excel workbook named “equity coint.xls” on the CD-Rom.

The set of equities which constitute our universe is listed in the first sheet of the workbook (named “Constituents”). The full set of equities included in the analysis is listed in Table 2.4.

The second sheet in the workbook is named “Prices” and contains the raw data for the analysis. This consists of daily closing prices which have been adjusted to remove the effects of stock splits, dividends and other corporate actions. The time frame for the analysis is from 14 September 1998 to 3 July 2002, which is the longest period over which continuous data is available across the whole set of stocks. This comprises almost 4 years of data, giving 993 daily observations.

Note that this data does not provide a true “snapshot” of the European equity markets due to the complication that the closing times differ across the different national exchanges. For a practical trading system this would induce serious distortions to our models, but for our purposes here the close prices serve adequately to illustrate the use of the tools we have described above.

The third sheet (“Pairs”) contains a simple relative value analysis of a pair of assets at a time. The sheet also serves to illustrate the data itself and the use of variance ratio functions to identify the underlying time series dynamics. A screen shot of this worksheet is shown in Figure 2.10.

Table 2.4 The list of companies included in the analysis. The 50 stocks correspond to the constituents of the pan-European STOXX 50 index as of 4 July 2002

Ref.  Name                    Symbol      Ref.  Name                    Symbol
1     British Telecom         BT.L        26    Zurich Financial        ZURZn.VX
2     GlaxoSmithKline         GSK.L       27    Total-Fina              TOTF.PA
3     Alcatel                 CGEP.PA     28    Suez                    LYOE.PA
4     UBS                     UBSZn.VX    29    L'Oreal                 OREP.PA
5     Daimler Chrysler        DCXGn.DE    30    Telecom Italia          TIT.MI
6     BP                      BP.L        31    ENI                     ENI.MI
7     AstraZeneca             AZN.L       32    Eon                     EONG.DE
8     Nokia                   NOK1V.HE    33    Siemens                 SIEGn.DE
9     Novartis                NOVZn.VX    34    Deutsche Bank           DBKGn.DE
10    Ericsson                ERICb.ST    35    Generali                GASI.MI
11    Philips                 PHG.AS      36    Deutsche Telekom        DTEGn.DE
12    ING                     ING.AS      37    BBVA                    BBVA.MC
13    ABN Amro                AAH.AS      38    Allianz                 ALVG.DE
14    Aegon                   AEGN.AS     39    Bayer                   BAYG.DE
15    Unilever                UNc.AS      40    Barclays                BARC.L
16    Royal Dutch             RD.AS       41    HSBC                    HSBA.L
17    Swiss Re                RUKZn.VX    42    Diageo                  DGE.L
18    Roche                   ROCZg.VX    43    Lloyds Bank             LLOY.L
19    Vivendi                 EAUG.PA     44    Prudential              PRU.L
20    BSCH                    SAN.MC      45    Royal Bank of Scotland  RBOS.L
21    Nestle                  NESZn.VX    46    Shell                   SHEL.L
22    Carrefour               CARR.PA     47    Vodafone                VOD.L
23    BNP-Paribas             BNPP.PA     48    Telefonica              TEF.MC
24    Aviva                   AV.L        49    Munich Re               MUVGn.DE
25    AXA                     AXAF.PA     50    Credit Suisse           CSGZn.VX

Cells D37 and D38 are used to select two equities whose prices we wish to compare. The equities are selected by entering numbers from 1 to 50 corresponding to the reference numbers shown in Table 2.4. The example given shows the case of selecting the British oil stock BP (BP.L, number 6 in the set) and the French oil stock Total-Fina (TOTF.PA, number 27 in the set). The lower chart plots BP against Total and also shows the synthetic asset which represents the relative return on the two stocks. All three series have been normalised to represent log price changes since the beginning of the analysis period. A close-up of the chart is shown in Figure 2.11.
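The normalisation described here can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `normalised_log_prices` is hypothetical, NumPy is assumed, and the synthetic asset is assumed to be the difference of the two normalised log-price series (that is, the log of the price ratio, rebased to zero at the start of the period), which is one natural reading of “relative return”.

```python
import numpy as np

def normalised_log_prices(p_a, p_b):
    """Return log price changes since the start for two assets,
    plus their difference (the assumed pair "synthetic asset")."""
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    log_a = np.log(p_a / p_a[0])   # zero at the first observation
    log_b = np.log(p_b / p_b[0])
    relative = log_a - log_b       # log of the rebased price ratio
    return log_a, log_b, relative
```

With this convention a value of 0.30 in the relative series corresponds to the first asset having outperformed the second by roughly 30% in log terms since the start of the sample.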

In this case we see that there appears to be a semi-stable equilibrium between the two asset prices. For long periods of time the relative price tends to fluctuate around an equilibrium or “fair price” level; however, significant shifts in the relationship also occur, such as the 30% shift in the relative value which occurred during the early part of 2000. The apparent existence of a relationship between the two price series, together with the instability in this relationship, illustrates respectively the opportunities and the risks which arise from a relative value approach to trading.

The top half of the sheet contains a variance ratio analysis of the price dynamics of the selected equities and the synthetic asset corresponding to their relative prices. The cells in the range C5:E34 contain array formulae to calculate the n-period variances for each of the three time series, with n varying between 1 and 30. To the right of these variances, cells H5:J34 contain the variance ratios, with each n-period variance normalised by n times the one-period variance. These three functions are plotted in the chart to the right of the numbers, with the example for BP and Total-Fina shown in Figure 2.12.
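The variance ratio calculation can be sketched outside the spreadsheet as follows. This is a minimal Python sketch, not a transcription of the workbook's array formulae: the function name `variance_ratios` is hypothetical, NumPy is assumed, and the n-period variances are assumed to be computed from overlapping n-period differences using the sample variance.

```python
import numpy as np

def variance_ratios(series, max_n=30):
    """Return (variances, ratios) for horizons n = 1..max_n.

    variances[n-1] is the sample variance of n-period changes;
    ratios[n-1] is that variance divided by n times the 1-period variance.
    """
    x = np.asarray(series, dtype=float)
    horizons = np.arange(1, max_n + 1)
    # Overlapping n-period differences (an assumption of this sketch).
    variances = np.array([np.var(x[n:] - x[:-n], ddof=1) for n in horizons])
    ratios = variances / (horizons * variances[0])
    return variances, ratios
```

For a random walk the ratios should stay close to one at all horizons, while a declining profile indicates a mean-reverting component.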

Figure 2.10 The “Pairs” worksheet containing a pairwise relative value analysis; the selected stocks are BP (number 6) and Total-Fina (number 27)

[Chart: normalised log prices of BP.L, TOTF.PA and BP.L/TOTF.PA; date axis labelled from 9/14/98 to 3/14/02, vertical axis from -0.30 to 0.70]

Figure 2.11 Relative prices for BP, Total, and the synthetic asset which is the ratio of the two

In this case we see that the variance ratio functions for the two securities show declining profiles, indicating the presence of reverting components in their time series dynamics. The mean-reversion tendency is significantly more prominent in the synthetic asset (BP/Total) than for either of the individual assets, providing further evidence to support the presence of a potentially predictable component in the relative price dynamics.

[Chart: variance ratio (vertical axis, 0.0 to 1.2) against window length in days (horizontal axis, up to 30) for VR(BP.L), VR(TOTF.PA) and VR(BP.L/TOTF.PA)]

Figure 2.12 Variance ratio functions for BP, Total, and the synthetic asset which is the ratio of the two

Given such evidence of mean-reverting dynamics we could move on to implement a statistical arbitrage strategy based on the types of trading rules described by Burgess (1999) and Towers (2000). Note, however, that in this particular case at least some of this effect will be due to the non-synchronous sampling of the close price in the French and UK markets. Because of this non-synchronicity a more sophisticated analysis (and probably additional data) would be needed to evaluate the true magnitude of the mean-reverting effect in the relative price of these equities and its viability as the basis for a profitable trading strategy.

Whilst pairs analysis works well for some equities, it is highly sensitive to the properties

of each asset price and works better for some stocks than for others. Essentially it requires

that for a given equity, there is one (and only one) equity which has similar exposures to

each and every underlying factor. For a given equity there may be zero, one or more than

one closely matching pairs and only in the case of a single matching pair is the simple

approach likely to be close to optimal. These complications mean that pairs analysis is

essentially opportunistic in nature rather than representing a general strategy which can

be applied across a broad asset universe.

Cointegration modelling is essentially an extension of pairs analysis which is designed

to overcome these limitations. Rather than requiring the existence of a single perfect

match we instead create an optimally matching “synthetic asset” in the form of a weighted

combination of one or more assets. The remaining sheets in the workbook demonstrate

the workings and results of this more sophisticated form of relative value modelling.

Firstly, the sheet “CointAnalysis” illustrates the construction of a synthetic asset to

match a chosen “target” asset. A screen shot of this worksheet is shown in Figure 2.13. The

top-left of the worksheet contains various control parameters and diagnostic information.

The chart in the top-centre of the worksheet presents a variance ratio analysis of the

statistical mispricing. The bottom-left chart is a visualisation of the synthetic asset and

the chart in the bottom-right shows the evolution of the various price series over time.


Figure 2.13 The “CointAnalysis” worksheet showing the construction of a synthetic asset to

match asset number 6: British Petroleum (BP.L)

CONTROLS   Manual Select X   6       using: 6
           RidgeFac          0.01
           total             993
           Insample          700
           outsample         293

Figure 2.14 The controls for the cointegration analysis

The controls for the cointegration analysis are contained in the top-left of the worksheet. As elsewhere in the workbook, the convention is that user-specified controls are contained in cells with a black border and yellow background. In this case there are four such cells, as shown in Figure 2.14.

Firstly, the target series is specified in cell F2, using the reference numbers listed in Table 2.4. In this case we remain with the same example as before: asset number 6, British Petroleum, or BP.L for short. Note that in order to allow the generation of automatic tables, the actual control cell is H2, and cell F2 acts as a kind of manual override.


The second control cell is F4, labelled “RidgeFac”. This represents an important modification of the basic methodology, which is necessary to avoid the problems caused by regressing on large numbers of variables. Rather than using a standard regression, this more practical methodology uses a “ridge regression” in which the resulting parameters are in some sense “smoothed” or “regularised”, and this cell controls the amount of smoothing (Hoerl and Kennard, 1970a,b).

The final control parameters consist of the number of observations which should be used to construct the model (the “in-sample” set) and the subsequent number of observations which should be used to evaluate the model performance (the “out-of-sample” set). Cell F6 indicates the number of observations available in total, which for this analysis is 993. Cell F8 is used to specify the number of “in-sample” observations. In this case we use 700 observations, representing approximately two-thirds of the available data. By default, all of the remaining observations are used to perform the “out-of-sample” evaluation. This number can be overridden using cell G9, but in this case is left as the default, giving 293 observations for the out-of-sample results analysis.

The data for the regression is collected on the “CointModel” worksheet. The target asset is stored in column H of this worksheet; a constant column of ones is placed in column J; and the 49 cointegrating assets are remapped to the adjacent columns K through to BG. It is useful to have the 50 independent variables in contiguous columns in order to simplify the matrix algebra used to compute the solution to the cointegrating regression.

The calculations for the cointegrating regression are performed on the “Workings” sheet. The worksheet performs a “ridge regression” (Hoerl and Kennard, 1970a,b) in which the solution is given by $\beta = (C^{T}C + \lambda\sigma I)^{-1} C^{T}t$. The target vector t and data matrix C (the 49 other asset price series supplemented by a column of ones) are referenced from the “CointModel” worksheet. The regularisation parameter lambda (λ) is referenced from cell F4 of the “CointAnalysis” worksheet. The covariance matrix $C^{T}C$ is calculated in cells G4:BD53. The vector $C^{T}t$ is calculated in BI4:BI53. The enhanced covariance matrix, $C^{T}C + \lambda\sigma I$, is constructed in cells BM4:DJ53 by re-scaling the diagonal elements of $C^{T}C$. The inverse of this enhanced matrix is calculated in cells G56:BD105. Finally the beta parameters are calculated in cells BI56:BI105 by multiplying this inverse by the vector $C^{T}t$.

With the regularisation parameter set to λ = 0, the solution reduces to the standard OLS regression: $\beta = (C^{T}C)^{-1} C^{T}t$. Lambda acts as a scaling coefficient for the diagonal component of the covariance matrix $C^{T}C$, proportionally downweighting the off-diagonal covariance terms and reducing the apparent correlation between the different series. As we will see below, this has an important effect in stabilising the regression and enabling us to use 50 regressor variables, more than would normally be practically feasible.
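The matrix steps described above can be sketched in Python as follows. This is a hedged sketch rather than a reproduction of the “Workings” sheet: the function name `ridge_weights` is hypothetical, NumPy is assumed, and the “enhanced” matrix is assumed to be formed by scaling each diagonal element of the cross-product matrix by (1 + λ), which is one reading of the description of re-scaling the diagonal.

```python
import numpy as np

def ridge_weights(C, t, ridge_fac=0.01):
    """Solve the regularised normal equations for the beta vector.

    C: data matrix (observations x regressors, including a column of ones)
    t: target vector
    ridge_fac: regularisation parameter (lambda); 0 gives plain OLS.
    """
    C = np.asarray(C, dtype=float)
    t = np.asarray(t, dtype=float)
    CtC = C.T @ C                    # cross-product ("covariance") matrix
    Ct_t = C.T @ t                   # cross-product with the target
    # Assumed form of the enhancement: inflate the diagonal by lambda.
    enhanced = CtC + ridge_fac * np.diag(np.diag(CtC))
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(enhanced, Ct_t)
```

Estimating the weights on the in-sample rows only and then applying them to later rows, for example `C[700:] @ beta`, gives the out-of-sample values of the synthetic asset.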

The resulting beta vector is copied across to cells J25:BG25 of the “CointModel” worksheet and used to construct the synthetic asset. This is calculated as the beta-weighted average of the 50 constituent assets (including constant term) and is stored in cells G40:G1032. Note that once the betas have been estimated from the first 700 observations (in this case), the same weights can be applied to subsequent data to calculate the values of the synthetic asset during the out-of-sample period. For purposes of visualising the composition of the synthetic asset, we take the beta vector and multiply through by the scale of the individual time series. The resulting “effective weights” are illustrated in the lower left-hand chart on the “CointAnalysis” worksheet, which is also reproduced in Figure 2.15.

[Chart: effective weights of the constant term and the 49 other stocks in the synthetic asset for BP.L, each bar labelled with its ticker; vertical axis from -10% to 20%]

Figure 2.15 Effective weights for the synthetic asset for British Petroleum (BP.L), λ = 0.01

In the case of BP, the synthetic asset weights are dominated by other oil stocks, particularly Royal Dutch/Shell (RD.AS and SHEL.L) and Total-Fina (TOTF.PA); however, most of the other stocks also have non-zero, though small, weightings, indicating that the best historical fit to BP price movements is obtained by taking into account a wide range of other stocks.

Given the target asset and the constructed synthetic asset we can calculate the difference

in price which is equivalent to the residual of the regression. The evolution of this time

series represents the performance of a hedged portfolio with a long position in the target

asset and an offsetting short position in the synthetic asset. If the synthetic asset is a good

hedge for the target, this residual price should have low volatility and remain close to

zero. In order to evaluate the effectiveness of the cointegration procedure we compare this

price residual to that obtained by a simpler procedure, namely hedging with an equally

weighted “market” portfolio. These time series are visualised in the bottom right-hand

chart of the “CointAnalysis” worksheet, which is reproduced in Figure 2.16.

The vertical line divides the time axis into the in-sample and out-of-sample periods.

During the in-sample period we expect the synthetic asset to closely match the target

(BP.L) simply by construction; similarly the corresponding “residual” is stable around

the zero level. Note that the synthetic asset is an average across a number of stocks and

in this case, as would be typical, has a smoother price trajectory than the target asset

itself but on the whole does tend to track the longer term price movements observed in

the target series. The synthetic market price obtained as an unweighted average across

the set of stocks appears to be less successful in following the price of the target asset

and this is also observed in the higher volatility of the corresponding (market) residual.

During the out-of-sample period, the synthetic asset price will only track the target asset to the extent to which it has a similar exposure to the underlying risk factors which drive asset prices. In this case the model for BP appears quite successful: the residual remains in a similar price range as during the in-sample period and also seems to be relatively stable around the zero level.

[Chart: price series BP.L, synthetic, resid, s-mkt and m-resid; date axis labelled from 9/14/1998 to 5/14/2002, vertical axis from -300 to 800]

Figure 2.16 Hedged and unhedged time series for the cointegration model for BP

The “CointAnalysis” worksheet also displays some basic measures which quantify some properties of the synthetic asset and the out-of-sample performance. The values corresponding to this example are shown in Figure 2.17.

RESULTS   sum          100%
          sumabs       166%
          RawVar       1294.93
          ResVar       423.15
          ResMkt       1262.60
          Reduction    67%
          MktRed       2%
          Improve      65%

Figure 2.17 Characteristics of the cointegration model for BP.L

The first two values characterise the makeup of the synthetic asset. The “sum” figure corresponds to the normalised sum of the asset weights; typically we would expect this to be close to 100%. The “sumabs” figure corresponds to the normalised sum of the absolute asset weights; if there are some negative weights, these will typically be offset by positive weights over and above 100% and the sum of the absolute weights will reflect this. The figure indicates that the sum of the absolute weights is 166% in this case, reflecting negative weights totalling about 33%, offset by approximately 133% of positive weights, to give a total of 166%. Cross-checking against the visualisation of the weights in Figure 2.15, these numbers seem to be reasonable.
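The split between positive and negative weights follows directly from the two summary figures. As a short worked check (the symbols $P$ and $N$ are introduced here purely for illustration, denoting the total of the positive weights and the total magnitude of the negative weights respectively):

```latex
\begin{aligned}
P - N &= 100\% \quad (\text{``sum''}), &\qquad P + N &= 166\% \quad (\text{``sumabs''}) \\
P &= \tfrac{1}{2}\,(166\% + 100\%) = 133\%, &\qquad N &= \tfrac{1}{2}\,(166\% - 100\%) = 33\%
\end{aligned}
```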

The sum of the absolute weights can be quite an important issue, as it provides an estimate of the quantity of assets we need to buy and sell in order to use the synthetic asset as a hedge. This measure also highlights the importance of using regularisation. For instance, with regularisation set to zero, the sum of the absolute weights for the BP.L synthetic asset becomes 404%, indicating that each unit of BP needs to be hedged against a long-short combination of equities totalling four times the value invested in BP!

The remaining figures serve to quantify the effectiveness of the synthetic asset at hedging the volatility in the target asset. These measures are calculated during the out-of-sample period in order to produce unbiased results. The “RawVar” figure corresponds to the volatility of the asset, measured in terms of price variance; “ResVar” is the residual variance when hedged by the synthetic asset; and “ResMkt” is the residual variance when hedged against an equally-weighted “market” portfolio. The final three figures represent the proportional effectiveness of the hedging procedure. Thus in this case, the “Reduction” of 67% indicates that the synthetic asset hedge removes 67% of the out-of-sample volatility. The “market” portfolio is not a good hedge in this case, only removing 2% of the volatility in BP, and thus the cointegration approach improves on the market hedge by 65% of the original volatility. This particular example serves to highlight the potentially large improvement which can be obtained by replacing market-based hedging with the cointegration approach, but it is only fair to note that in other cases the market hedge performs equally well or even better than the cointegration approach. A fairer comparison, across the whole set of 50 stocks, will be presented towards the end of this section.
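The three proportional measures can be recovered from the three variances. The sketch below assumes that each reduction is one minus the ratio of residual variance to raw variance and that the improvement is the difference between the two reductions; these formulae are inferred from the reported figures rather than taken from the workbook, and the function name `hedge_metrics` is hypothetical.

```python
def hedge_metrics(raw_var, res_var, res_mkt):
    """Return (reduction, mkt_red, improve) as fractions of raw variance."""
    reduction = 1.0 - res_var / raw_var   # synthetic-asset hedge
    mkt_red = 1.0 - res_mkt / raw_var     # equally weighted market hedge
    improve = reduction - mkt_red         # gain of the former over the latter
    return reduction, mkt_red, improve

# Out-of-sample variances reported for BP.L (RawVar, ResVar, ResMkt).
reduction, mkt_red, improve = hedge_metrics(1294.93, 423.15, 1262.60)
print(f"Reduction {reduction:.0%}, MktRed {mkt_red:.0%}, Improve {improve:.0%}")
```

Under these assumed formulae the rounded values agree with the 67%, 2% and 65% quoted for BP.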

The final part of the “CointAnalysis” worksheet presents a variance ratio analysis of the time series dynamics of the hedged portfolio, with the result being shown in the chart at the top of the worksheet. The calculations underlying this chart are contained in the top-left corner of the “CointModel” worksheet. The results for the BP.L model are shown in Figure 2.18.

[Chart: variance ratio (left-hand axis, 0.4 to 1.4) and variance (right-hand axis, 0 to 900) against length of time period (1 to 30)]

Figure 2.18 Variance ratio analysis of the cointegration model for BP.L

The variance ratio chart clearly indicates the decaying pattern which corresponds to mean-reverting dynamics. The 30-period variance ratio is just below 0.5, indicating that the variance computed over 30-day intervals is less than half as high as would be expected, given the observed 1-day variance. This suggests that, from a relative value perspective, over 50% of the short-term volatility in BP is essentially spurious price fluctuation which has a strong tendency to cancel itself out over a longer time-scale. This pattern can also be observed, though less clearly, from the concave shape of the variance curve itself.

As these figures are out-of-sample results they would suggest the possibility of finding a suitable statistical arbitrage strategy to exploit this mean-reverting component in the relative price dynamics of BP against the synthetic asset portfolio. In this case, however, the same caveat as before applies in that the non-synchronous nature of our close-price data may overstate the size of the true reversion effect.

Before moving on to consider the performance across our broader universe, let us first consider the importance of the regularisation parameter. Remember that the results we have been describing above correspond to a model constructed with λ = 0.01. Let us now compare this model to the model we obtain by leaving all other parameters the same but replacing the “RidgeFac” value in cell F4 with 0.1. The composition of the new synthetic asset is shown in Figure 2.19.

[Chart: effective weights of the constant term and the 49 other stocks in the synthetic asset for BP.L, each bar labelled with its ticker; vertical axis from -2% to 7%]

Figure 2.19 Effective weights for the synthetic asset for British Petroleum (BP.L), λ = 0.1

With this higher degree of regularisation, the weights become more uniform. Very few are now negative and the highest weight (for Shell, SHEL.L) is reduced from approximately 18% to only 6%. The “sum” of the weights falls to 98% and the “sumabs” to 103%. In this case, however, the new synthetic asset is a less effective hedge for price movements in BP. The residual variance of the hedged portfolio rises to 702.56 (from the 423 shown in Figure 2.17), and the reduction in variance due to the hedge is now only 46% (from 67% previously). Thus, in this particular case, increasing the degree of regularisation has decreased the effectiveness of the synthetic asset as a hedging portfolio. This should not be surprising, as in the limit we would expect a heavily regularised synthetic asset to closely match the equally-weighted portfolio, which we know is not a very good hedge in this case.

It is easy to confirm that moving to the opposite extreme also leads to a performance degradation: with the ridge factor set to zero, not only does the sum of absolute weights rise to the unattractive 404% mentioned above, but the residual variance of 611.01 (53% reduction) is also worse than the 423 (67%) for the intermediate case of λ = 0.01. These results indicate a pattern which is typical of much statistical modelling: a certain degree of regularisation tends to be beneficial, but beyond a certain point the smoothing becomes excessive and begins to degrade the model performance.
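The comparison across the three ridge factors can be laid out as a small Python calculation. The residual and raw variances are the out-of-sample figures quoted for BP.L; the variable names and the formula for the reduction (one minus residual variance over raw variance) are assumptions of this sketch.

```python
raw_var = 1294.93  # out-of-sample variance of BP.L (RawVar)

# Residual variance of the hedged portfolio for each ridge factor (lambda).
residual_variance = {0.0: 611.01, 0.01: 423.15, 0.1: 702.56}

# Assumed definition: proportion of the raw variance removed by the hedge.
reductions = {lam: 1.0 - rv / raw_var for lam, rv in residual_variance.items()}

# The preferred setting is the one leaving the least residual variance.
best_lambda = min(residual_variance, key=residual_variance.get)

for lam in sorted(reductions):
    print(f"lambda = {lam}: reduction {reductions[lam]:.0%}")
print("lowest residual variance at lambda =", best_lambda)
```

Of the three settings tried, the intermediate value gives the lowest residual variance, which is the pattern described in the text.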

Whilst the case of this one model, for BP, is both interesting and illustrative, it is important to know whether these results are merely a lucky “one-off” or whether they represent an approach which can be applied more generally. For this reason the final worksheet in the analysis, called “Results Summary”, contains a table of results generated by taking each of the 50 assets in turn as the “target” asset. A particular sample of these results is shown in Table 2.5.

Table 2.5 Performance of cointegration model across the universe of equities

Stock VarRed MktRed RelImp HdgeFac In-sample Out-sample SumWts AbsSumWts

BP.L 67% 2% 65% 0.01 700 293 100% 166%

BT.L 47% 31% 16% 0.01 700 293 103% 373%

GSK.L 56% 38% 18% 0.01 700 293 100% 188%

CGEP.PA 82% 58% 24% 0.01 700 293 114% 506%

UBSZn.VX 46% 49% -3% 0.01 700 293 100% 143%

DCXGn.DE -1% 46% -47% 0.01 700 293 101% 289%

BP.L 67% 2% 65% 0.01 700 293 100% 166%

AZN.L 54% 37% 17% 0.01 700 293 98% 172%

NOK1V.HE 62% 36% 27% 0.01 700 293 101% 377%

NOVZn.VX -37% -50% 13% 0.01 700 293 101% 170%

ERICb.ST 67% 42% 25% 0.01 700 293 93% 396%

PHG.AS 31% 31% 0% 0.01 700 293 96% 193%

ING.AS 39% 60% -21% 0.01 700 293 104% 165%

AAH.AS 39% 40% -1% 0.01 700 293 100% 141%

AEGN.AS 86% 64% 22% 0.01 700 293 101% 230%

UNc.AS 21% -68% 88% 0.01 700 293 98% 179%

RD.AS 68% 67% 1% 0.01 700 293 102% 160%

RUKZn.VX 69% 62% 8% 0.01 700 293 98% 154%

ROCZg.VX -60% 46% -105% 0.01 700 293 99% 158%

EAUG.PA 49% 41% 8% 0.01 700 293 105% 231%

SAN.MC 86% 79% 7% 0.01 700 293 101% 117%

NESZn.VX 32% -43% 75% 0.01 700 293 98% 148%

CARR.PA 39% 50% -11% 0.01 700 293 100% 277%

-42%

BNPP.PA 3% 45% 0.01 700 293 93% 151%

AV.L 26% 40% -15% 0.01 700 293 103% 163%

AXAF.PA 60% 56% 4% 0.01 700 293 103% 154%

ZURZn.VX 75% 64% 11% 0.01 700 293 107% 289%


-6%

TOTF.PA 57% 64% 0.01 700 293 98% 161%

LYOE.PA 53% 29% 24% 0.01 700 293 98% 152%

OREP.PA -140% -161% 21% 0.01 700 293 97% 130%

-6%

TIT.MI 76% 83% 0.01 700 293 100% 224%

ENI.MI 44% -48% 91% 0.01 700 293 99% 152%

EONG.DE -28% -104% 75% 0.01 700 293 98% 157%

SIEGn.DE 63% 37% 26% 0.01 700 293 99% 211%

DBKGn.DE 70% 64% 6% 0.01 700 293 101% 196%

-19%

GASI.MI 37% 56% 0.01 700 293 103% 169%

-13%

DTEGn.DE 46% 60% 0.01 700 293 98% 379%

-5%

BBVA.MC 81% 86% 0.01 700 293 102% 154%

ALVG.DE 88% 73% 15% 0.01 700 293 100% 174%

-6%

BAYG.DE 63% 69% 0.01 700 293 106% 185%

BARC.L 71% -26% 97% 0.01 700 293 100% 190%

HSBA.L 71% 44% 27% 0.01 700 293 99% 211%

DGE.L 23% -48% 71% 0.01 700 293 97% 155%

LLOY.L -61% -97% 36% 0.01 700 293 99% 196%

PRU.L 67% 62% 5% 0.01 700 293 98% 202%

RBOS.L 66% -48% 115% 0.01 700 293 96% 219%

-20%

SHEL.L 33% 53% 0.01 700 293 102% 157%

VOD.L 33% 23% 10% 0.01 700 293 104% 207%

TEF.MC 77% 57% 20% 0.01 700 293 101% 237%

MUVGn.DE 70% 61% 9% 0.01 700 293 99% 228%

-22%

CSGZn.VX 47% 69% 0.01 700 293 102% 147%

Mean 42% 24% 18% 0.01 700 293 100% 206%

Median 53% 43% 12% 0.01 700 293 100% 176%

The metrics in the table are precisely the same as those which are presented for individual models on the “CointAnalysis” worksheet (and in fact are directly derived from those values). Whilst the performance varies substantially from one equity to another, the figures for both mean and median performance confirm the general applicability of the approach. During what has been a very turbulent time for the equity markets, the synthetic hedge portfolios manage to reduce the out-of-sample volatility by 42% (mean) or 53% (median); note that the mean performance is more heavily affected