The volatility threshold is input in cell C4. Columns G to N contain the trading simulation formulas; in particular, columns I, J, K and L contain the profit and loss calculations, while columns M and N compute the drawdowns. The profit and loss figures are computed "open" in column I (i.e. how much profit/loss is potentially realised by the strategy each day) and "closed" in column J (i.e. how much profit/loss is effectively realised at the close of each trade). Columns K and L are merely cumulative computations of columns J and I, and give the cumulative profit and loss figures effectively and potentially realised respectively. The drawdowns (i.e. the differences between the maximum profit potentially realised to date and the profit or loss realised to date) are computed in columns M and N. Lastly, the trading summary statistics are computed in cells I2 to I6.
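As a rough guide to what those columns compute, here is a minimal Python sketch of the cumulative P&L and drawdown calculations. This is not the workbook's Excel formulas; the column mapping and the sample figures are illustrative only.

```python
# A minimal sketch of the workbook's cumulative P&L (columns K, L)
# and drawdown (columns M, N) calculations.

def cumulative(pnl):
    """Running sum of a daily P&L series (columns K and L)."""
    total, out = 0.0, []
    for x in pnl:
        total += x
        out.append(total)
    return out

def drawdowns(cum_pnl):
    """Maximum profit realised to date minus profit realised to date
    (columns M and N)."""
    peak, out = float("-inf"), []
    for x in cum_pnl:
        peak = max(peak, x)
        out.append(peak - x)
    return out

closed_pnl = [0.5, -0.2, 0.3, -0.6, 0.4]   # hypothetical daily closed P&L
cum = cumulative(closed_pnl)               # cumulative P&L to date
dd = drawdowns(cum)                        # drawdown at each date
```

The same two helpers applied to the open P&L series would reproduce columns L and N.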

The second worksheet ("Graph") contains the graph showing the open profit and loss. The third worksheet ("Backup") contains the raw data, i.e. columns A to D contain the forex data from Reuters (at 22:00 GMT) with the bid and ask quotes.

Cells C3 and C4 allow the user to simulate the performance of different scenarios depending on the EMA operator and the volatility threshold value respectively selected. The summary results of the trading strategy are displayed in cells I2:I6.

Figure 6.4 displays the daily profit and loss trajectories for the two hedging strategies analysed in this chapter. During the first period (until mid-1997), the EMA without volatility filter dominates the other strategy. As of mid-1997, the volatility filter strategy clearly outperforms in terms of profit and risk. The chart also shows that the equity curve of the volatility filter strategy is less volatile, thus leading to lower risk. The figure also suggests a somewhat poor performance for the EMA strategy without volatility filter.

206 Applied Quantitative Methods for Trading and Investment

[Figure: line chart of the daily cumulative profit and loss, January 1996 to October 1998, for the strategy with volatility filter and the strategy without volatility filter; vertical axis from −15% to 20%.]

Figure 6.4 Daily cumulative profit and loss curves

6.5 CONCLUSION

In this chapter we have studied the behaviour of a Markovian switching volatility model for the USD/DEM daily foreign exchange series.

The analysis of the residuals shows that the heteroskedasticity of the initial dataset has been removed, although the residual series still exhibits high excess kurtosis and fails to pass two standard normality tests.
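The chapter does not name the two normality tests used, but one common choice builds a Jarque–Bera-style statistic from the sample skewness and excess kurtosis. The following pure-Python sketch is only one possibility, not the authors' procedure:

```python
# Illustrative sketch: excess kurtosis and a Jarque-Bera-style
# normality statistic (this is an assumption about which test is
# meant; the chapter does not say).

def sample_moments(x):
    """Return (skewness, excess kurtosis) of a series."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def jarque_bera(x):
    """JB = n/6 * (skew^2 + excess_kurt^2 / 4); large values reject normality."""
    n = len(x)
    s, k = sample_moments(x)
    return n / 6.0 * (s ** 2 + k ** 2 / 4.0)

skew, exkurt = sample_moments([-2.0, -1.0, 0.0, 1.0, 2.0])  # symmetric sample
jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

A residual series with high excess kurtosis drives the second term of the statistic up, which is exactly how heavy tails cause the normality test to fail.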

When applied within a hedging framework, the filtered probabilities computed with the Markovian switching model enable us to improve the performance of trend-following systems both in terms of risk and absolute profits. This empirical result suggests that foreign exchange markets do not follow trends in a systematic way and that volatility modelling could be one interesting approach to classifying market trend dynamics.

As a further extension, an economic interpretation of these results should be considered. The existence of long-lasting trending and jagged periods in very liquid markets is an issue that deserves more attention.


Switching Regime Volatility 207


APPENDIX A: GAUSS CODE FOR MAXIMUM LIKELIHOOD FOR VARIANCE SWITCHING MODELS

The code is made up of a core program (MLS.PRG) that contains two sub-procedures: (i) proc1, which computes the maximum likelihood theta (θ); and (ii) proc2, which computes the filtered and smoothed probabilities.
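For readers who find GAUSS hard to follow, the filtering recursion at the heart of the code below can be sketched in Python. This is an illustrative transcription of the technique for a two-state model with a common mean and state-dependent volatilities, not a line-by-line port of MLS.PRG:

```python
# Illustrative sketch of the two-state filtering recursion: common mean mu,
# volatilities sig1 and sig2, p = Pr(stay in state 1), q = Pr(stay in state 2).
import math

def hamilton_filter(returns, mu, sig1, sig2, p, q):
    """Return the filtered probabilities Pr(S_t = 1 | r_1, ..., r_t)."""
    def dens(x, s):  # Gaussian density with mean mu and standard deviation s
        return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    pred1 = (1.0 - q) / (2.0 - p - q)    # stationary Pr(S = 1), the code's rho
    filtered = []
    for r in returns:
        a1 = dens(r, sig1) * pred1       # joint density of (r, state 1)
        a2 = dens(r, sig2) * (1.0 - pred1)
        f1 = a1 / (a1 + a2)              # filtered probability of state 1
        filtered.append(f1)
        pred1 = f1 * p + (1.0 - f1) * (1.0 - q)  # one-step-ahead prediction
    return filtered

# With sig1 small and sig2 large, a return near mu points to state 1,
# while a large return points to state 2.
probs = hamilton_filter([0.0, 3.0], 0.0, 0.5, 2.0, 0.9, 0.9)
```

The GAUSS procedure vectorises the same recursion over the whole return series and returns the log-likelihoods for the optimiser.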

The comments are framed with the following signs: /* comment */.

MLS.PRG

/* MLS.PRG Maximum likelihood code for model with changing variances */

/* Part I : load the libraries and source files */

/* ------------------------------------------------------------------- */

library maxlik,pgraph;

#include maxlik.ext;

maxset;

graphset;

#include maxprtm.src;

/* external source file see details here below */

#include smooth.src;

/* Part II : load the log returns data vector rt */

/* ------------------------------------------------------------------- */

open fin=filename for read;

x=readr(fin, nb of obs in the rt vector);

T=rows(x);

dat=x[1:T-1,1];

rt=100*x[1:T-1,2];

/* Part III : initialisation of parameters */

/* ------------------------------------------------------------------- */

A=0.0;

S1=0.6;

S2=0.4;

p=0.8; p=-ln((1-p)/p);

q=0.2; q=-ln((1-q)/q);

b0=A|S1|S2|p|q;

let b0={

0.001

1.36

0.60

2.00

3.00

}; b0=b0';

x=rt;

T=rows(x);

output file=out.out reset;

output file=out.out off;


/* Part IV : this procedure computes the vector of likelihoods evaluated at theta (θ) */

/* ----------------------------------------------------------------------------- */

proc (1)=switch(theta,Rt);

local A,S1,S2,p,q,PrSt,St,j,ft,aux,p0,x,BP,mu,sig,K,auxo,rho;

local l,maxco,auxn, fRtSt, PrStRtM, PrStRt, fRt, t, BigT, PStat,const;

A=theta[1];

S1=ABS(theta[2]);

S2=ABS(theta[3]);

x=theta[4]; p=exp(x)/(1+exp(x));

x=theta[5]; q=exp(x)/(1+exp(x));

BP=(p~(1-p))|((1-q)~q);

mu=A;

sig=S1|S2;

K=2;

BigT=rows(rt);

rho=(1-q)/(2-p-q);

Pstat=rho~(1-rho);

const=1/sqrt(2*Pi);

fRtSt=const*(1./sig').*exp(-0.5*( ((Rt-mu)./sig')^2 ));

PrStRtM=Pstat|zeros(BigT-1,K);

PrStRt=zeros(BigT,K);

fRt=zeros(BigT,1);

t=2;

do until t>=BigT+1;

aux=fRtSt[t-1,.].*PrStRtM[t-1,.];

PrStRt[t-1,.]=aux/(sumc(aux')');

PrStRtM[t,.]=PrStRt[t-1,.]*BP;

t=t+1;

endo;

fRt[1,.]=fRtSt[1,.]*Pstat';

fRt[2:BigT]=sumc( (fRtSt[2:BigT,.].*PrStRtM[2:BigT,.])' );

retp(ln(fRt[1:BigT]));

endp;

/* Part V : maximum likelihood Evaluation(BHHH optimisation algorithm) */

/* ------------------------------------------------------------------- */

_max_Algorithm=5; /* 5 = BHHH algorithm */

_max_CovPar=2; /* heteroskedastic-consistent */

_max_GradMethod=0;

_max_LineSearch=5;

{th,f,g,h,retcode}=maxlik(x,0,&switch,b0);

output file=out.out reset;

call maxprtm(th,f,g,h,retcode,2);

/* Part VI : call the routine smooth.src that computes the filtered probabilities

(fpe) and the smooth probabilities (spe) */

/* -------------------------------------------------------------------------- */

{fpe,spe}=smooth(th[1]|th,x);

output file=out.out off;


/* Part VII : this procedure computes filtered & smoothed probability estimates */

/* --------------------------------------------------------------------- */

proc (2)=smooth(th,y);

local mu1,mu2,p,q,s1,s2,rho,pa,pfx,its,p1,ind,thn,yxx,fk,t,spe,pax,qax,n;

/* Data initialisation */

/* ------------------------------------------------------------------- */

mu1 = th[1];

mu2 = th[2];

s1 = abs(th[3]);

s2 = abs(th[4]);

p = th[5]; p=exp(p)/(1+exp(p));

q = th[6]; q=exp(q)/(1+exp(q));

n = rows(y);

rho = (1-q)/(2-p-q);

pa = rho|(1 - rho);

p1 = zeros(4,1);

/* pax=filtered probas */

/* ------------------------------------------------------------------- */

pax = zeros(n,4);

/* columns: (S_t,S_t-1)=(1,1)~(2,1)~(1,2)~(2,2), then (S_t=1)~(S_t-1=1) */

/* qax smoothed probas, same structure as pax */

/* ------------------------------------------------------------------- */

pfx = zeros(n,1); @ likelihoods from filter @

/* Calculate probability weighted likelihoods for each obs */

/* ------------------------------------------------------------------- */

yxx = (1/s1)*exp(-0.5* ((y-mu1)/s1)^2 ) ~ (1/s2)*exp(-0.5* ((y-mu2)/s2)^2 );

yxx = (p*yxx[.,1])~((1-p)*yxx[.,2])~((1-q)*yxx[.,1])~(q*yxx[.,2]);

/* Next call basic filter, store results in pax, pfx */

/* ------------------------------------------------------------------- */

its = 1;

do until its > n;

p1[1] = pa[1]*yxx[its,1];

p1[2] = pa[1]*yxx[its,2];

p1[3] = pa[2]*yxx[its,3];

p1[4] = pa[2]*yxx[its,4];

pfx[its] = sumc(p1);

p1 = p1/pfx[its,1];

pax[its,.] = p1';

pa[1,1] = p1[1,1] + p1[3,1];

pa[2,1] = p1[2,1] + p1[4,1];

its = its+1;

endo;

/* Smoothed probability estimate */

/* ------------------------------------------------------------------- */

spe = pax[1,.]~pax[1,.];

spe[1,2]=0; spe[1,4]=0; spe[1,5]=0; spe[1,7]=0;

t = 2;

do until t > n;

spe = (((yxx[t,1]*spe[.,1:4])+(yxx[t,3]*spe[.,5:8]))~((yxx[t,2]*spe[.,1:4])+(yxx[t,4]*spe[.,5:8])))/pfx[t,1];


spe = spe | (pax[t,.] ~ pax[t,.]);

spe[t,2]=0; spe[t,4]=0; spe[t,5]=0; spe[t,7]=0;

t=t+1;

endo;

spe = spe[.,1:4] + spe[.,5:8];

/* Calculate filtered and smoothed probs that st=1 (col. 5) and st-1 =1 (col. 6) */

/* ------------------------------------------------------------------- */

/* pax = pr(st=1|st-1=1) ~ pr(st=2|st-1=1) ~ pr(st=1|st-1=2) ~ pr(st=2|st-1=2) */

pax = pax ~ (pax[.,1] + pax[.,3]) ~ (pax[.,1] + pax[.,2]);

qax = spe ~ (spe[.,1] + spe[.,3]) ~ (spe[.,1] + spe[.,2]);

retp(pax,qax);

endp;

7

Quantitative Equity Investment Management with Time-Varying Factor Sensitivities*

YVES BENTZ

ABSTRACT

Factor models are widely used in modern investment management. They enable investment managers, quantitative traders and risk managers to model co-movements among assets in an efficient way by concentrating the correlation structure of asset returns into a small number of factors. Because the factor sensitivities can be estimated by regression techniques, these factors can be used to model the asset returns. Unfortunately, the correlation structure is not constant but evolves in time, and so do the factor sensitivities. As a result, the sensitivity estimates have to be constantly updated in order to keep up with the changes.

This chapter describes three methods for estimating time-varying factor sensitivities. The methods are compared and numerous examples are provided. The first method, based on rolling regressions, is the most popular but also the least accurate. We show that this method can suffer from serious biases when the sensitivities change over time. The second method is based on a weighted regression approach which overcomes some of the limitations of the first method by giving more importance to recent observations. Finally, a Kalman filter-based stochastic parameter regression model is described that optimally estimates non-stationary factor exposures. The three methods have been implemented in the software provided on the CD-Rom so that readers can use and compare them with their own data and applications.

7.1 INTRODUCTION

Are you satisfied with the accuracy of your factor sensitivity estimates? If not, perhaps the following situation will sound familiar... After days of careful analysis, John had constructed a long-short portfolio of stocks. John's boss, however, felt uncomfortable about the position as he feared that the expected outperformance, i.e. the alpha, may take time to materialise and be perturbed by unwanted risk exposures. John updated the risk model with the latest estimates of factor sensitivities and ran the optimiser in order to immunise the position against these exposures. After monitoring the profit and loss over a few days, it became clear that the position was exposed to general market movements. In fact the alpha was dominated by a market beta.

* The information presented and opinions expressed herein are solely those of the author and do not necessarily represent those of Credit Suisse First Boston.

Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Naïm. © 2003 John Wiley & Sons, Ltd. ISBN: 0-470-84885-5

What went wrong? The optimiser? The trading strategy? Actually, neither of them. The true underlying factor sensitivities had not been constant over the estimation period and, as a result, the OLS1 sensitivity estimates that John used for the risk model were seriously misleading.

While factor sensitivities are known to vary over time, prevailing methods such as rolling regressions can be severely biased because they estimate past average factor exposures rather than forecast where these exposures are going to be in the future. In contrast, the adaptive procedure described in this chapter models and predicts the variations of factor sensitivities instead of merely smoothing past sensitivities. It can therefore be used to take advantage of the dynamics driving the relationships between stock and factor returns. As a result, risk models can be more accurate, investment strategies can be better immunised against unwanted risk, and risk-return profiles can be significantly improved.

While prevailing methods are often too simple to be correct, powerful adaptive procedures are often too complicated to be usable. This chapter presents in simple terms the essentials of dealing with time-varying factor sensitivities. It describes and compares the various methods of modelling time-varying risk exposures. Particular emphasis is given to an elegant, rich and powerful estimation method based on the Kalman filter.

The software and spreadsheets supplied on the CD-Rom provide the reader with intuitive examples of the various procedures and show how most of the complexity can be handed over to the computer program. The reader may wish to use the estimation tool pack provided on their own data in order to estimate time-varying regression coefficients. The software is provided for educational purposes only.

7.1.1 Who should read this chapter and what are its possible applications?

This chapter is aimed at all those using linear regression models with economic and financial time series. This includes areas such as investment analysis, quantitative trading, hedging, index tracking, investment performance attribution, style management and risk measurement. In particular, this chapter is targeted at everyone who has heard of adaptive models, stochastic parameter models or Kalman filtering but could not or did not want to invest the time and effort required to implement such models.

This chapter consists of four main sections. Section 7.2 is a short review of factor models and factor sensitivities. It introduces the notation used in the rest of the chapter. The following sections describe three different methods for estimating the factor exposures. The first method, presented in Section 7.3, is based on a rolling regression procedure and uses OLS estimation. The procedure is straightforward and can be implemented in a simple spreadsheet. It does not require any complex estimation procedure and can use the widely available linear regression models. The shortcomings of the method are demonstrated. The second method, presented in Section 7.4, is based on weighted least squares estimation. The procedure rests on a slightly more complex set of equations but overcomes a number of weaknesses of the OLS procedure. The third method, presented in Section 7.5, consists of an adaptive procedure based on the Kalman filter. This stochastic

1 Ordinary least squares estimation is the usual linear regression estimation procedure used on rolling windows to compute betas and other factor sensitivities.

Time-Varying Factor Sensitivities 215

parameter regression model is shown to be the most accurate and robust procedure of the three, yielding optimal estimates of factor sensitivities and modelling their time structure. Finally, Section 7.6 concludes the chapter.

7.2 FACTOR SENSITIVITIES DEFINED

The estimation and use of factor sensitivities play an important role in equity investment management. Investment diversification, portfolio hedging, factor betting2 or immunisation,3 index tracking, performance attribution and style management all necessitate at some stage an accurate estimation of factor sensitivities. The factors can be the overall market (some broad stock market index), some industrial sector, some investment style grouping (index of growth stocks) or other variables that underlie the correlation structure of stock returns, such as macroeconomic factors (e.g. inflation, GDP growth), statistical factors (usually obtained by principal component analysis), or returns of other asset classes (e.g. crude oil, gold, interest rates, exchange rates).4

The sensitivity of a stock5 to a factor is usually defined as the expected stock return corresponding to a unit change in the factor. If Y(t) is the return6 of the stock at time t and X(t) is the simultaneous return (or change) of the factor, then β in the following equation can be viewed as the factor sensitivity:

Y(t) = α + βX(t) + ε(t)    (7.1)

where α is a constant representing the average extra-factor performance of the stock and ε(t) is a random variable with zero mean (by construction), constant variance and zero covariance with X(t).

If both the stock and the factor returns Y(t) and X(t) can be observed in the market (and they usually can), then α and β can be estimated by regression techniques. Once the β coefficient has been estimated, it is possible to immunise the investment against movements in the factor by selling β amount of a tradable proxy for the factor for every unit of the investment. For instance, if the factor is the UK stock market, one can sell β pounds of FTSE 100 index futures for each pound invested. The expected return of the hedged position would then be α.
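The beta-hedging arithmetic can be illustrated with a short Python sketch. The data below are made up (and noiseless, so the fit is exact); they are not taken from the chapter's workbook:

```python
# Illustrative only: simple-regression estimates of alpha and beta,
# and the beta hedge ratio described in the text.

def ols_alpha_beta(y, x):
    """Least-squares alpha and beta of y = alpha + beta * x."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    beta = cov / var
    return my - beta * mx, beta

x = [0.01, -0.02, 0.015, 0.005, -0.01]   # factor (index) returns
y = [0.001 + 1.2 * v for v in x]         # stock returns: alpha 0.001, beta 1.2
alpha, beta = ols_alpha_beta(y, x)
hedge_ratio = beta  # pounds of index futures sold per pound invested
```

Selling `hedge_ratio` pounds of the factor proxy per pound invested leaves a position whose expected return is the alpha.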

More generally, several factor sensitivities can be estimated simultaneously. In order to estimate joint sensitivities, equation (7.1) is generalised to more than one factor:

Y(t) = α + Σ_{i=1}^{N} β_i X_i(t) + ε(t) = X(t)β + ε(t)    (7.2)

2 For instance, an investor could construct a portfolio that would be sensitive to one factor only (e.g. default risk) and immune to all other factors.
3 For instance, an investor could construct a portfolio that would not be sensitive to one factor (e.g. long-term interest rates risk) but remain sensitive to all other factors.
4 Not all factors may be relevant for all factor models. For example, using macroeconomic factors makes little sense when modelling daily or hourly returns.
5 Sensitivity estimation is obviously not restricted to stocks but can also apply to other assets, portfolios of assets or investment strategies.
6 As factor sensitivities may not have a flat term structure, β is a function of the horizon over which returns are calculated, i.e. betas based on monthly, weekly, daily and hourly returns may all be different from each other.


where Y(t) is the return of the investment at time t; X_i(t) are the returns of the N factors; α is a constant representing the average extra-factor return of the investment (it is the sensitivity coefficient β_0 to the constant factor X_0(t) = 1); β_i is a parameter that represents the joint sensitivity of the investment return to changes in factor i; ε(t) is a random variable with zero mean (by construction), constant variance (by assumption) and zero covariance with the factor returns.

The joint sensitivity coefficients β_i measure "clean" sensitivities, accounting for the sole effect of one variable X_i while controlling for the other effects. Hence, if the joint sensitivity of Y to X_1 has a value of S, Y is expected to change by S when X_1 changes by 1, if all the other variables X_2, ..., X_N remain constant. β_i is the partial derivative ∂Ŷ(X_i)/∂X_i of the expectation Ŷ of Y with respect to X_i.

7.3 OLS TO ESTIMATE FACTOR SENSITIVITIES: A SIMPLE, POPULAR BUT INACCURATE METHOD

Estimating factor sensitivities can be simple or complex, depending on the assumptions made about the relationships between stock and factor returns. The most popular method with practitioners and, unfortunately, also the least accurate is standard linear regression, also known as ordinary least squares (OLS) estimation. It is simple, easy to implement and widely available in statistical software packages. OLS minimises the mean squared error (MSE) of the linear model described in equation (7.2). The problem has a simple closed-form solution which consists of a matrix inversion and a few matrix multiplications. Equation (7.2) can be rewritten using a more concise matrix notation, as:

Y = Xβ + ε    (7.3)

where Y is a T × 1 vector of the asset returns, X a T × (N + 1) matrix of the factor returns, β a (N + 1) × 1 vector of factor sensitivities and ε a T × 1 vector of random variables with zero mean, i.e.

Y = [Y_1  Y_2  ...  Y_T]′,  β = [α  β_1  ...  β_N]′,  ε = [ε_1  ε_2  ...  ε_T]′

and X is the matrix whose t-th row is (1, X_t1, ..., X_tN),

with expectation E(ε) = 0 and variance–covariance matrix σ²(ε) = σ²I. Here Y_t is the t-th observation of the investment return; X_ti is the t-th observation of the i-th factor return X_i; β_i is the sensitivity of Y to factor X_i; ε_t is the error term at time t.

ˆ

The expression of the OLS estimator β of β can then be shown to be:

β = (X X)’1 X Y

ˆ (7.4)

ˆ

and its estimated variance“covariance matrix is s 2 (β):

ˆ

s 2 (β) = MSE (X X)’1 (7.5)

Time-Varying Factor Sensitivities 217

with MSE being the mean squared error of the regression model:

MSE = 1/(T − 2) Σ_{t=1}^{T} (Y(t) − Ŷ(t))²,  where Ŷ(t) = Σ_{i=0}^{N} β̂_i X_i(t)    (7.6)
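Equations (7.4) to (7.6) can be sketched in a few lines of Python. This illustrative version solves the normal equations directly (Gauss-Jordan elimination, no pivoting, which is adequate for a well-conditioned X′X) rather than calling a statistics library; the variable names are ours, not the chapter's:

```python
# Illustrative pure-Python version of equations (7.4)-(7.6):
# beta_hat = (X'X)^-1 X'Y, solved via the normal equations.

def ols(X, Y):
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]  # X'X
    b = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(k)]                # X'Y
    for i in range(k):                       # Gauss-Jordan elimination on [A | b]
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        b[i] /= piv
        for r in range(k):
            if r != i:
                f = A[r][i]
                A[r] = [v - f * w for v, w in zip(A[r], A[i])]
                b[r] -= f * b[i]
    return b                                 # [alpha, beta_1, ..., beta_N]

def mse(X, Y, beta):
    # Mean squared error as in equation (7.6); the book divides by T - 2
    resid = [y - sum(b * v for b, v in zip(beta, row)) for row, y in zip(X, Y)]
    return sum(e * e for e in resid) / (len(Y) - 2)

# First column of ones plays the role of the constant factor X_0(t) = 1
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]                     # exactly Y = 1 + 2 * X_1
beta_hat = ols(X, Y)
```

On this noiseless example the estimator recovers the intercept and slope exactly and the MSE of equation (7.6) is zero.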

OLS estimates of the factor sensitivities are unconditional, i.e. they are average sensitivities over the observation sample. Any variation of the betas over the sample is not modelled and the resulting error is attributed to sampling, i.e. to the ε term in equation (7.3). Therefore, when using OLS, the modeller implicitly assumes that the factor sensitivities are constant over the data sample. Hence, in the case of a deterministic relationship where beta would change over time in a predictable way, OLS would estimate an average beta over the sample with a large prediction error ε.

The problem is illustrated in Figure 7.1. The sensitivity ∂Y/∂X of a dependent variable (say Y) to an independent one (say X) is represented on the vertical axis. The actual sensitivity is not constant but varies over time. Linear regression measures an average (or unconditional) effect over the estimation data set. In the case of Figure 7.1, this average sensitivity is close to zero and would probably not be statistically significant. However, the actual sensitivity is far from small. For instance, at time t, the true factor sensitivity is strongly negative although the unconditional model's prediction is slightly positive. Parameter estimation is clearly biased here: at time t, the expected value of beta is quite different from the true value. The model wrongly attributes to the stochastic residual ε an effect that is in fact largely deterministic. The importance of the estimation error depends on the variance of the actual sensitivity over the sample (the more volatile the sensitivity, the more likely the sensitivity measured at a particular time differs from the average sensitivity) and the size of the estimation window.

The example given in the "Sensitivity estimation.xls" file illustrates standard linear regression (OLS) using Microsoft Excel's native LINEST function.7 The data set consists

[Figure: the true factor sensitivity ∂Y/∂X varies over time while linear OLS over the estimation window yields a single flat estimate; at time t the gap between the OLS estimate and the best estimate of the current sensitivity is the bias.]

Figure 7.1 Actual sensitivity (in black) and unconditional sensitivity measured by linear OLS estimation (in white). When the true sensitivity is not constant but conditional on time, OLS estimation, which measures an unconditional (average) sensitivity over time, is biased. The amplitude of the bias depends on the variance of the true sensitivity and the length of the estimation window

7 A more user-friendly linear regression tool is also available through the "Analysis ToolPak" add-in that comes with Excel.


of 100 observations of two observable variables, Y and X, representing daily percentage returns. X and Y are related by a simple time-varying relationship that is mostly deterministic (90% of the variance in Y is related to X):

Y(t) = β(t) × X(t) + ε(t)    (7.7)

where β(t) is a linear function of time and ε(t) is a Gaussian random variable (µ = 0%, σ = 0.1%). The actual beta (generated in column B and plotted in white in Figure 7.1) increases linearly over the period, starting at −0.49, turning positive at time t = 50 and finishing at 0.50.

Cell K3 contains the Excel formula for the beta estimated over the entire data set, i.e. "=LINEST($E$3:$E$102,$C$3:$C$102,TRUE,FALSE)". Its value is −0.04. If this value is used to predict Y based on X using the same data set, the R-square is 2%. Cell K4 contains the formula for the beta estimated over the last 50 observations only. Its value is 0.23 and the corresponding R-square (using these 50 observations) is 61%.8 Given all the information available at time t = 100 (i.e. 100 observations of X and Y), which one is the right value of beta to use at time t = 100? And at time t = 101?

Actually, neither of the two values is satisfactory, as they correspond to past averages of beta rather than current values or predictions of beta. The true value of beta at time t = 100 is 0.5 and the best value to use when a new value of X and Y becomes available at t = 101 would be 0.51. Unfortunately, OLS is unsuitable when beta changes over time because it averages out the variations of beta rather than modelling its time structure. Furthermore, it is backward-looking rather than predictive. In other terms, OLS is insensitive to the order of the observations. One could scramble the data and OLS would still yield the same estimate of beta. The potentially useful information contained in the data chronology is not used.
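Both properties just described (OLS averaging out a drifting beta, and its insensitivity to the order of the observations) can be checked on synthetic data generated in the spirit of equation (7.7); the numbers below are illustrative, not the spreadsheet's data set:

```python
# Illustrative check: beta drifts linearly from -0.49 to 0.50, yet
# full-sample OLS returns roughly the average beta, and shuffling the
# observations leaves the estimate unchanged.
import random

random.seed(0)
T = 100
betas = [-0.49 + 0.01 * t for t in range(T)]      # drifting true beta
xs = [random.gauss(0.0, 1.0) for _ in range(T)]
ys = [b * x + random.gauss(0.0, 0.1) for b, x in zip(betas, xs)]

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (c - my) for a, c in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

full_sample_beta = ols_slope(xs, ys)   # near the average beta, far from 0.50
pairs = list(zip(xs, ys))
random.shuffle(pairs)                  # scramble the chronology
sx = [p[0] for p in pairs]
sy = [p[1] for p in pairs]
shuffled_beta = ols_slope(sx, sy)      # same estimate: OLS ignores ordering
```

The shuffled and unshuffled estimates agree to machine precision, while neither comes close to the final beta of 0.50.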

Intuitively, one can see that the estimates are highly sample-dependent. If beta changes rapidly, estimates using short time periods should be better than estimates using longer periods. Unfortunately, by reducing the estimation period, one increases sampling error. The best window size should correspond to an optimal trade-off between sampling error and beta variation. On the one hand, if the estimation period is too short, there are not enough observations to separate the information from the noise, resulting in large sampling variance. On the other hand, if the estimation period is too long, the current beta may significantly differ from its average, resulting in a biased model. This is known as the bias/variance dilemma.9

Based on this perception, many practitioners use rolling regressions. This consists of applying a linear regression model to a rolling window of observations. The window is an ordered subsample of the time series. Figure 7.2 illustrates the rolling window approach. An example of rolling regression is given in column H of the spreadsheet. The corresponding (out-of-sample) R-square is 39.2%.
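A rolling regression of this kind can be sketched in Python as follows: a plain OLS slope re-estimated on each trailing window. The function name and test data are ours, not the spreadsheet's:

```python
# Sketch of the rolling-window procedure: re-estimate an OLS slope on
# each trailing window of `window` observations.

def rolling_betas(x, y, window):
    out = []
    for t in range(window, len(x) + 1):
        xs, ys = x[t - window:t], y[t - window:t]
        mx, my = sum(xs) / window, sum(ys) / window
        beta = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
                / sum((a - mx) ** 2 for a in xs))
        out.append(beta)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * v for v in x]            # constant true slope of 2
betas = rolling_betas(x, y, 3)      # one estimate per window
```

Because consecutive windows overlap by all but one observation, consecutive estimates are highly autocorrelated, which is exactly the point made in Figure 7.2.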

The provided sensitivity estimation tool pack in the "SEToolPack.XLS" file can also be used to estimate rolling regressions. In order to do so, select "Sensitivity estimation" from

8 Note that these R-square figures are in-sample and do not reflect the out-of-sample performance of the model. At time t = 75 for instance, the last 25 observations (i.e. 76-100) are not yet available. And yet we use the t = 100 estimate of beta in order to compute this R-square.

9 This issue becomes even more important and difficult to resolve when measuring joint sensitivities that change at different speeds. Which window size should then be used?

Time-Varying Factor Sensitivities 219

[Figure 7.2: schematic of two overlapping rolling windows W1-7 and W2-8 over observations 1 to 9, annotated with the rolling step, the new information entering the window (1/6 of the variance) versus the old information already in W1-7 (5/6), and the expected autocorrelation sqrt(5/6) = 0.91]

Figure 7.2 Rolling regressions. Sensitivities are estimated over a rolling window (here W1-7) of the n last observations (here n = 7). Then, the estimation window is moved forward by a rolling step of p observations (here p = 1) and a new set of factor sensitivities is estimated (here on W2-8). The procedure goes on until the last observation in the data set is reached. Unless p = n, consecutive windows are not independent. Actually, (n − p)/n of the variance in a window is shared with an adjacent window and only p/n of the variance is new information

Figure 7.3 Using the sensitivity estimation tool pack to estimate rolling regressions

the tools menu. In the main menu (Figure 7.3) enter the Y and X ranges into the corre-

sponding edit boxes (dependent and independent variables respectively), select the range

where you want the computed sensitivities to be saved to (output range),10 select “Linear

rolling regression” option and press “OK” to go to the next dialogue box (Figure 7.4). In

the “Rolling regression” dialogue box, select “no weighting” and a window size.

10 The sensitivity estimation tool pack can be used with any worksheet and is not restricted to the example sheet provided. For instance, it can be used with the “stock beta estimation” sheet or any proprietary data collected by the reader in other workbooks.


Figure 7.4 The rolling regression menu of the tool pack

The example provided in the spreadsheet consists of two factors, X1 and X2, and 115

observations. The dependent variable Y is generated by a time-varying linear relationship

of the factors. The sensitivities and the additive stochastic noise are stored in columns AF

to AI of the example sheet. They have been constructed so that each represents a different

type of time series. The sensitivity to X1 is a continuous function of time, the sensitivity

to X2 is constant except for a large level shift occurring half-way through the sample

and alpha is a slowly growing value. These time series are represented by the white lines

in the graph included in the spreadsheet (Figure 7.5). The grey lines correspond to the

rolling regression estimates. The first n observations, where n is the window size, are used to estimate the regression coefficients at time n. “Ini. Period” will therefore appear in the first n rows.

Rolling regressions, although popular, are still unsuitable for time-varying sensitivity

estimation. They give the illusion that they can handle conditional betas while in reality

they are biased estimators of time-varying sensitivities. This is because OLS linear regres-

sion estimates an average sensitivity over each rolling window. If actual sensitivities vary

over time (e.g. follow a trend) they will depart from their average.

[Figure 7.5: chart of the actual sensitivities (Alpha, X1, X2) and their rolling regression estimates over time; sensitivity axis from −0.50 to 1.00]

Figure 7.5 Rolling regression example. The data set is provided in the “Example” sheet of “SEToolPack.xls” file. It consists of 115 observations. The white lines correspond to the actual sensitivities. The grey lines are the sensitivities estimated by the rolling regression model provided in the tool pack. The size of the rolling window is 32 observations


Figure 7.2 shows that sensitivities computed from adjacent windows are highly correlated because they are estimated from data sets that share n − p observations, n being the window size and p the rolling shift between two consecutive windows. Autocorrelation in rolling sensitivities increases with the window size n. This autocorrelation is independent of the behaviour of the actual sensitivities. Even for random sensitivities, autocorrelation in estimated sensitivities is expected to be √((n − p)/n) (the correlation squared is the percentage of variance common to two adjacent windows, i.e. (n − p)/n). The measured effect of a particular event persists as long as it remains in the rolling estimation window.

The consequences of this persistence are threefold:

• “Ghost” effects. If a significant event occurs on one particular day, it will remain in the sensitivity series for n further days, where n is the length of the rolling window.

The apparent sensitivity persistence is a measurement artefact and does not necessarily

relate to a true sensitivity shock. The amplitude of the effect depends on both the size

of the rolling window and the variance of the actual sensitivity. Figure 7.6 illustrates

the shadow effect. The estimation bias is a function of the difference between the

two sensitivity levels, their respective durations and the length of the rolling window.

Measured sensitivity remains at its maximum level for another 15 days although actual sensitivity

has returned to its normal level. This is the shadow effect. Rolling regression measures

an effect that may have long disappeared.

• The difference between two consecutive sensitivity estimates is only determined

by two observations. That is the observation entering the rolling window (i.e. the

most recent data point) and the observation leaving the rolling window (i.e. the most

[Figure 7.6: chart comparing the actual sensitivity with the sensitivity measured by rolling OLS over time (t = 1 to 176), annotated with the shadow effect and the estimation bias]

Figure 7.6 Estimation bias and shadow effect in rolling estimation. In this controlled experiment, the actual sensitivity of Y to X is represented by the thin line. It has been chosen constant over most of the data set at some level (LL) except for a very short period of 5 days during which it takes an unusually high value (HL). This may be caused, for instance, by incorrect market expectations motivated by unfounded rumours. Sensitivity is estimated by OLS regression over a rolling window. The window is 20 days long. Sensitivity estimates are represented by the bold line. Rolling estimation clearly underestimates actual sensitivity. At the end of the fifth day of HL, the estimate reaches its maximum value of (5 × HL + 15 × LL)/20


remote data point). While it is legitimate that the most recent observation affects this

difference, there is no reason why the most remote point should have more influence

than any other observation in the window, especially because the size of the window

has been chosen somewhat arbitrarily.

• Sensitivity estimates lag actual sensitivity. Because rolling regression measures an

average sensitivity over the estimation period, estimated betas lag actual betas, especially

when the latter trend. The lag depends on the length of the rolling window. This effect is

clearly visible in Figure 7.5. The beta to variable X2 slowly adapts to the level shift while

the beta to X1 seems to lag the actual beta by about 20 observations.

7.4 WLS TO ESTIMATE FACTOR SENSITIVITIES: A BETTER

BUT STILL SUB-OPTIMAL METHOD

One of the major limitations of rolling OLS is that all observations are given equal

weight, irrespective of their distance in time. Hence, the most recent observation is given

the same “credibility” as the most remote observation in the window. A natural improve-

ment of the rolling window procedure would be to give observations different weights

based on their position in the time series. This procedure is known as weighted least

squares (WLS) estimation and can better deal with variations of sensitivities (see, among

others, Pindyck and Rubinfeld (1998)). Remote observations are attributed less weight

than recent observations.

The criterion to be minimised, WSSR, is the weighted sum of squared residuals rather than the ordinary sum of squared residuals, SSR. WSSR is defined by:

$$\mathrm{WSSR}(\hat{\beta}) = \sum_{t=1}^{T} w(t)\,(Y_{\hat{\beta}}(t) - Y(t))^2 = (Y - X\beta)'\,W\,(Y - X\beta) \qquad (7.8)$$

where W is a T × T diagonal matrix containing the weights w(t), i.e.

$$W = \begin{bmatrix} w(1) & 0 & \cdots & 0 \\ 0 & w(2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & w(T) \end{bmatrix}$$

The weighted least squares estimator $\hat{\beta}_w$ is:

$$\hat{\beta}_w = (X'WX)^{-1}X'WY \qquad (7.9)$$

and the estimated variance–covariance matrix of $\hat{\beta}_w$ is $s^2(\hat{\beta}_w)$:

$$s^2(\hat{\beta}_w) = \mathrm{MSE}_w\,(X'WX)^{-1} \qquad (7.10)$$

with $\mathrm{MSE}_w$ being the weighted mean squared error of the regression model:

$$\mathrm{MSE}_w = \frac{\mathrm{WSSR}}{\sum_{t=1}^{T} w(t) - 2} \qquad (7.11)$$


[Figure 7.7: chart of the actual sensitivities (Alpha, X1, X2) and their weighted rolling regression estimates over time; sensitivity axis from −0.50 to 1.00]

Figure 7.7 Weighted regression example. The size of the rolling window is still 32 observations. However, here the observations have been weighted using a linear function of time and a decay rate of 3%

Not surprisingly, when W = I, i.e. when all observations have equal weight, equations (7.8)

to (7.10) can be condensed into the OLS equation. The most popular weighting functions, which are

provided by the tool pack, are linear or exponential but other functions such as sigmoids

can be considered. A decay rate determines the speed at which the weight of an observation

decreases with time.11
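A minimal sketch of the weighted estimator of equation (7.9) for a single factor, with exponentially decaying weights. The data are synthetic, and the 10% decay rate is chosen to make the effect visible (the worked example in the text uses 3%):

```python
import numpy as np

rng = np.random.default_rng(2)

T, n = 120, 32
beta_true = np.linspace(0.0, 1.0, T)   # trending actual beta (illustrative)
x = rng.normal(size=T)
y = beta_true * x + rng.normal(scale=0.1, size=T)

def wls_beta(xw, yw, decay):
    """Single-factor WLS slope, equation (7.9): (X'WX)^-1 X'WY with
    exponentially decaying weights (most recent observation has weight 1)."""
    age = np.arange(len(xw) - 1, -1, -1)
    w = (1.0 - decay) ** age
    return ((w * xw) @ yw) / ((w * xw) @ xw)

beta_wls = np.full(T, np.nan)
beta_ols = np.full(T, np.nan)
for t in range(n, T):
    xw, yw = x[t - n:t], y[t - n:t]
    beta_wls[t] = wls_beta(xw, yw, decay=0.10)
    beta_ols[t] = (xw @ yw) / (xw @ xw)        # equal-weight benchmark
```

Down-weighting old observations shortens the effective window, so the weighted estimate tracks a trending beta with less lag than the equal-weight benchmark, at the price of somewhat higher sampling variance.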

In order to use the sensitivity estimation tool pack, select one of the weighting options

in the rolling regression menu (Figure 7.4). For instance, select a linear weighting and

enter a weight decay of 3%. Figure 7.7 shows the result of the estimation.

Weighted least squares estimation induces less autocorrelation in the estimates than

ordinary least squares estimation. Depending on the decay rate, shadow effects, lag and

persistence problems are considerably reduced.

However, WLS does not provide a way to model the sensitivities time series. It still

measures past (weighted) average sensitivities rather than predicting future ones. In addi-

tion, all sensitivity coefficients in the regression equation are identically affected by the

weighting, regardless of their rate of change as the weights only depend on the position of

the observation in the time series. Consequently, constant sensitivities suffer from large

weight discount rates while highly variable sensitivities suffer from small decay rates.

There is no single weight discount rate that is adapted to all factor sensitivity coefficients

when their variances differ and some trade-off has to be made.

7.5 THE STOCHASTIC PARAMETER REGRESSION MODEL

AND THE KALMAN FILTER: THE BEST WAY TO ESTIMATE

FACTOR SENSITIVITIES

The procedures that have been described so far involve a single regression equation

with constant betas. These procedures use ordinary or weighted least squares in order to

repeatedly estimate new model coef¬cients from adjacent windows of observations.

The stochastic parameter model, however, is based on a conceptually different approach

(see Gouriéroux et al. (1997), or Harvey (1989)). The beta coefficients are not assumed

11 Although this rate is usually set a priori, one could also estimate it, for instance by minimising the WSSR.


constant and are optimally adjusted when new information becomes available. The dynam-

ics of the betas are modelled in a second equation. Very much like a GARCH model,12

the stochastic parameter regression model is based on a system of equations. Hence, the

linear regression equation (7.2) can be rewritten into a simple time-varying regression by

letting β follow a given time process, for example an autoregressive process:13

$$Y_t = X_t\beta_t + \mu_t \qquad (7.12)$$

and

$$\beta_t = \Phi\beta_{t-1} + \eta_t \qquad \text{for } t = 1,\ldots,T \qquad (7.13)$$

where Φ is a non-random K × K matrix, K being the number of factors; η_t is a vector of serially uncorrelated disturbances with zero mean. It is not observable.

The addition of the second equation (7.13) allows beta to vary in time according to a process that can be modelled. However, Φ and η_t are not observable and need to

be estimated. Simple estimation procedures such as OLS which can only handle single

equations cannot be used here but fortunately, the problem can be resolved as it can easily

be put into a state space form. The Kalman filter can then be used to update the parameters of the model. Moreover, the same Kalman filter can be employed with a large variety of

time processes for the sensitivities without adding much computational complexity.14

Originally developed by control engineers in the 1960s (see Kalman (1960)) for applications concerning spacecraft navigation and rocket tracking, state space models have since attracted considerable attention in economics, finance, and the social sciences and have

been found to be useful in many non-stationary problems and particularly for time series

analysis. A state space model consists of a system of equations aimed at determining the

state of a dynamic system from observed variables (usually time series) contaminated

by noise. Time, or the ordering of observations, plays an important role in such models.

Two identical state space models applied to the same data are likely to produce different

estimates if observations are presented to the models in different orders (i.e. scrambling

observations alters model estimates). It is therefore not surprising that they are usually

used for modelling time series. The state of the dynamic system, described by “state

variables”, is assumed to be linearly related to the observed input variables.

The stochastic coef¬cient regression model expressed in a state space form can be

defined by a system of two equations. The first equation (e.g. equation (7.12)) is referred to

as the observation equation and relates the dependent variable, Yt , to the independent vari-

ables by the unobservable states (in the case of equation (7.12), the state is simply the vec-

tor of sensitivities βt ). More generally, the observation equation takes the following form:

Yt = Ht (st + d) + µt for t = 1, . . . , T (7.14)

12 See for instance Bollerslev (1986).

13 Equation (7.13) describes one example of a time process that can be used for modelling the sensitivities time series. This particular process corresponds to a simple AR1. The Kalman filter approach can be used with many more processes.

14 If η_t has zero variance and Φ = I, then β_t is constant and equation (7.12) is just the familiar equation of a linear regression. The estimates of β_t are those of a recursive regression (OLS linear regression using all the observations from i = 1 to t) and the estimate of β_T is identical to the OLS estimate of a linear regression. If the variance of η_t is larger than zero, however, sensitivity coefficients are allowed to change over time.


where Yt is the dependent observable variable (e.g. an asset return) at time t. st is a K

vector that describes the state of the factor sensitivities of the asset returns at time t. d is

a K non-random vector that can account for the long-term mean in sensitivities. Ht is a

K vector that contains the factor values at time t. µt is a serially uncorrelated perturbation

with zero mean and variance Rt , i.e.: E(µt ) = 0 and Var(µt ) = Rt , Rt being non-random.

It is not observable.

The exact representation of H_t and s_t will depend on model specification. For the

autoregressive model described by equations (7.12) and (7.13), which is also the most

popular, Ht and st simply correspond to Xt and βt respectively. However, this is not

the case for more complex models such as the random trend model which requires more

state variables (K = 2N , N being the number of factors). In general, the elements of st

are not observable. However, their time structure is assumed to be known. The evolution

through time of these unobservable states is a first-order Markov process described by a

transition equation:

$$s_t = \Phi s_{t-1} + c_t + \eta_t \qquad \text{for } t = 1,\ldots,T \qquad (7.15)$$

where Φ is a non-random K × K state transition matrix; c_t is a non-random K vector; η_t is a K × 1 vector of serially uncorrelated disturbances with zero mean and covariance Q_t; E(η_t) = 0 and Var(η_t) = Q_t, Q_t being non-random. It is not observable.

The initial state vector s_0 has a mean of s̄ and a covariance matrix P_0, i.e.:

$$E(s_0) = \bar{s} \quad \text{and} \quad \mathrm{Var}(s_0) = P_0$$

Furthermore, the disturbances µ_t and η_t are uncorrelated with each other in all time periods, and uncorrelated with the initial state, i.e.:

$$E(\mu_t\,\eta_{jt}) = 0 \quad \text{for all elements } \eta_{jt} \text{ of } \eta_t, \text{ and for } t = 1,\ldots,T$$

and

$$E(\mu_t\,s_0) = 0 \quad \text{and} \quad E(\eta_t\,s_0) = 0 \qquad \text{for } t = 1,\ldots,T$$

The system matrices Φ, P and Q, the vectors s̄, c and d and the scalar R are non-random

although some of them may vary over time (but they do so in a predetermined way).

Furthermore, the observation noise µt and the system noise ·t are Gaussian white noises.

The most popular processes used for time-varying sensitivities are the random walk

model and the random trend model.

The random walk model dictates that the best estimate of future sensitivities is the

current sensitivity. To specify this model, vectors d and c are set to zero and the system

matrices for the random walk model are given by equations (7.16):15

$$s_t = [\beta_{t,1}, \beta_{t,2}, \ldots, \beta_{t,N}]$$
$$H_t = [1, F_{t,1}, \ldots, F_{t,N-1}] \qquad (7.16)$$
$$\Phi = I$$

15 Hence the random walk specification of the stochastic regression model can be expressed as a system of two equations: (1) Y_t = X_t β_t + µ_t and (2) β_t = β_{t−1} + η_t.

where Φ is N × N, H_t is 1 × N and s_t is N × 1, N being the number of independent variables, including the constant.

The random trend model dictates that the best estimate of future sensitivities is the

current sensitivity plus the trend. In the presence of a trend, sensitivities at t + 1 are not

equally likely to be above or under the value at t. A simple model which allows the

sensitivities to trend is the random trend model. A sensitivity coefficient β_it follows a

random trend if:

$$\beta_t = \beta_{t-1} + \delta_{t-1} + \eta_{t,1}$$
$$\delta_t = \delta_{t-1} + \eta_{t,2} \qquad (7.17)$$

where βt represents the sensitivity vector at time t and δt is the random trend in the

sensitivity vector at time t.16

In a state space form, the system in (7.17) can be written as:

$$\begin{bmatrix} \beta_t \\ \delta_t \end{bmatrix} = \Phi_0 \begin{bmatrix} \beta_{t-1} \\ \delta_{t-1} \end{bmatrix} + \begin{bmatrix} \eta_{t,1} \\ \eta_{t,2} \end{bmatrix} \qquad (7.18)$$

where the individual sensitivity random trend state transition matrix $\Phi_0$ is given by

$$\Phi_0 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

To specify this model, vectors d and c are set to zero and the collection of factor

sensitivities and trends are expressed as:

$$s(t) = [\beta_1(t), \delta_1(t), \beta_2(t), \delta_2(t), \ldots, \beta_N(t), \delta_N(t)] \qquad (7.19)$$
$$H(t) = [1, 0, 1, 0, \ldots, 1, 0] \qquad (7.20)$$

$$\Phi = \begin{bmatrix} \Phi_0 & 0 & \cdots & 0 \\ 0 & \Phi_0 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \Phi_0 \end{bmatrix} \qquad (7.21)$$

where the dimension of the state of sensitivities has been doubled to include trends in the sensitivities, i.e. Φ is 2N × 2N, s_t is 2N × 1 and H_t is 1 × 2N.

Other time processes such as the random coefficient model (Schaefer et al., 1975) or the mean-reverting coefficient model (Rosenberg, 1973) can also be used within this

modelling framework.

The objective of state space modelling is to estimate the unobservable states of the

dynamic system in the presence of noise. The Kalman filter is a recursive method of doing this, i.e. filtering out the observation noise in order to optimally estimate the state vector at time t, based on the information available at time t (i.e. observations up to and including Y_t). What makes the operation difficult is the fact that the states (e.g. the sensitivities) are not constant but change over time. The assumed amount of observation noise versus system noise is used by the filter to optimally determine how much of the

variation in Yt should be attributed to the system, and how much is caused by observation

16 Hence the random trend specification of the stochastic regression model can be expressed as a system of three equations: (1) Y_t = X_t β_t + µ_t, (2) β_t = β_{t−1} + δ_{t−1} + η_{t,1} and (3) δ_t = δ_{t−1} + η_{t,2}.


noise. The filter consists of a system of equations which allows us to update the estimate of the state s_t when new observations become available.

Equations (7.22) to (7.31) describe the Kalman filter.17 These equations are important

to the more technical readers who want to develop and implement the model. Other

readers may want to skip these equations and directly move to the more intuitive example

given later.

The first two equations define the state $\hat{s}_{t|t}$ (equation (7.22)) and the state error covariance matrix $P_{t|t}$ (equation (7.23)):

$$\hat{s}_{t|t} = E(s_t \mid Y_1, \ldots, Y_t) \qquad (7.22)$$

$$P_{t|t} = E((s_t - \hat{s}_{t|t})(s_t - \hat{s}_{t|t})') = \mathrm{Var}(s_t - \hat{s}_{t|t}) \qquad (7.23)$$

where the notation $\hat{a}_{b|c}$ denotes an estimate of a at time b conditional on the information available at time c.

The predicted state of sensitivities $\hat{s}_{t|t-1}$ and the corresponding forecasting error covariance matrix $P_{t|t-1} = E((s_t - \hat{s}_{t|t-1})(s_t - \hat{s}_{t|t-1})')$ are given by the prediction equations (7.24) and (7.25):

$$\hat{s}_{t|t-1} = \Phi\hat{s}_{t-1|t-1} + c_t \qquad (7.24)$$

$$P_{t|t-1} = \Phi P_{t-1|t-1}\Phi' + Q_t \qquad (7.25)$$

and the predicted dependent variable is given by equation (7.26):

$$\hat{Y}_{t|t-1} = H_t\hat{s}_{t|t-1} + d \qquad (7.26)$$

The forecast error e_t at time t and its variance f_{t|t−1} can be calculated by equations (7.27) and (7.28) respectively:

$$e_t = Y_t - \hat{Y}_{t|t-1} \qquad (7.27)$$

$$f_{t|t-1} = H_t P_{t|t-1} H_t' + R_t \qquad (7.28)$$

The new information is represented by the prediction error, e_t. It can be the result of several factors: random fluctuations in returns, changes in underlying states, or error in previous state estimates. Given this new information, the estimate $\hat{s}_{t|t}$ of the state vector and its covariance P_{t|t} can now be updated through the updating equations (7.29) and (7.30) respectively. The Kalman filter uses the new information for adjusting the estimates of the underlying states, where the new information is simply the prediction error of the returns. The Kalman gain matrix, K_t, optimally adjusts the state estimates in order to reflect the new information:

$$\hat{s}_{t|t} = \hat{s}_{t|t-1} + K_t e_t \qquad (7.29)$$

$$P_{t|t} = (I - K_t H_t)P_{t|t-1} \qquad (7.30)$$

The Kalman gain matrix is calculated from the observation noise variance R_t and the predicted state error covariance P_{t|t−1} with the recursive equation (7.31):

$$K_t = P_{t|t-1} H_t' (H_t P_{t|t-1} H_t' + R_t)^{-1} \qquad (7.31)$$

17 Further explanations can be found in Harvey (1989) or Gouriéroux et al. (1997).


[Figure 7.8: flowchart of the Kalman filter. Initialisation at t = 1 with the non-random values Φ, d, c_t, R_t, Q_t and s_{1|0} = s̄, P_{1|0} = P_0. Upon each observation of Y_t and H_t: predict Y (Ŷ_{t|t−1} = H_t ŝ_{t|t−1} + d); compute the error (e_t = Y_t − Ŷ_{t|t−1}); compute the gain (K_t = P_{t|t−1}H_t'(H_t P_{t|t−1}H_t' + R_t)^{−1}); apply the updating equations (ŝ_{t|t} = ŝ_{t|t−1} + K_t e_t, P_{t|t} = (I − K_t H_t)P_{t|t−1}) and the prediction equations (ŝ_{t|t−1} = Φŝ_{t−1|t−1} + c_t, P_{t|t−1} = ΦP_{t−1|t−1}Φ' + Q_t); set t = t + 1 and loop until t = T.]

Figure 7.8 Flowchart of the Kalman filter

In order to clarify the sequence of the Kalman filter equations, Figure 7.8 presents a flowchart for the filter.
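The filter loop of equations (7.24) to (7.31) can be sketched directly in code. The routine below is our own illustration, not the tool pack's implementation; the function name and the synthetic random-walk beta are assumptions made for the example.

```python
import numpy as np

def kalman_filter(Y, H, Phi, Q, R, s0, P0, c=None, d=0.0):
    """One pass of the filter of Figure 7.8, equations (7.24)-(7.31).
    Y: (T,) observations; H: (T, K) factor values; Phi: (K, K) transition
    matrix; Q: (K, K) system noise covariance; R: observation noise variance.
    Returns the filtered states s_{t|t} and their covariances P_{t|t}."""
    T, K = H.shape
    c = np.zeros(K) if c is None else c
    s, P = s0.astype(float).copy(), P0.astype(float).copy()
    states, covs = np.zeros((T, K)), np.zeros((T, K, K))
    for t in range(T):
        s_pred = Phi @ s + c                     # prediction (7.24)
        P_pred = Phi @ P @ Phi.T + Q             # prediction (7.25)
        h = H[t]
        e = Y[t] - (h @ s_pred + d)              # forecast error (7.26)-(7.27)
        f = h @ P_pred @ h + R                   # error variance (7.28)
        gain = P_pred @ h / f                    # Kalman gain (7.31)
        s = s_pred + gain * e                    # updating (7.29)
        P = P_pred - np.outer(gain, h @ P_pred)  # updating (7.30)
        states[t], covs[t] = s, P
    return states, covs

# Track a single random-walk beta (Phi = I, c = 0, d = 0).
rng = np.random.default_rng(3)
T = 200
beta = np.cumsum(rng.normal(scale=0.05, size=T))
x = rng.normal(size=T)
y = beta * x + rng.normal(scale=0.1, size=T)

states, covs = kalman_filter(
    y, x.reshape(-1, 1), Phi=np.eye(1), Q=np.eye(1) * 0.05**2, R=0.1**2,
    s0=np.zeros(1), P0=np.eye(1) * 10.0)  # large P0: fast initial adjustment
```

With the system and observation noise variances set to their true values, the filtered state tracks the wandering beta closely after a short burn-in driven by the deliberately large P_0.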

The recursive nature of the Kalman filter is a major computational advantage. It enables

the model to update the conditional mean and covariance estimates of the states at time

t based on the sole estimate obtained at time t ’ 1. Although it takes into account the

entire history, it does not need an expanding memory.

For an intuitive understanding of the principles involved, let us consider the example

of the time-varying sensitivity (which follows a random walk) of a dependent variable Yt

to a single independent variable Xt . The familiar regression equation is:

Yt = bt Xt + µt for t = 1, . . . , T (7.32)

which can be represented in a state space form, using the random walk specification:

$$Y_t = b_t X_t + \mu_t$$
$$b_t = b_{t-1} + \eta_t \qquad \text{for } t = 1,\ldots,T \qquad (7.33)$$

with Var(µ_t) = R and Var(η_t) = Q. We can recognise the observation and transition equations (7.14) and (7.15) with s_t = b_t, H_t = X_t, Φ = 1, d = 0, c_t = 0.

Upon presentation of a new observation X_t of X, the model expects to observe a value $\hat{b}_{t|t-1}X_t$ of Y based on its predicted state18 $\hat{b}_{t|t-1}$. However, the observed value Y_t of the

18 As the sensitivity coefficient b is assumed to follow a random walk, the predicted state is equal to the last estimate $\hat{b}_{t-1|t-1}$ of the state resulting from all available information at time t − 1.


dependent variable Y is likely to be different from the model's expectation. This forecast

error may result from two sources:

• Some unpredictable force temporarily affects the dependent variable. This random noise

should not affect the state.

• The true value of the sensitivity of Y to X has changed (structural change). This is not

a temporary disturbance and affects the state.

The purpose of the Kalman filter is to attribute some share of the prediction error to each

of these two sources, i.e. separate the signal from the noise. The relative magnitude of the

system noise variance Q and the observation noise variance R is therefore an important

parameter of the model. The following equations correspond to the various stages of the

Kalman filter for the random walk specification.19

The prediction equations:

$$\hat{b}_{t|t-1} = \hat{b}_{t-1|t-1}$$
$$P_{t|t-1} = P_{t-1|t-1} + Q \qquad (7.34)$$

The updating equations:

$$\hat{b}_{t|t} = \hat{b}_{t|t-1} + k_t(Y_t - \hat{b}_{t|t-1}X_t)$$
$$P_{t|t} = (1 - k_t)P_{t|t-1} \qquad (7.35)$$
$$k_t = \frac{P_{t|t-1}X_t^2}{P_{t|t-1}X_t^2 + R}$$

The sensitivity estimate $\hat{b}_{t|t}$ is updated by taking into account the forecast error. The

fraction of forecast error that is added to the previous estimate of b is the Kalman

gain kt . Its value is in the interval [0,1], with zero corresponding to Q = 0 and one

corresponding to R = 0. The Kalman gain depends on the relative value of observation

and system noises and on the estimated variance of the state. For small values of kt (large

observation noise R compared to system noise Q and/or small uncertainty about the state

estimate), considerable credibility is given to the previous sensitivity and as a result to

remote observations. The sensitivity coefficient b_t evolves smoothly over time.

In contrast, if kt takes larger values (i.e. large system noise Q compared to observation

noise R and/or large uncertainty about the state estimate), then more credibility is given

to recent observations and therefore less weight is given to recent sensitivity estimates.

In this case, the sensitivity coefficient b_t evolves quickly over time and its volatility

increases.

To understand the effect of the ratio between system and observation noise variance, let us consider an example where the sensitivity coefficient, beta, is constant

over the data set except for a short time period (5 days) during which it takes an unusually

high value. This pattern may be created, for example, by incorrect market expectations

motivated by unfounded rumours. A Kalman filter is used to estimate the time-varying

sensitivity. Three different signal to noise ratios (Q/R) are applied. For a ratio of 1

19 The notation $\hat{a}_{b|c}$ denotes an estimate of a at time b conditional on the information available at time c.


(Figure 7.9), the model rapidly adapts to the jump in sensitivity. However, this quick

reaction comes at a cost of increased volatility in the beta estimates and of a large

standard error. If the signal to noise ratio is decreased to a value of 0.1 (Figure 7.10), the

beta estimates become smoother and the con¬dence bands narrower. The model seems to

be more reliable, although it does not adapt quickly enough to the shift in beta occurring

at time t = 90. If Q/R is set to a small value, such as 0.02 (Figure 7.11), beta estimates

become very smooth and the confidence bands become very small. However, with such a

small ratio, the model does not manage to properly track the sensitivity when it jumps to

a high level. It is also interesting to see how the standard error, i.e. Pt|t , decreases after

a few observations. This is because the initial value of Pt|t’1 (i.e. P1|0 ) was deliberately

chosen to be large so that the system could rapidly adjust the value of beta in order to

match the observations.
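The experiment behind Figures 7.9 to 7.11 can be approximated in a few lines of Python. This is an illustrative reconstruction (the seed, noise levels and the 5-day spike are our choices, not the book's exact data); the gain is written as P_{t|t−1}X_t/(P_{t|t−1}X_t² + R), i.e. k_t/X_t in the text's notation, so that it applies directly to the forecast error.

```python
import numpy as np

def rw_filter(y, x, Q, R, b0=0.0, P0=100.0):
    """Random-walk Kalman filter for one time-varying beta
    (equations (7.34)-(7.35), gain applied to the forecast error)."""
    b, P = b0, P0
    est = np.empty(len(y))
    for t in range(len(y)):
        P = P + Q                            # P_{t|t-1} = P_{t-1|t-1} + Q
        gain = P * x[t] / (P * x[t]**2 + R)  # k_t / x_t
        b = b + gain * (y[t] - b * x[t])     # update with forecast error
        P = (1.0 - gain * x[t]) * P          # P_{t|t} = (1 - k_t) P_{t|t-1}
        est[t] = b
    return est

rng = np.random.default_rng(4)
T = 200
beta = np.full(T, 0.5)
beta[90:95] = 2.0                  # 5-day spike in the actual sensitivity
x = rng.normal(size=T)
y = beta * x + rng.normal(scale=0.3, size=T)

R = 0.3**2
for ratio in (1.0, 0.1, 0.02):     # the three signal-to-noise ratios Q/R
    est = rw_filter(y, x, Q=ratio * R, R=R)
    print(f"Q/R = {ratio:5.2f}  peak estimate during spike = "
          f"{est[90:95].max():.2f}")
```

As in the figures, the large ratio adapts quickly to the spike but produces noisy estimates, while the small ratio gives a smooth but sluggish beta that never reaches the high level.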

This model is available in the sensitivity estimation tool pack. Select “Stochastic par-

ameter regression” in the main menu to enter the submenu displayed in Figure 7.12. Select

“Random walk”. Make sure that the “Estimate parameters” check box is unchecked.

The stochastic parameter regression menu proposes both a random walk model and a

random trend model for the underlying sensitivities. The system and observation noise

variance can be set by the user (option B), in which case a reference to the cells containing