
Column F computes the trading signal as described in Sections 6.4.6.3 and 6.4.6.4 above.
The volatility threshold is input in cell C4. Columns G to N contain the trading simulation
formulas; in particular, columns I, J, K and L contain the profit and loss calculations, while
columns M and N compute the drawdowns. The profit and loss figures are computed
“open” in column I (i.e. how much profit/loss is potentially realised by the strategy
each day) and “closed” in column J (i.e. how much profit/loss is effectively realised
at the close of each trade). Columns K and L are merely cumulative computations of
columns J and I and give the cumulative profit and loss figures effectively and potentially
realised respectively. The drawdowns (i.e. the differences between the
maximum profit potentially realised to date and the profit or loss realised to date) are
computed in columns M and N. Lastly, the trading summary statistics are computed in
cells I2 to I6.
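The open and closed profit and loss figures and the drawdowns described above can be mirrored outside the spreadsheet. The following is only a minimal sketch in Python, not the workbook's formulas: the function name `pnl_and_drawdown`, the toy position and return series, and the convention that a trade is closed on the last day before the position changes are all assumptions made for illustration.

```python
import numpy as np

def pnl_and_drawdown(position, ret):
    """Return cumulative open P&L, cumulative closed P&L and drawdown.

    position[t] is the position held over day t (+1 long, -1 short, 0 flat);
    ret[t] is the return of the underlying over day t.
    """
    position = np.asarray(position, dtype=float)
    ret = np.asarray(ret, dtype=float)

    daily_open = position * ret           # P&L potentially realised each day
    cum_open = np.cumsum(daily_open)      # cumulative "open" P&L

    # Assumption: a trade is closed on the last day before the position
    # changes, and on the final day of the sample.
    closes = np.append(position[1:] != position[:-1], True)
    cum_closed = np.zeros_like(cum_open)
    realised = 0.0
    for t in range(len(cum_open)):
        if closes[t]:
            realised = cum_open[t]        # book the P&L when the trade closes
        cum_closed[t] = realised

    # Drawdown: maximum profit to date (never below the starting level of 0)
    # minus the profit or loss to date.
    running_max = np.maximum(np.maximum.accumulate(cum_open), 0.0)
    drawdown = running_max - cum_open
    return cum_open, cum_closed, drawdown

# Hypothetical five-day example
position = [1, 1, -1, -1, 0]
ret = [0.01, -0.02, -0.01, 0.03, 0.02]
cum_open, cum_closed, drawdown = pnl_and_drawdown(position, ret)
```

The maximum of the `drawdown` series gives the maximum drawdown, one of the usual trading summary statistics.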
The second worksheet (“Graph”) contains the graph showing the open profit and loss.
The third worksheet (“Backup”) contains the raw data, i.e. columns A to D contain the
forex data from Reuters (at 22:00 GMT) with the bid and ask quotes.
Cells C3 and C4 allow the user to simulate the performance of different scenarios
depending on the EMA operator and the volatility threshold value respectively selected.
The summary results of the trading strategy are displayed in cells I2:I6.
Figure 6.4 displays the daily profit and loss trajectories for the two hedging strategies
analysed in this chapter. During the first period (until mid-1997), the EMA without
volatility filter dominates the other strategy. As of mid-1997, the volatility filter strategy
clearly outperforms in terms of profit and risk. The chart also shows that the equity
curve of the volatility filter strategy is less volatile, thus leading to lower risk. The figure
also suggests a somewhat poor performance for the EMA strategy without volatility
filter.
206 Applied Quantitative Methods for Trading and Investment

[Figure: line chart of cumulative profit and loss (vertical axis, -15% to 20%) against time (horizontal axis, quarterly ticks from Oct-95 to Jul-98), with two series: “With volatility filter” and “Without volatility filter”.]
Figure 6.4 Daily cumulative profit and loss curves


6.5 CONCLUSION
In this chapter we have studied the behaviour of a Markovian switching volatility model
for the USD/DEM daily foreign exchange series.
The analysis of the residuals shows that the heteroskedasticity of the initial dataset has
been removed, although the residual series still presents high excess kurtosis and fails
to pass two standard normality tests.
When applied within a hedging framework, the filtered probabilities computed with
the Markovian switching model enable us to improve the performance of trend-following
systems both in terms of risk and absolute profits. This empirical result suggests that
foreign exchange markets do not follow trends in a systematic way and that volatility
modelling could be one interesting approach to classify market trend dynamics.
As a further extension, an economic interpretation of these results should be considered.
The existence of long-lasting trending and jagged periods in very liquid markets is an issue
that deserves more attention.



APPENDIX A: GAUSS CODE FOR MAXIMUM LIKELIHOOD
FOR VARIANCE SWITCHING MODELS
The code is made up of a core program (MLS.PRG) that contains sub-procedures (i) proc1
that computes the maximum likelihood theta (θ) and (ii) proc2 that computes the filtered
and smoothed probabilities.
The comments are framed with the following signs: /* comment */.

MLS.PRG
/* MLS.PRG Maximum likelihood code for model with changing variances */

/* Part I : load the libraries and source files */
/* ------------------------------------------------------------------- */

library maxlik,pgraph;
#include maxlik.ext;
maxset;
graphset;
#include maxprtm.src;

/* external source file see details here below */
#include smooth.src;

/* Part II : load the log returns data vector rt */
/* ------------------------------------------------------------------- */

open fin=filename for read;
x=readr(fin, nb of obs in the rt vector);
T=rows(x);
dat=x[1:T-1,1];
rt=100*x[1:T-1,2];

/* Part III : initialisation of parameters */
/* ------------------------------------------------------------------- */
A=0.0;
S1=0.6;
S2=0.4;
p=0.8; p=-ln((1-p)/p);
q=0.2; q=-ln((1-q)/q);

b0=A|S1|S2|p|q;
let b0={
0.001
1.36
0.60
2.00
3.00
}; b0=b0';

x=rt;
T=rows(x);

output file=out.out reset;
output file=out.out off;

/* Part IV : this procedure computes the vector of likelihood evaluated at theta (θ) */
/* ----------------------------------------------------------------------------- */

proc (1)=switch(theta,Rt);
local A,S1,S2,p,q,PrSt,St,j,ft,aux,p0,x,BP,mu,sig,K,auxo,rho;
local l,maxco,auxn, fRtSt, PrStRtM, PrStRt, fRt, t, BigT, PStat,const;

A=theta[1];
S1=ABS(theta[2]);
S2=ABS(theta[3]);
x=theta[4]; p=exp(x)/(1+exp(x));
x=theta[5]; q=exp(x)/(1+exp(x));

BP=(p~(1-p))|((1-q)~q);
mu=A;
sig=S1|S2;
K=2;
BigT=rows(rt);

rho=(1-q)/(2-p-q);
Pstat=rho~(1-rho);

const=1/sqrt(2*Pi);
fRtSt=const*(1./sig').*exp(-0.5*( ((Rt-mu)./sig')^2 ));
PrStRtM=Pstat|zeros(BigT-1,K);
PrStRt=zeros(BigT,K);
fRt=zeros(BigT,1);

t=2;
do until t>=BigT+1;
aux=fRtSt[t-1,.].*PrStRtM[t-1,.];
PrStRt[t-1,.]=aux/(sumc(aux')');
PrStRtM[t,.]=PrStRt[t-1,.]*BP;
t=t+1;
endo;

fRt[1,.]=fRtSt[1,.]*Pstat';
fRt[2:BigT]=sumc( (fRtSt[2:BigT,.].*PrStRtM[2:BigT,.])' );

retp(ln(fRt[1:BigT]));
endp;

/* Part V : maximum likelihood Evaluation(BHHH optimisation algorithm) */
/* ------------------------------------------------------------------- */
_max_Algorithm=5; /* 5= BHHH Algorithm */
_max_CovPar=2; /* Heteroskedastic-consistent */
_max_GradMethod=0;
_max_LineSearch=5; /* 5= BHHH Algorithm */

{th,f,g,h,retcode}=maxlik(x,0,&switch,b0);
output file=out.out reset;

call maxprtm(th,f,g,h,retcode,2);

/* Part VI : call the routine smooth.src that computes the filtered probabilities
(fpe) and the smooth probabilities (spe) */
/* -------------------------------------------------------------------------- */

{fpe,spe}=smooth(th[1]|th,x);
output file=out.out off;

/* Part VII : this procedure computes filtered & smoothed probability estimates */
/* --------------------------------------------------------------------- */

proc (2)=smooth(th,y);
local mu1,mu2,p,q,s1,s2,rho,pa,pfx,its,p1,ind,thn,yxx,fk,t,spe,pax,qax,n;

/* Data initialisation */
/* ------------------------------------------------------------------- */

mu1 = th[1];
mu2 = th[2];
s1 = abs(th[3]);
s2 = abs(th[4]);
p = th[5]; p=exp(p)/(1+exp(p));
q = th[6]; q=exp(q)/(1+exp(q));
n = rows(y);
rho = (1-q)/(2-p-q);
pa = rho|(1 - rho);
p1 = zeros(4,1);

/* pax=filtered probas */
/* ------------------------------------------------------------------- */

pax = zeros(n,4);
/* column layout: (S_t|S_t-1)=(1,1)~(2,1)~(1,2)~(2,2)~(S_t=1)~(S_t-1=1) */

/* qax smoothed probas, same structure as pax */
/* ------------------------------------------------------------------- */

pfx = zeros(n,1); @ likelihoods from filter @

/* Calculate probability weighted likelihoods for each obs */
/* ------------------------------------------------------------------- */

yxx = (1/s1)*exp(-0.5* ((y-mu1)/s1)^2 ) ~ (1/s2)*exp(-0.5* ((y-mu2)/s2)^2 );
yxx = (p*yxx[.,1])~((1-p)*yxx[.,2])~((1-q)*yxx[.,1])~(q*yxx[.,2]);

/* Next call basic filter, store results in pax, pfx */
/* ------------------------------------------------------------------- */

its = 1;
do until its > n;
p1[1] = pa[1]*yxx[its,1];
p1[2] = pa[1]*yxx[its,2];
p1[3] = pa[2]*yxx[its,3];
p1[4] = pa[2]*yxx[its,4];
pfx[its] = sumc(p1);
p1 = p1/pfx[its,1];
pax[its,.] = p1';
pa[1,1] = p1[1,1] + p1[3,1];
pa[2,1] = p1[2,1] + p1[4,1];
its = its+1;
endo;

/* Smoothed probability estimate */
/* ------------------------------------------------------------------- */

spe = pax[1,.]~pax[1,.];
spe[1,2]=0; spe[1,4]=0; spe[1,5]=0; spe[1,7]=0;
t = 2;
do until t > n;
spe = (((yxx[t,1]*spe[.,1:4])+(yxx[t,3]*spe[.,5:8]))~
((yxx[t,2]*spe[.,1:4])+(yxx[t,4]*spe[.,5:8])))/pfx[t,1];
spe = spe | (pax[t,.] ~ pax[t,.]);
spe[t,2]=0; spe[t,4]=0; spe[t,5]=0; spe[t,7]=0;
t=t+1;
endo;

spe = spe[.,1:4] + spe[.,5:8];

/* Calculate filtered and smoothed probs that st=1 (col. 5) and st-1 =1 (col. 6) */
/* ------------------------------------------------------------------- */

/* pax=[pr(st=1|st-1=1) ~ pr(st=2|st-1=1) ~ pr(st=1|st-1=2) ~ pr(st=2|st-1=2)] */
pax = pax ~ (pax[.,1] + pax[.,3]) ~ (pax[.,1] + pax[.,2]);
qax = spe ~ (spe[.,1] + spe[.,3]) ~ (spe[.,1] + spe[.,2]);

retp(pax,qax);

endp;
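
For readers without access to GAUSS, the filtering recursion of the likelihood procedure can be sketched in Python. This is an illustrative re-expression under stated assumptions, not a translation checked against GAUSS output: the function name `switching_variance_loglik` and the sample returns are hypothetical, while the parameter ordering (mean, two standard deviations, two logit-transformed transition probabilities) follows the `theta` vector used in the listing above.

```python
import numpy as np

def switching_variance_loglik(theta, r):
    """Per-observation log-likelihood and filtered regime probabilities.

    theta = [mu, s1, s2, xp, xq]; p and q are recovered from xp and xq by
    the logistic transform, which keeps them inside (0, 1).
    """
    mu = theta[0]
    sig = np.abs(np.asarray(theta[1:3], dtype=float))
    p = 1.0 / (1.0 + np.exp(-theta[3]))    # P(S_t = 1 | S_t-1 = 1)
    q = 1.0 / (1.0 + np.exp(-theta[4]))    # P(S_t = 2 | S_t-1 = 2)
    P = np.array([[p, 1.0 - p], [1.0 - q, q]])

    rho = (1.0 - q) / (2.0 - p - q)        # stationary probability of regime 1
    pred = np.array([rho, 1.0 - rho])      # P(S_1 = k) before any observation

    r = np.asarray(r, dtype=float)
    # Gaussian density of each return under each regime, shape (T, 2)
    dens = np.exp(-0.5 * ((r[:, None] - mu) / sig) ** 2) / (np.sqrt(2 * np.pi) * sig)

    T = len(r)
    loglik = np.empty(T)
    filt = np.empty((T, 2))
    for t in range(T):
        joint = dens[t] * pred             # f(r_t, S_t = k | past returns)
        lik = joint.sum()                  # f(r_t | past returns)
        loglik[t] = np.log(lik)
        filt[t] = joint / lik              # filtered probabilities
        pred = filt[t] @ P                 # one-step-ahead regime prediction
    return loglik, filt

# Hypothetical returns (in %) and starting values
r = np.array([0.1, -0.2, 1.5, -1.8, 0.05])
theta = [0.0, 0.3, 1.2, 2.0, 3.0]
ll, filt = switching_variance_loglik(theta, r)
```

Summing `ll` gives the log-likelihood that an optimiser would maximise over `theta`; the second column of `filt` is the filtered probability of the high-volatility regime when the second standard deviation is the larger one.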
7
Quantitative Equity Investment Management
with Time-Varying Factor Sensitivities*

YVES BENTZ


ABSTRACT
Factor models are widely used in modern investment management. They enable investment
managers, quantitative traders and risk managers to model co-movements among
assets in an efficient way by concentrating the correlation structure of asset returns into a
small number of factors. Because the factor sensitivities can be estimated by regression
techniques, these factors can be used to model the asset returns. Unfortunately, the correlation
structure is not constant but evolves in time and so do the factor sensitivities. As
a result, the sensitivity estimates have to be constantly updated in order to keep up with
the changes.
This chapter describes three methods for estimating time-varying factor sensitivities.
The methods are compared and numerous examples are provided. The first method, based
on rolling regressions, is the most popular but also the least accurate. We show that
this method can suffer from serious biases when the sensitivities change over time. The
second method is based on a weighted regression approach which overcomes some of the
limitations of the first method by giving more importance to recent observations. Finally,
a Kalman filter-based stochastic parameter regression model is described that optimally
estimates non-stationary factor exposures. The three methods have been implemented in
the software provided on the CD-Rom so that readers can use and compare them with
their own data and applications.

7.1 INTRODUCTION
Are you satisfied with the accuracy of your factor sensitivity estimates? If not, perhaps
the following situation will sound familiar... After days of careful analysis, John had
constructed a long-short portfolio of stocks. John's boss, however, felt uncomfortable
about the position as he feared that the expected outperformance, i.e. the alpha, may take
time to materialise and be perturbed by unwanted risk exposures. John updated the risk
model with the latest estimates of factor sensitivities and ran the optimiser in order to
immunise the position against these exposures. After monitoring the profit and loss over


* The information presented and opinions expressed herein are solely those of the author and do not necessarily
represent those of Credit Suisse First Boston.

Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Naïm
© 2003 John Wiley & Sons, Ltd ISBN: 0-470-84885-5

a few days, it became clear that the position was exposed to general market movements.
In fact the alpha was dominated by a market beta.
What went wrong? The optimiser? The trading strategy? Actually, neither of them. The
true underlying factor sensitivities had not been constant over the estimation period and,
as a result, the OLS¹ sensitivity estimates that John used for the risk model were seriously
misleading.
While factor sensitivities are known to vary over time, prevailing methods such as
rolling regressions can be severely biased because they estimate past average factor exposures
rather than forecast where these exposures are going to be in the future. In contrast,
the adaptive procedure described in this chapter models and predicts the variations of
factor sensitivities instead of merely smoothing past sensitivities. It can therefore be used
to take advantage of the dynamics driving the relationships between stock and factor
returns. As a result, risk models can be more accurate, investment strategies can be better
immunised against unwanted risk, and risk-return profiles can be significantly improved.
While prevailing methods are often too simple to be correct, powerful adaptive procedures
are often too complicated to be usable. This chapter presents in simple terms the
essentials of dealing with time-varying factor sensitivities. It describes and compares the
various methods of modelling time-varying risk exposures. Particular emphasis is given
to an elegant, rich and powerful estimation method based on the Kalman filter.
The software and spreadsheets supplied on the CD-Rom provide the reader with intuitive
examples of the various procedures and show how most of the complexity can be
handed over to the computer program. The reader may wish to use the estimation tool
pack provided on their own data in order to estimate time-varying regression coefficients.
The software is provided for educational purposes only.


7.1.1 Who should read this chapter and what are its possible applications?

This chapter is aimed at all those using linear regression models with economic and
financial time series. This includes areas such as investment analysis, quantitative trading,
hedging, index tracking, investment performance attribution, style management and risk
measurement. In particular, this chapter is targeted at everyone who has heard of adaptive
models, stochastic parameter models or Kalman filtering but could not or did not want to
invest the time and effort in order to implement such models.
This chapter consists of four main sections. Section 7.2 is a short review of factor models
and factor sensitivities. It introduces the notations used in the rest of the chapter. The
following sections describe three different methods for estimating the factor exposures.
The first method, presented in Section 7.3, is based on a rolling regression procedure
and uses OLS estimation. The procedure is straightforward and can be implemented in
a simple spreadsheet. It does not require any complex estimation procedure and can
use the widely available linear regression models. The shortcomings of the method are
demonstrated. The second method, presented in Section 7.4, is based on weighted least
squares estimation. The procedure rests on a slightly more complex set of equations but
overcomes a number of weaknesses of the OLS procedure. The third method, presented in
Section 7.5, consists of an adaptive procedure based on the Kalman filter. This stochastic

¹ Ordinary least squares estimation is the usual linear regression estimation procedure used on rolling windows
to compute betas and other factor sensitivities.
Time-Varying Factor Sensitivities 215

parameter regression model is shown to be the most accurate and robust procedure of the
three, yielding optimal estimates of factor sensitivities and modelling their time structure.
Finally, Section 7.6 concludes the chapter.

7.2 FACTOR SENSITIVITIES DEFINED
The estimation and use of factor sensitivities play an important role in equity investment
management. Investment diversification, portfolio hedging, factor betting² or immunisation,³
index tracking, performance attribution and style management all necessitate at some
stage an accurate estimation of factor sensitivities. The factors can be the overall market
(some broad stock market index), some industrial sector, some investment style grouping
(index of growth stocks) or other variables that underlie the correlation structure of stock
returns such as macroeconomic factors (e.g. inflation, GDP growth), statistical factors
(usually obtained by principal component analysis), or returns of other asset classes (e.g.
crude oil, gold, interest rates, exchange rates).⁴
The sensitivity of a stock⁵ to a factor is usually defined as the expected stock return
corresponding to a unit change in the factor. If Y(t) is the return⁶ of the stock at time
t and X(t) is the simultaneous return (or change) of the factor, then β in the following
equation can be viewed as the factor sensitivity:

\[ Y(t) = \alpha + \beta X(t) + \varepsilon(t) \tag{7.1} \]

where α is a constant representing the average extra-factor performance of the stock and
ε(t) is a random variable with zero mean (by construction), constant variance and zero
covariance with X(t).
If both the stock and the factor returns Y(t) and X(t) can be observed in the market
(and they usually can), then α and β can be estimated by regression techniques. Once
the β coefficient has been estimated, it is possible to immunise the investment against
movements in the factor by selling β amount of a tradable proxy for the factor for every
unit of the investment. For instance, if the factor is the UK stock market, one can sell β
pounds of FTSE 100 index futures for each pound invested. The expected return of the
hedged position would then be α.
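
The hedging argument above can be checked numerically. The sketch below is a minimal illustration in Python on simulated data: the function name `estimate_alpha_beta`, the simulated returns and the "true" values 0.05 and 1.2 are hypothetical and are not taken from the chapter's spreadsheets.

```python
import numpy as np

def estimate_alpha_beta(y, x):
    """OLS estimates of alpha and beta in Y(t) = alpha + beta X(t) + eps(t)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

# Simulated daily returns (assumed values, for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 250)              # factor returns
eps = rng.normal(0.0, 0.5, 250)            # stock-specific noise
y = 0.05 + 1.2 * x + eps                   # stock returns

alpha_hat, beta_hat = estimate_alpha_beta(y, x)

# Long one unit of the stock, short beta_hat units of the factor proxy
hedged = y - beta_hat * x
```

In-sample, the hedged position has an average return equal to the estimated alpha and zero covariance with the factor. Out of sample this only holds if the true beta stays constant, which is the issue examined in the rest of the chapter.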
More generally, several factor sensitivities can be estimated simultaneously. In order
to estimate joint sensitivities, equation (7.1) is generalised to more than one factor:

\[ Y(t) = \alpha + \sum_{i=1}^{N} \beta_i X_i(t) + \varepsilon(t) = X(t)\beta + \varepsilon(t) \tag{7.2} \]

² For instance, an investor could construct a portfolio that would be sensitive to one factor only (e.g. default
risk) and immune to all other factors.
³ For instance, an investor could construct a portfolio that would not be sensitive to one factor (e.g. long-term
interest rates risk) but remain sensitive to all other factors.
⁴ Not all factors may be relevant for all factor models. For example, using macroeconomic factors makes little
sense when modelling daily or hourly returns.
⁵ Sensitivity estimation is obviously not restricted to stocks but can also apply to other assets, portfolios of
assets or investment strategies.
⁶ As factor sensitivities may not have a flat term structure, β is a function of the horizon over which returns are
calculated, i.e. betas based on monthly, weekly, daily and hourly returns may all be different from each other.

where Y(t) is the return of the investment at time t. X_i(t), i = 1, ..., N, are the returns of the N
factors. α is a constant representing the average extra-factor return of the investment. It
is the sensitivity coefficient β_0 to the constant factor X_0(t) = 1. β_i is a parameter that
represents the joint sensitivity of the investment return to changes in factor i. ε(t) is a
random variable with zero mean (by construction), constant variance (by assumption) and
zero covariance with the factor returns.
The joint sensitivity coefficients β_i measure “clean” sensitivities, accounting for the
sole effect of one variable X_i while controlling for the other effects. Hence, if the joint
sensitivity of Y to X_1 has a value of S, Y is expected to change by S when X_1 changes
by 1, if all the other variables X_2, ..., X_N remain constant. β_i is the partial derivative
∂Ŷ(X_i)/∂X_i of the expectation Ŷ of Y with respect to X_i.
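
The difference between a joint ("clean") sensitivity and a single-factor sensitivity shows up as soon as the factors are correlated. The following sketch uses simulated, noise-free data with assumed coefficients (0.5 and 1.0) and an assumed correlation between the two factors; none of these values comes from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
x1 = rng.normal(size=T)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=T)   # second factor, correlated with x1
y = 0.5 * x1 + 1.0 * x2                    # noise-free by construction

# Joint sensitivities: regress y on a constant, x1 and x2 together
X = np.column_stack([np.ones(T), x1, x2])
joint, *_ = np.linalg.lstsq(X, y, rcond=None)

# Single-factor sensitivity: regress y on x1 alone (with intercept)
single = np.polyfit(x1, y, 1)[0]
```

Here `joint` recovers the coefficients 0, 0.5 and 1.0, whereas `single` also picks up the part of the second factor's effect that is correlated with the first factor, so it is well above 0.5.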


7.3 OLS TO ESTIMATE FACTOR SENSITIVITIES: A SIMPLE,
POPULAR BUT INACCURATE METHOD
Estimating factor sensitivities can be simple or complex, depending on the assumptions
made about the relationships between stock and factor returns. The most popular method
with practitioners and, unfortunately, also the least accurate is standard linear regression,
also known as ordinary least squares (OLS) estimation. It is simple, easy to implement and
widely available in statistical software packages. OLS minimises the mean squared error
(MSE) of the linear model described in equation (7.2). The problem has a simple closed
form solution which consists of a matrix inversion and a few matrix multiplications.
Equation (7.2) can be rewritten using a more concise matrix notation, as:

\[ Y = X\beta + \varepsilon \tag{7.3} \]

where Y is a T × 1 vector of the asset returns, X a T × (N + 1) matrix of the factor
returns, β an (N + 1) × 1 vector of factor sensitivities and ε a T × 1 vector of random
variables with zero mean, i.e.

\[
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_T \end{bmatrix}, \quad
X = \begin{bmatrix}
1 & X_{11} & \cdots & X_{1N} \\
1 & X_{21} & \cdots & X_{2N} \\
\vdots & \vdots & X_{tj} & \vdots \\
1 & X_{T1} & \cdots & X_{TN}
\end{bmatrix}, \quad
\beta = \begin{bmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_N \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}
\]

with expectation E(ε) = 0 and variance-covariance matrix σ²(ε) = σ²I. Here Y_t is the
tth observation of the investment return; X_ti is the tth observation of the ith factor return
X_i; β_i is the sensitivity of Y to factor X_i; ε_t is the error term at time t.
The expression of the OLS estimator β̂ of β can then be shown to be:

\[ \hat{\beta} = (X'X)^{-1} X'Y \tag{7.4} \]

and its estimated variance-covariance matrix is s²(β̂):

\[ s^2(\hat{\beta}) = \mathrm{MSE}\,(X'X)^{-1} \tag{7.5} \]

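with MSE being the mean squared error of the regression model:

\[ \mathrm{MSE} = \frac{1}{T-2} \sum_{t=1}^{T} \bigl( Y(t) - \hat{Y}(t) \bigr)^2, \qquad \hat{Y}(t) = \sum_{i=0}^{N} \hat{\beta}_i X_i(t) \tag{7.6} \]

The divisor T - 2 corresponds to the single-factor case (two estimated coefficients); with N
factors and a constant, the unbiased divisor is T - N - 1.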
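
Equations (7.4) to (7.6) translate almost line by line into matrix code. The sketch below is one possible Python rendering, assuming NumPy; the function name `ols` and the simulated data are hypothetical. The MSE divisor is written as T minus the number of estimated coefficients, which equals T - 2 in the single-factor case.

```python
import numpy as np

def ols(X, y):
    """Closed-form OLS: estimates, their covariance matrix and the MSE.

    X is a T x (N + 1) matrix whose first column is a column of ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T, k = X.shape                          # k = N + 1 estimated coefficients
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y            # equation (7.4)
    resid = y - X @ beta_hat
    mse = resid @ resid / (T - k)           # equation (7.6), T - 2 when N = 1
    cov = mse * XtX_inv                     # equation (7.5)
    return beta_hat, cov, mse

# Simulated single-factor example (assumed values)
rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 60)
y = 0.1 + 0.7 * x + rng.normal(0.0, 0.2, 60)
X = np.column_stack([np.ones_like(x), x])

beta_hat, cov, mse = ols(X, y)
std_err = np.sqrt(np.diag(cov))             # standard errors of alpha and beta
```

For badly conditioned factor matrices, a least-squares solver such as `np.linalg.lstsq` is numerically safer than the explicit inverse; the explicit form is kept here only because it mirrors the equations.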

OLS estimates of the factor sensitivities are unconditional, i.e. they are average sensitivities
over the observation sample. Any variation of the betas over the sample is
not modelled and the resulting error is attributed to sampling, i.e. to the ε term in
equation (7.3). Therefore, when using OLS, the modeller implicitly assumes that the
factor sensitivities are constant over the data sample. Hence, in the case of a deterministic
relationship where beta would change over time in a predictable way, OLS would
estimate an average beta over the sample with a large prediction error ε.
The problem is illustrated in Figure 7.1. The sensitivity ∂Y/∂X of a dependent variable
(say Y) to an independent one (say X) is represented on the vertical axis. The actual
sensitivity is not constant but varies over time. Linear regression measures an average (or
unconditional) effect over the estimation data set. In the case of Figure 7.1, this average
sensitivity is close to zero and would probably not be statistically significant. However,
the actual sensitivity is far from small. For instance, at time t, the true factor sensitivity
is strongly negative although the unconditional model's prediction is slightly positive.
Parameter estimation is clearly biased here. At time t, the expected value of beta is
quite different from the true value. The model wrongly attributes to the stochastic residual
ε an effect that is in fact largely deterministic. The importance of the estimation error
depends on the variance of the actual sensitivity over the sample (the more volatile the
sensitivity, the more likely sensitivity measured at a particular time differs from the
average sensitivity) and the size of the estimation window.
The example given in the “Sensitivity estimation.xls” file illustrates standard linear
regression (OLS) using Microsoft Excel's native LINEST function.⁷ The data set consists

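[Figure: factor sensitivity ∂Y/∂X (vertical axis) against time (horizontal axis), showing the true factor sensitivity, the sensitivity estimated by linear OLS over the estimation window, the best estimation for the current sensitivity at time t, and the resulting bias.]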

Figure 7.1 Actual sensitivity (in black) and unconditional sensitivity measured by linear OLS
estimation (in white). When the true sensitivity is not constant but conditional on time, OLS
estimation, which measures an unconditional (average) sensitivity over time, is biased. The
amplitude of the bias depends on the variance of the true sensitivity and the length of the
estimation window

⁷ A more user-friendly linear regression tool is also available through the “Analysis ToolPak” add-in that
comes with Excel.

of 100 observations of two observable variables, Y and X, representing daily percentage
returns. X and Y are related by a simple time-varying relationship that is mostly deterministic
(90% of the variance in Y is related to X):

\[ Y(t) = \beta(t) \times X(t) + \varepsilon(t) \tag{7.7} \]

where β(t) is a linear function of time; ε(t) is a Gaussian random variable (µ = 0%, σ =
0.1%). The actual beta (generated in column B and plotted in white in Figure 7.1) increases
linearly over the period, starting at -0.49, turning positive at time t = 50 and finishing
at 0.50.
Cell K3 contains the Excel formula for the beta estimated over the entire data set, i.e.
“=LINEST($E$3:$E$102,$C$3:$C$102,TRUE,FALSE)”. Its value is -0.04. If this value
is used to predict Y based on X using the same data set, the R-square is 2%. Cell K4
contains the formula for the beta estimated over the last 50 observations only. Its value
is 0.23 and the corresponding R-square (using these 50 observations) is 61%.⁸ Given all
the information available at time t = 100 (i.e. 100 observations of X and Y), which one
is the right value of beta to use at time t = 100? And at time t = 101?
Actually, neither of the two values is satisfactory as they correspond to past averages
of betas rather than current values or predictions of beta. The true value of beta at
time t = 100 is 0.5 and the best value to use when a new value of X and Y becomes
available at t = 101 would be 0.51. Unfortunately OLS is unsuitable when beta changes
over time because it averages out the variations of beta rather than modelling its time
structure. Furthermore it is backward looking rather than predictive. In other words, OLS
is insensitive to the order of the observations. One could scramble the data and OLS would
still yield the same estimate of beta. The potentially useful information contained in the
data chronology is not used.
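
The spreadsheet experiment can be imitated in a few lines of Python. This is a sketch on freshly simulated data, not a reproduction of the workbook: the random seed and the factor volatility of 1% are assumptions, so the estimates will differ from the -0.04 and 0.23 quoted above, although they show the same pattern.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 100
beta_true = -0.49 + 0.01 * np.arange(T)    # rises linearly from -0.49 to 0.50
x = rng.normal(0.0, 1.0, T)                # factor returns (assumed volatility)
eps = rng.normal(0.0, 0.1, T)              # Gaussian noise, sigma = 0.1
y = beta_true * x + eps                    # equation (7.7)

def ols_slope(x, y):
    """Slope of the OLS regression of y on x, with an intercept."""
    return np.polyfit(x, y, 1)[0]

beta_full = ols_slope(x, y)                # whole sample, as in cell K3
beta_last50 = ols_slope(x[-50:], y[-50:])  # last 50 observations, as in cell K4

# Scrambling the observations leaves the OLS estimate unchanged
perm = rng.permutation(T)
beta_scrambled = ols_slope(x[perm], y[perm])
```

Both estimates are averages of past betas and fall short of the current value `beta_true[-1]` of 0.50, while `beta_scrambled` equals `beta_full` up to rounding error, which illustrates that OLS ignores the chronology of the data.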
Intuitively, one can see that the estimates are highly sample-dependent. If beta changes
rapidly, estimates using short time periods should be better than estimates using longer
periods. Unfortunately, by reducing the estimation period, one increases sampling error.
The best window size should correspond to an optimal trade-off between sampling error
and beta variation. On the one hand, if the estimation period is too short, there are not
enough observations to separate the information from the noise, resulting in large sampling
variance. On the other hand, if the estimation period is too long, the current beta may
significantly differ from its average, resulting in a biased model. This is known as the
bias/variance dilemma.⁹
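
The trade-off can be made explicit in a stylised case. The following is only a rough approximation under assumptions that are not in the chapter: a single factor without intercept, a beta drifting linearly by c per period, factor returns that are independent over time with zero mean and variance σ_X², and noise variance σ_ε². The symbols c, n, σ_X and σ_ε are introduced here for illustration.

```latex
% Window of the n last observations, s = t-n+1, ..., t, with beta(s) = beta(t) - c (t - s).
% The OLS estimate is an X^2-weighted average of past betas, so on average
\mathrm{E}\bigl[\hat{\beta}_n\bigr] \;\approx\; \frac{1}{n}\sum_{s=t-n+1}^{t} \beta(s)
   \;=\; \beta(t) - \frac{c\,(n-1)}{2}
% while its sampling variance shrinks with the window length:
\mathrm{Var}\bigl[\hat{\beta}_n\bigr] \;\approx\; \frac{\sigma_\varepsilon^2}{n\,\sigma_X^2}
% Approximate mean squared error of the estimate of the current beta:
\mathrm{MSE}(n) \;\approx\; \frac{c^2 (n-1)^2}{4} \;+\; \frac{\sigma_\varepsilon^2}{n\,\sigma_X^2}
```

The first term (squared bias) grows with the window length n and the second (sampling variance) falls with it, so under these assumptions the error is smallest at an intermediate window size, which gets shorter as beta drifts faster.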
Based on this perception, many practitioners use rolling regressions. This consists of
applying a linear regression model to a rolling window of observations. The window is an
ordered subsample of the time series. Figure 7.2 illustrates the rolling window approach.
An example of rolling regression is given in column H of the spreadsheet. The corre-
sponding (out-of-sample) R-square is 39.2%.
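
A rolling regression is straightforward to write down. The sketch below is a minimal Python version for a single factor, assuming NumPy; the function name `rolling_beta` and the parameter names `n` (window size) and `p` (rolling step) are chosen here for illustration and do not refer to the tool pack supplied with the book.

```python
import numpy as np

def rolling_beta(x, y, n, p=1):
    """Slope estimated over windows of the n last observations, moved by p.

    Returns the 0-based index of the last observation of each window and
    the corresponding slope estimate.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ends, betas = [], []
    for end in range(n, len(x) + 1, p):
        xs, ys = x[end - n:end], y[end - n:end]
        betas.append(np.polyfit(xs, ys, 1)[0])   # OLS slope on this window
        ends.append(end - 1)
    return np.array(ends), np.array(betas)

# Sanity check on noise-free data with a constant beta of 0.3
rng = np.random.default_rng(3)
x = rng.normal(size=30)
y = 0.3 * x
ends, betas = rolling_beta(x, y, n=7)
ends_step, betas_step = rolling_beta(x, y, n=7, p=7)   # non-overlapping windows
```

For an out-of-sample evaluation, the estimate from the window ending at observation t would be used to predict Y at t + 1, so that no future information enters the prediction.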
The provided sensitivity estimation tool pack in the “SEToolPack.XLS” file can also be
used to estimate rolling regressions. In order to do so, select “Sensitivity estimation” from

⁸ Note that these R-square figures are in-sample and do not reflect the out-of-sample performance of the model.
At time t = 75 for instance, the 25 last observations (i.e. 76-100) are not yet available. And yet we use the
t = 100 estimate of beta in order to compute this R-square.
⁹ This issue becomes even more important and difficult to resolve when measuring joint sensitivities that change
at different speeds. Which window size should then be used?
Time-Varying Factor Sensitivities 219

[Figure 7.2: schematic of two overlapping rolling windows, W1–7 and W2–8, on a time axis running from observation 1 to 9, marking the rolling step, the information already in W1–7 and the new information. Annotations: “Variance in W1–7”; “New: 1/6”; “Old: 5/6”; “Expected autocorrelation: sqrt(5/6) = 0.91”.]

Figure 7.2 Rolling regressions. Sensitivities are estimated over a rolling window (here W1–7)
of the n last observations (here n = 7). Then, the estimation window is moved forward by a
rolling step of p observations (here p = 1) and a new set of factor sensitivities is estimated (here
on W2–8). The procedure goes on until the last observation in the data set is reached. Unless
p = n, consecutive windows are not independent. Actually, (n − p)/n of the variance in a
window is shared with an adjacent window and only p/n of the variance is new information




Figure 7.3 Using the sensitivity estimation tool pack to estimate rolling regressions

the tools menu. In the main menu (Figure 7.3) enter the Y and X ranges into the corre-
sponding edit boxes (dependent and independent variables respectively), select the range
where you want the computed sensitivities to be saved to (output range),10 select “Linear
rolling regression” option and press “OK” to go to the next dialogue box (Figure 7.4). In
the “Rolling regression” dialogue box, select “no weighting” and a window size.

10
The sensitivity estimation tool pack can be used with any worksheet and is not restricted to the example sheet
provided. For instance, it can be used with the “stock beta estimation” sheet or any proprietary data collected
by the reader in other workbooks.
220 Applied Quantitative Methods for Trading and Investment




Figure 7.4 The rolling regression menu of the tool pack

The example provided in the spreadsheet consists of two factors, X1 and X2 and 115
observations. The dependent variable Y is generated by a time-varying linear relationship
of the factors. The sensitivities and the additive stochastic noise are stored in columns AF
to AI of the example sheet. They have been constructed so that each represents a different
type of time series. The sensitivity to X1 is a continuous function of time, the sensitivity
to X2 is constant except for a large level shift occurring half-way through the sample
and alpha is a slowly growing value. These time series are represented by the white lines
in the graph included in the spreadsheet (Figure 7.5). The grey lines correspond to the
rolling regression estimates. The first n observations, where n is the window size, are
used to estimate the regression coefficients at time n. “Ini. Period” will therefore appear
in the ¬rst n rows.
Rolling regressions, although popular, are still unsuitable for time-varying sensitivity
estimation. They give the illusion that they can handle conditional betas while in reality
they are biased estimators of time-varying sensitivities. This is because OLS linear regres-
sion estimates an average sensitivity over each rolling window. If actual sensitivities vary
over time (e.g. follow a trend) they will depart from their average.


[Figure 7.5: line chart of sensitivity (vertical axis, −0.50 to 1.00) against time, showing the actual and estimated sensitivities for Alpha, X1 and X2.]

Figure 7.5 Rolling regression example. The data set is provided in the “Example” sheet of
“SEToolPack.xls” file. It consists of 115 observations. The white lines correspond to the actual
sensitivities. The grey lines are the sensitivities estimated by the rolling regression model
provided in the tool pack. The size of the rolling window is 32 observations

Figure 7.2 shows that sensitivities computed from adjacent windows are highly correlated
because they are estimated from data sets that share n − p observations, n being the
window size and p the rolling shift between two consecutive windows. Autocorrelation in
rolling sensitivities increases with the window size n. This autocorrelation is independent
of the behaviour of the actual sensitivities. Even for random sensitivities, autocorrelation
in estimated sensitivities is expected to be sqrt((n − p)/n) (the correlation squared is the
percentage of variance common to two adjacent windows, i.e. (n − p)/n). The measured
effect of a particular event persists as long as it remains in the rolling estimation window.
The consequences of this persistence are threefold:

• “Ghost” effects. If a significant event occurs on one particular day, it will remain in
the sensitivity series for n further days, where n is the length of the rolling window.
The apparent sensitivity persistence is a measurement artefact and does not necessarily
relate to a true sensitivity shock. The amplitude of the effect depends on both the size
of the rolling window and the variance of the actual sensitivity. Figure 7.6 illustrates
the shadow effect. The estimation bias is a function of the difference between the
two sensitivity levels, their respective durations and the length of the rolling window.
In Figure 7.6, measured sensitivity remains at its peak level for another 15 days although
actual sensitivity has returned to its normal level. This is the shadow effect. Rolling
regression measures an effect that may have long disappeared.
• The difference between two consecutive sensitivity estimates is only determined
by two observations. That is the observation entering the rolling window (i.e. the
most recent data point) and the observation leaving the rolling window (i.e. the most


[Figure 7.6: line chart of sensitivity against time (1 to 176), showing the actual sensitivity and the sensitivity measured by rolling OLS, with the estimation bias and the shadow effect marked.]

Figure 7.6 Estimation bias and shadow effect in rolling estimation. In this controlled experiment,
the actual sensitivity of Y to X is represented by the thin line. It has been chosen constant
over most of the data set at some level (LL) except for a very short period of 5 days during
which it takes an unusually high value (HL). This may be caused, for instance, by incorrect market
expectations motivated by unfounded rumours. Sensitivity is estimated by OLS regression
over a rolling window. The window is 20 days long. Sensitivity estimates are represented by the
bold line. Rolling estimation clearly underestimates actual sensitivity. At the end of the fifth day
of HL, the estimate reaches its maximum value of (5 × HL + 15 × LL)/20

remote data point). While it is legitimate that the most recent observation affects this
difference, there is no reason why the most remote point should have more influence
than any other observation in the window, especially because the size of the window
has been chosen somewhat arbitrarily.
• Sensitivity estimates lag actual sensitivity. Because rolling regression measures an
average sensitivity over the estimation period, estimated betas lag actual betas, especially
when the latter trend. The lag depends on the length of the rolling window. This effect is
clearly visible in Figure 7.5. The beta to variable X2 slowly adapts to the level shift while
the beta to X1 seems to lag the actual beta by about 20 observations.
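The controlled experiment of Figure 7.6 can be reproduced with a few lines of code. The sketch below rests on simplifying assumptions that are not in the text: the factor is held constant at 1 and the data are noise-free, so that rolling OLS through the origin reduces to a window average of the actual sensitivity. The levels LL = 0.5 and HL = 2.0 and the event date are hypothetical.

```python
def rolling_ols_through_origin(x, y, n):
    """Rolling OLS slope of y on x without intercept, window of n points."""
    estimates = []
    for end in range(n, len(x) + 1):
        xs, ys = x[end - n:end], y[end - n:end]
        sxy = sum(a * b for a, b in zip(xs, ys))
        sxx = sum(a * a for a in xs)
        estimates.append(sxy / sxx)
    return estimates              # estimates[i] uses observations i .. i+n-1

LL, HL = 0.5, 2.0                 # hypothetical low and high sensitivity levels
T, n = 200, 20
start, length = 100, 5            # beta equals HL on days 100..104 (0-based)

beta = [HL if start <= t < start + length else LL for t in range(T)]
x = [1.0] * T                     # constant factor: OLS becomes a window mean
y = [b * xi for b, xi in zip(beta, x)]    # noise-free on purpose

est = rolling_ols_through_origin(x, y, n)
peak = max(est)
# Dates (last day of each window) on which the estimate is distorted
affected = [i + n - 1 for i, e in enumerate(est) if e > LL + 1e-12]
peak_days = [i + n - 1 for i, e in enumerate(est) if abs(e - peak) < 1e-12]
```

Under these assumptions the peak equals (5 × HL + 15 × LL)/20, it is first reached on the fifth day of HL and held for 15 further days, and the estimate only returns to LL once the last HL observation has left the window.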

7.4 WLS TO ESTIMATE FACTOR SENSITIVITIES: A BETTER
BUT STILL SUB-OPTIMAL METHOD
One of the major limitations of rolling OLS is that all observations are given equal
weight, irrespective of their distance in time. Hence, the most recent observation is given
the same “credibility” as the most remote observation in the window. A natural improve-
ment of the rolling window procedure would be to give observations different weights
based on their position in the time series. This procedure is known as weighted least
squares (WLS) estimation and can better deal with variations of sensitivities (see, among
others, Pindyck and Rubinfeld (1998)). Remote observations are attributed less weight
than recent observations.
The criterion to be minimised, WSSR, is the weighted sum of squared residuals rather
than the ordinary sum of squared residuals, SSR. WSSR is defined by:

$$\mathrm{WSSR}(\beta) = \sum_{t=1}^{T} w(t)\,\bigl(\hat{Y}_{\beta}(t) - Y(t)\bigr)^2 = (Y - X\beta)'\,W\,(Y - X\beta) \qquad (7.8)$$

where W is a diagonal matrix containing the weights w(t), i.e.

$$W = \begin{bmatrix} w(1) & 0 & \cdots & 0 \\ 0 & w(2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w(T) \end{bmatrix}_{T \times T}$$

The weighted least squares estimator $\hat{\beta}_w$ is:

$$\hat{\beta}_w = (X'WX)^{-1} X'WY \qquad (7.9)$$

and the estimated variance–covariance matrix of $\hat{\beta}_w$ is $s^2(\hat{\beta}_w)$:

$$s^2(\hat{\beta}_w) = \mathrm{MSE}_w\,(X'WX)^{-1} \qquad (7.10)$$

with MSEw being the weighted squared error of the regression model:

$$\mathrm{MSE}_w = \frac{\mathrm{WSSR}}{\sum_{t=1}^{T} w(t) - 2} \qquad (7.11)$$

[Figure 7.7: line chart of sensitivity (vertical axis, −0.50 to 1.00) against time, showing the actual and estimated sensitivities for Alpha, X1 and X2.]

Figure 7.7 Weighted regression example. The size of the rolling window is still 32 observations.
However, here the observations have been weighted using a linear function of time and a
decay rate of 3%


Not surprisingly, when W = I, i.e. when all observations have equal weight, equations (7.8)
to (7.10) can be condensed into the OLS equation. The most popular functions, which are
provided by the tool pack, are linear or exponential but other functions such as sigmoids
can be considered. A decay rate determines the speed at which the weight of an observation
decreases with time.11
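Equations (7.8) and (7.9) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the exponential weighting scheme, in which the weight of an observation is (1 − decay) raised to its age, is an assumption and not necessarily the definition used by the tool pack.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: beta = (X'WX)^-1 X'Wy with W = diag(w)."""
    Xw = X * w[:, None]                    # W @ X without building W explicitly
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    resid = y - X @ beta
    wssr = float(np.sum(w * resid ** 2))   # weighted sum of squared residuals
    return beta, wssr

def exponential_weights(n, decay):
    """Assumed scheme: (1 - decay)**age, age 0 for the most recent point."""
    age = np.arange(n - 1, -1, -1)
    return (1.0 - decay) ** age

rng = np.random.default_rng(0)
n = 32
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.1, 0.5]) + 0.05 * rng.normal(size=n)   # simulated data

beta_weighted, _ = wls(X, y, exponential_weights(n, decay=0.03))
beta_equal, _ = wls(X, y, np.ones(n))                      # W = I
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With unit weights the function returns the OLS coefficients, which is the W = I case described above.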
In order to use the sensitivity estimation tool pack, select one of the weighting options
in the rolling regression menu (Figure 7.4). For instance, select a linear weighting and
enter a weight decay of 3%. Figure 7.7 shows the result of the estimation.
Weighted least squares estimation induces less autocorrelation in the estimates than
ordinary least squares estimation. Depending on the decay rate, shadow effects, lag and
persistence problems are considerably reduced.
However, WLS does not provide a way to model the sensitivities time series. It still
measures past (weighted) average sensitivities rather than predicting future ones. In addition,
all sensitivity coefficients in the regression equation are identically affected by the
weighting, regardless of their rate of change as the weights only depend on the position of
the observation in the time series. Consequently, constant sensitivities suffer from large
weight discount rates while highly variable sensitivities suffer from small decay rates.
There is no single weight discount rate that is adapted to all factor sensitivity coefficients
when their variances differ and some trade-off has to be made.


7.5 THE STOCHASTIC PARAMETER REGRESSION MODEL
AND THE KALMAN FILTER: THE BEST WAY TO ESTIMATE
FACTOR SENSITIVITIES
The procedures that have been described so far involve a single regression equation
with constant betas. These procedures use ordinary or weighted least squares in order to
repeatedly estimate new model coefficients from adjacent windows of observations.
The stochastic parameter model, however, is based on a conceptually different approach
(see Gouriéroux et al. (1997), or Harvey (1989)). The beta coefficients are not assumed

11
Although this rate is usually set a priori, one could also estimate it, for instance by minimising the WSSR.

constant and are optimally adjusted when new information becomes available. The dynam-
ics of the betas are modelled in a second equation. Very much like a GARCH model,12
the stochastic parameter regression model is based on a system of equations. Hence, the
linear regression equation (7.2) can be rewritten into a simple time-varying regression by
letting β follow a given time process, for example an autoregressive process:13

$$Y_t = X_t \beta_t + \varepsilon_t \qquad (7.12)$$

and

$$\beta_t = \Phi \beta_{t-1} + \eta_t \quad \text{for } t = 1, \ldots, T \qquad (7.13)$$

where Φ is a non-random K × K matrix, K being the number of factors; ηt is a vector
of serially uncorrelated disturbances with zero mean. It is not observable.
The addition of the second equation (7.13) allows beta to vary in time according to
a process that can be modelled. However, Φ and ηt are not observable and need to
be estimated. Simple estimation procedures such as OLS which can only handle single
equations cannot be used here but fortunately, the problem can be resolved as it can easily
be put into a state space form. The Kalman filter can then be used to update the parameters
of the model. Moreover, the same Kalman filter can be employed with a large variety of
time processes for the sensitivities without adding much computational complexity.14
Originally developed by control engineers in the 1960s (see Kalman (1960)) for appli-
cation concerning spacecraft navigation and rocket tracking, state space models have since
attracted considerable attention in economics, finance, and the social sciences and have
been found to be useful in many non-stationary problems and particularly for time series
analysis. A state space model consists of a system of equations aimed at determining the
state of a dynamic system from observed variables (usually time series) contaminated
by noise. Time, or the ordering of observations, plays an important role in such models.
Two identical state space models applied to the same data are likely to produce different
estimates if observations are presented to the models in different orders (i.e. scrambling
observations alters model estimates). It is therefore not surprising that they are usually
used for modelling time series. The state of the dynamic system, described by “state
variables”, is assumed to be linearly related to the observed input variables.
The stochastic coefficient regression model expressed in a state space form can be
defined by a system of two equations. The first equation (e.g. equation (7.12)) is referred to
as the observation equation and relates the dependent variable, Yt, to the independent variables
by the unobservable states (in the case of equation (7.12), the state is simply the vector
of sensitivities βt). More generally, the observation equation takes the following form:

$$Y_t = H_t (s_t + d) + \varepsilon_t \quad \text{for } t = 1, \ldots, T \qquad (7.14)$$

12
See for instance Bollerslev (1986).
13
Equation 7.13 describes one example of a time process that can be used for modelling the sensitivities time
series. This particular process corresponds to a simple AR(1). The Kalman filter approach can be used with many
more processes.
14
If ηt has zero variance and Φ = I, then βt is constant and equation (7.12) is just the familiar equation of a
linear regression. The estimates of βt are those of a recursive regression (OLS linear regression using all the
observations from i = 1 to t) and the estimate of βT is identical to the OLS estimate of a linear regression. If
the variance of ηt is larger than zero, however, sensitivity coefficients are allowed to change over time.

where Yt is the dependent observable variable (e.g. an asset return) at time t. st is a K
vector that describes the state of the factor sensitivities of the asset returns at time t. d is
a K non-random vector that can account for the long-term mean in sensitivities. Ht is a
K vector that contains the factor values at time t. εt is a serially uncorrelated perturbation
with zero mean and variance Rt, i.e.: E(εt) = 0 and Var(εt) = Rt, Rt being non-random.
It is not observable.
The exact representation of Ht and st will depend on model specification. For the
autoregressive model described by equations (7.12) and (7.13), which is also the most
popular, Ht and st simply correspond to Xt and βt respectively. However, this is not
the case for more complex models such as the random trend model which requires more
state variables (K = 2N, N being the number of factors). In general, the elements of st
are not observable. However, their time structure is assumed to be known. The evolution
through time of these unobservable states is a first-order Markov process described by a
transition equation:

$$s_t = \Phi s_{t-1} + c_t + \eta_t \quad \text{for } t = 1, \ldots, T \qquad (7.15)$$

where Φ is a non-random K × K state transition matrix; ct is a non-random K vector; ηt
is an m × 1 vector of serially uncorrelated disturbances with zero mean and covariance
Qt; E(ηt) = 0 and Var(ηt) = Qt, Qt being non-random. It is not observable.
The initial state vector s0 has a mean of s and a covariance matrix P0, i.e.:

$$E(s_0) = s \quad \text{and} \quad \mathrm{Var}(s_0) = P_0$$

Furthermore, the disturbances εt and ηt are uncorrelated with each other in all time
periods, and uncorrelated with the initial state, i.e.:

$$E(\varepsilon_t, \eta_{i,t}) = 0 \quad \text{for all elements } \eta_{i,t} \text{ of } \eta_t, \text{ and for } t = 1, \ldots, T$$

and

$$E(\varepsilon_t, s_0) = 0 \quad \text{and} \quad E(\eta_t, s_0) = 0 \quad \text{for } t = 1, \ldots, T$$

The system matrices Φ, P and Q, the vectors c and d and the scalar R are non-random
although some of them may vary over time (but they do so in a predetermined way).
Furthermore, the observation noise εt and the system noise ηt are Gaussian white noises.
The most popular processes used for time-varying sensitivities are the random walk
model and the random trend model.
The random walk model dictates that the best estimate of future sensitivities is the
current sensitivity. To specify this model, vectors d and c are set to zero and the system
matrices for the random walk model are given by equations (7.16):15

$$\begin{aligned} s_t &= [\beta_{t,1}, \beta_{t,2}, \ldots, \beta_{t,N}]' \\ H_t &= [1, F_{t,1}, \ldots, F_{t,N-1}] \\ \Phi &= I \end{aligned} \qquad (7.16)$$

15
Hence the random walk specification of the stochastic regression model can be expressed as a system of two
equations: (1) Yt = Xt βt + εt and (2) βt = βt−1 + ηt.

where Φ is N × N, Ht is 1 × N and st is N × 1, N being the number of independent
variables, including the constant.
The random trend model dictates that the best estimate of future sensitivities is the
current sensitivity plus the trend. In the presence of a trend, sensitivities at t + 1 are not
equally likely to be above or under the value at t. A simple model which allows the
sensitivities to trend is the random trend model. A sensitivity coefficient βi,t follows a
random trend if:

$$\begin{aligned} \beta_t &= \beta_{t-1} + \delta_{t-1} + \eta_{t,1} \\ \delta_t &= \delta_{t-1} + \eta_{t,2} \end{aligned} \qquad (7.17)$$

where βt represents the sensitivity vector at time t and δt is the random trend in the
sensitivity vector at time t.16
In a state space form, the system in (7.17) can be written as:

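$$\begin{bmatrix} \beta_t \\ \delta_t \end{bmatrix} = \Phi_0 \begin{bmatrix} \beta_{t-1} \\ \delta_{t-1} \end{bmatrix} + \begin{bmatrix} \eta_{t,1} \\ \eta_{t,2} \end{bmatrix} \qquad (7.18)$$

where the individual sensitivity random trend state transition matrix is given by
$\Phi_0 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$.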
To specify this model, vectors d and c are set to zero and the collection of factor
sensitivities and trends are expressed as:

$$s(t) = [\beta_1(t), \delta_1(t), \beta_2(t), \delta_2(t), \ldots, \beta_N(t), \delta_N(t)]' \qquad (7.19)$$

$$H(t) = [1, 0, 1, 0, \ldots, 1, 0] \qquad (7.20)$$

$$\Phi = \begin{bmatrix} \Phi_0 & 0 & \cdots & 0 \\ 0 & \Phi_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Phi_0 \end{bmatrix} \qquad (7.21)$$

where the dimensions of the state of sensitivities have been doubled to include trends in
the sensitivities, i.e. Φ is 2N × 2N, st is 2N × 1 and Ht is 1 × 2N.
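The block-diagonal transition matrix of the random trend model is easy to build programmatically. The following sketch assumes the state ordering of equation (7.19); the function name and the numerical state values are hypothetical.

```python
import numpy as np

def random_trend_transition(N):
    """Transition matrix for N sensitivities, each with its own random trend.

    State ordering: [beta_1, delta_1, ..., beta_N, delta_N].
    """
    phi_0 = np.array([[1.0, 1.0],
                      [0.0, 1.0]])       # one (beta, delta) pair
    return np.kron(np.eye(N), phi_0)     # block-diagonal, 2N x 2N

Phi = random_trend_transition(3)
state = np.array([0.5, 0.1, 1.0, 0.0, -0.2, 0.05])   # hypothetical values
next_state = Phi @ state                              # noise-free transition
```

In the absence of system noise, each sensitivity moves by its current trend while the trends themselves are unchanged.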
Other time processes such as the random coefficient model (Schaefer et al., 1975) or
the mean-reverting coefficient model (Rosenberg, 1973) can also be used within this
modelling framework.
The objective of state space modelling is to estimate the unobservable states of the
dynamic system in the presence of noise. The Kalman filter is a recursive method of
doing this, i.e. filtering out the observation noise in order to optimally estimate the state
vector at time t, based on the information available at time t (i.e. observations up to
and including Yt). What makes the operation difficult is the fact that the states (e.g. the
sensitivities) are not constant but change over time. The assumed amount of observation
noise versus system noise is used by the filter to optimally determine how much of the
variation in Yt should be attributed to the system, and how much is caused by observation

16
Hence the random trend specification of the stochastic regression model can be expressed as a system of
three equations: (1) Yt = Xt βt + εt, (2) βt = βt−1 + δt−1 + ηt,1 and (3) δt = δt−1 + ηt,2.

noise. The filter consists of a system of equations which allows us to update the estimate
of the state st when new observations become available.
Equations (7.22) to (7.31) describe the Kalman filter.17 These equations are important
to the more technical readers who want to develop and implement the model. Other
readers may want to skip these equations and directly move to the more intuitive example
given later.
The first two equations define the state $\hat{s}_{t|t}$ (equation (7.22)) and the state error covariance
matrix $P_{t|t}$ (equation (7.23)):

$$\hat{s}_{t|t} = E(s_t \mid Y_1, \ldots, Y_t) \qquad (7.22)$$

$$P_{t|t} = E\bigl((s_t - \hat{s}_{t|t})(s_t - \hat{s}_{t|t})'\bigr) = \mathrm{Var}(s_t - \hat{s}_{t|t}) \qquad (7.23)$$

where the notation $\hat{a}_{b|c}$ denotes an estimate of a at time b conditional on the information
available at time c.
The predicted state of sensitivities $\hat{s}_{t|t-1}$ and the corresponding forecasting error
covariance matrix $P_{t|t-1} = E\bigl((s_t - \hat{s}_{t|t-1})(s_t - \hat{s}_{t|t-1})'\bigr)$ are given by the prediction
equations (7.24) and (7.25):

$$\hat{s}_{t|t-1} = \Phi\, \hat{s}_{t-1|t-1} + c_t \qquad (7.24)$$

$$P_{t|t-1} = \Phi\, P_{t-1|t-1}\, \Phi' + Q_t \qquad (7.25)$$

and the predicted dependent variable is given by equation (7.26), which follows from taking
the expectation of the observation equation (7.14):

$$\hat{Y}_{t|t-1} = H_t (\hat{s}_{t|t-1} + d) \qquad (7.26)$$

The forecast error et at time t and its variance ft can be calculated by equations (7.27)
and (7.28) respectively:

$$e_t = Y_t - \hat{Y}_{t|t-1} \qquad (7.27)$$

$$f_{t|t-1} = H_t P_{t|t-1} H_t' + R_t \qquad (7.28)$$

The new information is represented by the prediction error, et. It can be the result of
several factors: random fluctuations in returns, changes in underlying states, or error in
previous state estimates. Given this new information, the estimate $\hat{s}_{t|t}$ of the state vector
and its covariance $P_{t|t}$ can now be updated through the updating equations (7.29) and
(7.30) respectively. The Kalman filter uses the new information for adjusting the estimates
of the underlying states, where the new information is simply the prediction error of the
returns. The Kalman gain matrix, Kt, optimally adjusts the state estimates in order to
reflect the new information:

$$\hat{s}_{t|t} = \hat{s}_{t|t-1} + K_t e_t \qquad (7.29)$$

$$P_{t|t} = (I - K_t H_t) P_{t|t-1} \qquad (7.30)$$

The Kalman gain matrix is calculated from the observation noise variance Rt and the
predicted state error covariance $P_{t|t-1}$ with the recursive equation (7.31):

$$K_t = P_{t|t-1} H_t' \bigl(H_t P_{t|t-1} H_t' + R_t\bigr)^{-1} \qquad (7.31)$$

17
Further explanations can be found in Harvey (1989) or Gouriéroux et al. (1997).

[Figure 7.8: flowchart. Initialisation at t = 1 with the non-random values of Φ, d, ct, Rt and Qt, and with s1|0 = s and P1|0 = P0. Upon observation of Yt and Ht, the filter predicts Y, computes the forecast error, computes the gain and applies the updating equations. If t = T the procedure stops; otherwise the prediction equations are applied, t is set to t + 1 and the loop repeats.]

Figure 7.8 Flowchart of the Kalman filter

In order to clarify the sequence of the Kalman filter equations, Figure 7.8 presents a
flowchart for the filter.
The recursive nature of the Kalman filter is a major computational advantage. It enables
the model to update the conditional mean and covariance estimates of the states at time
t based on the sole estimate obtained at time t − 1. Although it takes into account the
entire history, it does not need an expanding memory.
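The recursion can be sketched in a few lines of code. The function below is a minimal illustration rather than the tool pack's implementation: it assumes a scalar observation, time-invariant Φ, Q and R, and vectors c and d set to zero, and the name `kalman_filter` and its argument layout are hypothetical.

```python
import numpy as np

def kalman_filter(Y, H, Phi, Q, R, s0, P0):
    """Filtered states for Y_t = H_t s_t + eps_t, s_t = Phi s_{t-1} + eta_t.

    Y: (T,) observations; H: (T, K) rows H_t; Phi, Q, P0: (K, K);
    R: scalar; s0: (K,). The vectors c and d are assumed to be zero.
    """
    T, K = H.shape
    s_pred, P_pred = s0.astype(float), P0.astype(float)   # s_{1|0}, P_{1|0}
    states = np.empty((T, K))
    for t in range(T):
        h = H[t]
        e = Y[t] - h @ s_pred                     # forecast error (7.27)
        f = h @ P_pred @ h + R                    # its variance (7.28)
        gain = P_pred @ h / f                     # Kalman gain (7.31)
        s_filt = s_pred + gain * e                # state update (7.29)
        P_filt = (np.eye(K) - np.outer(gain, h)) @ P_pred   # (7.30)
        states[t] = s_filt
        s_pred = Phi @ s_filt                     # prediction (7.24)
        P_pred = Phi @ P_filt @ Phi.T + Q         # prediction (7.25)
    return states

# Check on simulated data: constant betas (Q = 0, Phi = I), vague prior
rng = np.random.default_rng(1)
T = 200
H = np.column_stack([np.ones(T), rng.normal(size=T)])
Y = H @ np.array([0.1, 0.7]) + 0.2 * rng.normal(size=T)
K = H.shape[1]
states = kalman_filter(Y, H, np.eye(K), np.zeros((K, K)), R=0.04,
                       s0=np.zeros(K), P0=1e6 * np.eye(K))
beta_ols, *_ = np.linalg.lstsq(H, Y, rcond=None)
```

Only the latest state estimate and its covariance are carried from one step to the next. In this constant-beta setting with a very vague prior, the last filtered state is practically indistinguishable from the full-sample OLS estimate.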
For an intuitive understanding of the principles involved, let us consider the example
of the time-varying sensitivity (which follows a random walk) of a dependent variable Yt
to a single independent variable Xt . The familiar regression equation is:

$$Y_t = b_t X_t + \varepsilon_t \quad \text{for } t = 1, \ldots, T \qquad (7.32)$$

which can be represented in a state space form, using the random walk specification:

$$\begin{aligned} Y_t &= b_t X_t + \varepsilon_t \\ b_t &= b_{t-1} + \eta_t \end{aligned} \quad \text{for } t = 1, \ldots, T \qquad (7.33)$$

with Var(εt) = R and Var(ηt) = Q. We can recognise the observation and transition
equations (7.14) and (7.15) with st = bt, Ht = Xt, Φ = 1, d = 0, ct = 0.
Upon presentation of a new observation Xt of X, the model expects to observe a value
$\hat{b}_{t|t-1} X_t$ of Y based on its predicted state18 $\hat{b}_{t|t-1}$. However, the observed value Yt of the

18
As the sensitivity coefficient b is assumed to follow a random walk the predicted state is equal to the last
estimate $\hat{b}_{t-1|t-1}$ of the state resulting from all available information at time t − 1.

dependent variable Y is likely to be different from the model's expectation. This forecast
error may result from two sources:

• Some unpredictable force temporarily affects the dependent variable. This random noise
should not affect the state.
• The true value of the sensitivity of Y to X has changed (structural change). This is not
a temporary disturbance and affects the state.

The purpose of the Kalman filter is to attribute some share of the prediction error to each
of these two sources, i.e. separate the signal from the noise. The relative magnitude of the
system noise variance Q and the observation noise variance R is therefore an important
parameter of the model. The following equations correspond to the various stages of the
Kalman filter for the random walk specification.19
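The prediction equations:

$$\begin{aligned} \hat{b}_{t|t-1} &= \hat{b}_{t-1|t-1} \\ P_{t|t-1} &= P_{t-1|t-1} + Q \end{aligned} \qquad (7.34)$$

The updating equations:

$$\begin{aligned} \hat{b}_{t|t} &= \hat{b}_{t|t-1} + \frac{k_t}{X_t}\,\bigl(Y_t - \hat{b}_{t|t-1} X_t\bigr) \\ P_{t|t} &= (1 - k_t)\, P_{t|t-1} \\ k_t &= \frac{P_{t|t-1} X_t^2}{P_{t|t-1} X_t^2 + R} \end{aligned} \qquad (7.35)$$

Dividing by Xt expresses the forecast error in units of the sensitivity, consistently with
equations (7.29) and (7.31), where Kt = kt/Xt = Pt|t−1 Xt/(Pt|t−1 Xt² + R).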

The sensitivity estimate $\hat{b}_{t|t}$ is updated by taking into account the forecast error. The
fraction of forecast error that is added to the previous estimate of b is the Kalman
gain kt. Its value is in the interval [0,1], with zero corresponding to Q = 0 and one
corresponding to R = 0. The Kalman gain depends on the relative value of observation
and system noises and on the estimated variance of the state. For small values of kt (large
observation noise R compared to system noise Q and/or small uncertainty about the state
estimate), considerable credibility is given to the previous sensitivity and as a result to
remote observations. The sensitivity coefficient bt evolves smoothly over time.
In contrast, if kt takes larger values (i.e. large system noise Q compared to observation
noise R and/or large uncertainty about the state estimate), then more credibility is given
to recent observations and therefore less weight is given to previous sensitivity estimates.
In this case, the sensitivity coefficient bt evolves quickly over time and its volatility
increases.
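The scalar filter for the random walk specification can be sketched as follows. The function name, the default prior values b0 and p0 and the toy inputs are illustrative assumptions. The gain is applied to the forecast error in the form Pt|t−1 Xt/(Pt|t−1 Xt² + R) of equation (7.31), which equals kt/Xt and avoids a division when Xt is zero.

```python
def random_walk_beta_filter(x, y, q, r, b0=0.0, p0=1e4):
    """Scalar Kalman filter for y_t = b_t x_t + eps_t, b_t = b_{t-1} + eta_t.

    q = Var(eta_t), r = Var(eps_t); b0 and p0 are an assumed prior mean and
    variance for b. Returns (estimates, variances, gains).
    """
    b, p = b0, p0
    estimates, variances, gains = [], [], []
    for xt, yt in zip(x, y):
        p_pred = p + q                       # predicted variance; b_pred = b
        error = yt - b * xt                  # forecast error
        f = p_pred * xt * xt + r             # forecast error variance
        k = p_pred * xt * xt / f             # k_t, always between 0 and 1
        b = b + (p_pred * xt / f) * error    # gain of (7.31) times the error
        p = (1.0 - k) * p_pred               # updated state variance
        estimates.append(b)
        variances.append(p)
        gains.append(k)
    return estimates, variances, gains

# R = 0: the observation is trusted completely, so the estimate equals Y/X
est_exact, _, gains_exact = random_walk_beta_filter(
    [2.0, 4.0], [1.0, 3.0], q=1.0, r=0.0)
# Q = 0: beta is assumed constant, so the gain decays towards zero
_, _, gains_static = random_walk_beta_filter(
    [1.0] * 10, [0.5] * 10, q=0.0, r=1.0)
```

The two calls illustrate the limiting cases mentioned above: a gain of one when R = 0 and a gain shrinking towards zero when Q = 0.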
To understand the effect of the ratio between system and observation noise
variance, let us consider an example where the sensitivity coefficient, beta, is constant
over the data set except for a short time period (5 days) during which it takes an unusually
high value. This pattern may be created, for example, by incorrect market expectations
motivated by unfounded rumours. A Kalman filter is used to estimate the time-varying
sensitivity. Three different signal to noise ratios (Q/R) are applied. For a ratio of 1

19
The notation $\hat{a}_{b|c}$ denotes an estimate of a at time b conditional on the information available at time c.

(Figure 7.9), the model rapidly adapts to the jump in sensitivity. However, this quick
reaction comes at a cost of increased volatility in the beta estimates and of a large
standard error. If the signal to noise ratio is decreased to a value of 0.1 (Figure 7.10), the
beta estimates become smoother and the confidence bands narrower. The model seems to
be more reliable, although it does not adapt quickly enough to the shift in beta occurring
at time t = 90. If Q/R is set to a small value, such as 0.02 (Figure 7.11), beta estimates
become very smooth and the confidence bands become very small. However, with such a
small ratio, the model does not manage to properly track the sensitivity when it jumps to
a high level. It is also interesting to see how the standard error, i.e. Pt|t, decreases after
a few observations. This is because the initial value of Pt|t−1 (i.e. P1|0) was deliberately
chosen to be large so that the system could rapidly adjust the value of beta in order to
match the observations.
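The speed of adaptation for the three ratios can be compared in a stylised version of this experiment. The sketch below is not the data behind Figures 7.9 to 7.11: it assumes a constant factor Xt = 1, noise-free observations and hypothetical levels LL = 0.5 and HL = 2.0, so that only the effect of Q/R on the tracking of the 5-day jump is visible.

```python
def filter_beta(x, y, q, r, b0=0.0, p0=100.0):
    """Scalar random-walk Kalman filter; returns the filtered beta estimates."""
    b, p, estimates = b0, p0, []
    for xt, yt in zip(x, y):
        p_pred = p + q                              # prediction step
        f = p_pred * xt * xt + r                    # forecast error variance
        b = b + (p_pred * xt / f) * (yt - b * xt)   # update the estimate
        p = p_pred * r / f                          # same as (1 - k_t) * p_pred
        estimates.append(b)
    return estimates

T = 200
LL, HL = 0.5, 2.0                      # hypothetical sensitivity levels
beta = [HL if 90 <= t < 95 else LL for t in range(T)]
x = [1.0] * T                          # constant factor (assumption)
y = [b * xt for b, xt in zip(beta, x)] # noise-free on purpose

R = 1.0
peaks = {}
for ratio in (1.0, 0.1, 0.02):
    estimates = filter_beta(x, y, q=ratio * R, r=R)
    peaks[ratio] = max(estimates)      # highest beta estimate reached

for ratio, peak in peaks.items():
    print(f"Q/R = {ratio}: peak estimate {peak:.3f} (actual peak {HL})")
```

The larger the ratio, the closer the peak estimate comes to HL within the five days; with the smallest ratio the filter captures only part of the jump. The cost of a large ratio, namely noisier estimates, does not show up here because the data are noise-free.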
This model is available in the sensitivity estimation tool pack. Select “Stochastic par-
ameter regression” in the main menu to enter the submenu displayed in Figure 7.12. Select
“Random walk”. Make sure that the “Estimate parameters” check box is unchecked.
The stochastic parameter regression menu proposes both a random walk model and a
random trend model for the underlying sensitivities. The system and observation noise
variance can be set by the user (option B), in which case a reference to the cells containing
