$$\cdots + \frac{1}{2}\,\sigma(S_{(i-1)/N^*})\,\frac{\partial\sigma}{\partial S}(S_{(i-1)/N^*})\left\{(\Delta W_{(i-1)/N^*})^2 - \Delta t\right\},$$
where we set the drift equal to $r$, which is extracted from MD*BASE and
corresponds to the time to maturity used in the simulation, and $N^*$ is the
204 9 Trading on Deviations of Implied and Historical Densities

number of days to maturity. The first derivative of $\sigma(\cdot)$ is approximated by:

$$\frac{\partial\sigma}{\partial S}(S_{(i-1)/N^*}) = \frac{\sigma(S_{(i-1)/N^*}) - \sigma(S_{(i-1)/N^*} - \Delta S)}{\Delta S},$$

where $\Delta S$ is 1/2 of the width of the bin grid on which the diffusion function is
estimated. Finally, the estimated diffusion function is linearly extrapolated at
both ends of the bin grid to accommodate potential outliers.
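
The simulation step can be sketched in Python as follows. This is a minimal illustration, not the XploRe implementation. Only the correction term and the backward difference are taken from the text; the leading terms of the update (drift r S dt and diffusion sigma(S) dW) are assumed to have the standard Milstein form, and the names `milstein_path` and `dsigma_backward` are hypothetical. The rate is passed as a decimal (0.0323), which is also an assumption.

```python
import numpy as np

def dsigma_backward(sigma, s, delta_s):
    """Backward-difference approximation of d sigma / dS at s."""
    return (sigma(s) - sigma(s - delta_s)) / delta_s

def milstein_path(s0, r, sigma, delta_s, n_steps, tau, rng):
    """Simulate one path over [0, tau] with n_steps Milstein steps.

    Assumed update (only the last term is visible in the text):
    S_new = S + r*S*dt + sigma(S)*dW + 0.5*sigma(S)*sigma'(S)*(dW**2 - dt)
    """
    dt = tau / n_steps
    path = np.empty(n_steps + 1)
    path[0] = s0
    for i in range(1, n_steps + 1):
        s = path[i - 1]
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment
        sig = sigma(s)
        correction = 0.5 * sig * dsigma_backward(sigma, s, delta_s) * (dw**2 - dt)
        path[i] = s + r * s * dt + sig * dw + correction
    return path
```

With an estimated diffusion function passed as `sigma`, calling `milstein_path(3328.41, 0.0323, sigma, delta_s, 88, 88 / 360, np.random.default_rng(0))` would produce one simulated index path with daily steps.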
With these ingredients we start the simulation with index value $S_0 = 3328.41$
(Monday, April 21, 1997), time to maturity $\tau = 88/360$ and $r = 3.23$. The
expiration date is Friday, July 18, 1997. From these simulated index values
we calculate annualized log-returns which we take as input of the nonparametric
density estimation (see equation (9.5)). The XploRe quantlet denxest
accomplishes the estimation of the time series density by means of the Gaussian
kernel function:

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}u^2\right).$$

The bandwidth $h_{MC}$ is computed by the XploRe quantlet denrot, which applies
Silverman's rule of thumb.
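
A kernel density estimate of this kind can be sketched in Python as below. The function names are hypothetical and this is not the code of denxest or denrot. In particular, the text does not state which variant of Silverman's rule denrot applies; the sketch assumes the simple form h = 1.06 * std * n^(-1/5).

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 pi)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth (assumed variant: 1.06 * std * n^(-1/5))."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-0.2)

def kde(grid, x, h):
    """Kernel density estimate of the sample x, evaluated on grid."""
    grid = np.asarray(grid, dtype=float)
    x = np.asarray(x, dtype=float)
    u = (grid[:, None] - x[None, :]) / h  # one row per grid point
    return gaussian_kernel(u).mean(axis=1) / h
```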
First of all, we calculate the optimum bandwidth $h_{MC}$ given the vector of
10,000 simulated index values. Then we search for the bandwidth $h'_{MC}$ which
implies a variance of $g^*$ closest to the variance of $f^*$ (while still
remaining within 0.5 to 5 times $h_{MC}$). We stop the search if var($g^*$) is
within a range of 5% of var($f^*$). Next, we translate $g^*$ such that its mean
matches the futures price $F$. Finally, we transform this density over DAX index
values $S_T$ into a density $g^{*\prime}$ over log-returns $u_T$. Since

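$$P(S_T < x) = P\left(\ln\frac{S_T}{S_t} < \ln\frac{x}{S_t}\right) = P(u_T < u)$$

where $x = S_t e^u$, we have

$$P(S_T \in [x, x + \Delta x]) = P(u_T \in [u, u + \Delta u])$$

$$P(S_T \in [x, x + \Delta x]) \approx g^*(x)\,\Delta x$$
$$P(u_T \in [u, u + \Delta u]) \approx g^{*\prime}(u)\,\Delta u.$$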

Therefore, we have as well (see Härdle and Simar (2002) for density
transformation techniques)

$$g^{*\prime}(u) \approx \frac{g^*(S_t e^u)\,\Delta(S_t e^u)}{\Delta u} \approx g^*(S_t e^u)\,S_t e^u.$$
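
The change of variables can be sketched in a few lines of Python. The function name and arguments are hypothetical; `g_star` is any callable density over index values.

```python
import numpy as np

def density_over_log_returns(u, g_star, s_t):
    """Density over log-returns u = ln(S_T / S_t) implied by a density g_star over S_T.

    Implements g*'(u) = g*(S_t e^u) * S_t e^u.
    """
    x = s_t * np.exp(u)
    return g_star(x) * x
```

As a plausibility check, a lognormal density over $S_T$ is mapped to the corresponding normal density over $u$.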

To simplify notation, we will denote both densities $g^*$. Figure 9.2 displays the
resulting time series density over log-returns on Friday, April 18, 1997.
Proceeding in the same way for all 30 periods beginning in April 1997 and ending
in September 1999, we obtain the time series of the 3 month 'forward' skewness
and kurtosis values of $g^*$ shown in Figures 9.3 and 9.4. The figures reveal that
the time series distribution is systematically slightly negatively skewed.
Skewness is very close to zero. As far as kurtosis is concerned, we can extract
from Figure 9.4 that it is systematically smaller than but nevertheless very
close to 3. Additionally, all time series density plots looked like the one
shown in Figure

9.4 Comparison of Implied and Historical SPD
At this point it is time to compare implied and historical SPDs. Since
expectation and variance are adjusted by construction, we focus the comparison
on skewness and kurtosis. Starting with skewness, we can extract from Figure
9.3 that, except for one period, the IBT implied SPD is systematically more
negatively skewed than the time series SPD, a fact that is quite similar to what
Aït-Sahalia, Wang and Yared (2000) already found for the S&P 500. The 3
month IBT implied SPD for Friday, September 17, 1999 is slightly positively
skewed. This may be due to the fact that in the months preceding June 1999,
the month in which the 3 month implied SPD was estimated, the DAX index
stayed within a quite narrow horizontal range of index values after a substantial
downturn in the 3rd quarter of 1998 (see Figure 9.11), and agents therefore
possibly believed index prices lower than the average would be more realistic
to appear. However, this is the only case where skew($f^*$) > skew($g^*$).
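
Skewness and kurtosis of a density tabulated on a grid can be computed by numerical integration. The following Python sketch uses the trapezoidal rule; the function name is hypothetical, and the chapter does not state how the moments of $f^*$ and $g^*$ were actually computed.

```python
import numpy as np

def density_moments(grid, density):
    """Mean, variance, skewness and kurtosis of a density given on a grid."""
    grid = np.asarray(grid, dtype=float)
    density = np.asarray(density, dtype=float)
    widths = np.diff(grid)

    def integrate(y):
        # Trapezoidal rule, valid for non-uniform grids as well.
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * widths))

    p = density / integrate(density)  # renormalise against discretisation error
    mean = integrate(grid * p)
    var = integrate((grid - mean) ** 2 * p)
    skew = integrate((grid - mean) ** 3 * p) / var**1.5
    kurt = integrate((grid - mean) ** 4 * p) / var**2
    return mean, var, skew, kurt
```

For a normal density this returns a skewness close to 0 and a kurtosis close to 3, the benchmark values used in the comparison.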

[Figure 9.3: panel title "Skewness Comparison: TS=thin; IBT=thick"; horizontal axis: Time, 07/18/97 to 12/17/99.]

Figure 9.3. Comparison of Skewness time series for 30 periods.

[Figure 9.4: panel title "Kurtosis Comparison: TS=thin; IBT=thick"; horizontal axis: Time, 07/18/97 to 12/17/99.]

Figure 9.4. Comparison of Kurtosis time series for 30 periods.

The kurtosis time series reveals a pattern similar to that of the skewness time
series. Except for one period, the IBT SPD has systematically more kurtosis than
the time series SPD. Again this feature is in line with what Aït-Sahalia, Wang
and Yared (2000) found for the S&P 500. The 3 month IBT implied SPD for
Friday, October 16, 1998 has a slightly smaller kurtosis than the time series
SPD. That is, investors assigned less probability mass to high and low index
prices. Note that the implied SPD was estimated in July 1998 after a period
of 8 months of booming asset prices (see Figure 9.11). It is understandable
that in such an environment high index prices seemed less realistic.
Since low index prices seemed unrealistic as well, agents
apparently expected the DAX to move rather sideways.

9.5 Skewness Trades
In the previous section we learned that the implied and the time series SPDs
reveal differences in skewness and kurtosis. In the following two sections, we
investigate how to profit from this knowledge. In general, we are interested in
which option to buy or to sell on the day on which both densities were estimated.
We consider exclusively European call or put options.
According to Aït-Sahalia, Wang and Yared (2000), all strategies are designed
such that we do not change the resulting portfolio until maturity, i.e. we keep
all options until they expire. We use the following terms for moneyness, which
we define as $K/(S_t e^{(T-t)r})$:

                 Moneyness(FOTM Put)   <  0.90
        0.90  ≤  Moneyness(NOTM Put)   <  0.95
        0.95  ≤  Moneyness(ATM Put)    <  1.00
        1.00  ≤  Moneyness(ATM Call)   <  1.05
        1.05  ≤  Moneyness(NOTM Call)  <  1.10
        1.10  ≤  Moneyness(FOTM Call)

Table 9.1. Definitions of moneyness regions.

where FOTM, NOTM, ATM stand for far out-of-the-money, near out-of-the-money
and at-the-money, respectively.
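
The classification of Table 9.1 can be written as a small lookup. The Python sketch below uses hypothetical function names and assumes the lower bound of each region is inclusive.

```python
import math

def moneyness(strike, s_t, r, tau):
    """Moneyness K / (S_t * exp(r * tau)), with tau = T - t."""
    return strike / (s_t * math.exp(r * tau))

def moneyness_region(m):
    """Map a moneyness value to the regions of Table 9.1."""
    if m < 0.90:
        return "FOTM Put"
    if m < 0.95:
        return "NOTM Put"
    if m < 1.00:
        return "ATM Put"
    if m < 1.05:
        return "ATM Call"
    if m < 1.10:
        return "NOTM Call"
    return "FOTM Call"
```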
A skewness trading strategy is supposed to exploit differences in skewness of
two distributions by buying options in the range of strike prices where they
are underpriced and selling options in the range of strike prices where they
are overpriced. More specifically, if the implied SPD $f^*$ is less skewed (for
example more negatively skewed) than the time series SPD $g^*$, i.e. skew($f^*$) <
skew($g^*$), we sell the whole range of strikes of OTM puts and buy the whole
range of strikes of OTM calls (S1 trade). Conversely, if the implied SPD is
more skewed, i.e. skew($f^*$) > skew($g^*$), we initiate the S2 trade by buying the
whole range of strikes of OTM puts and selling the whole range of strikes of
OTM calls. In both cases we keep the options until expiration.
Skewness $s$ is a measure of asymmetry of a probability distribution. While for a
distribution symmetric around its mean $s = 0$, for an asymmetric distribution
$s > 0$ indicates more weight to the left of the mean. Recalling from option
pricing theory the pricing equation for a European call option, Franke, Härdle
and Hafner (2001):

$$C(S_t, K, r, T-t) = e^{-r(T-t)} \int_0^\infty \max(S_T - K, 0)\, f^*(S_T)\, dS_T, \tag{9.6}$$

where $f^*$ is the implied SPD, we see that when the two SPDs are such that
skew($f^*$) < skew($g^*$), agents apparently assign a lower probability to high
outcomes of the underlying than would be justified by the time series density,
see Figure 7.13. Since for call options only the right 'tail' of the support
determines the theoretical price, the latter is smaller than the price implied by
equation (9.6) using the time series density. That is, we buy underpriced calls.
The same reasoning applies to European put options. Looking at the pricing
equation for such an option:

$$P(S_t, K, r, T-t) = e^{-r(T-t)} \int_0^\infty \max(K - S_T, 0)\, f^*(S_T)\, dS_T, \tag{9.7}$$

we conclude that prices implied by this pricing equation using $f^*$ are higher
than the prices using the time series density. That is, we sell puts.
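
Both pricing equations are discounted integrals of a payoff against a density, so they can be evaluated numerically for any density given on a grid of terminal prices. The following Python sketch uses the trapezoidal rule; the function names are hypothetical, and the grid is assumed to cover essentially all probability mass.

```python
import numpy as np

def _integrate(y, x):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def call_price(grid, density, strike, r, tau):
    """Discounted expectation of max(S_T - K, 0) under the given density."""
    payoff = np.maximum(grid - strike, 0.0)
    return np.exp(-r * tau) * _integrate(payoff * density, grid)

def put_price(grid, density, strike, r, tau):
    """Discounted expectation of max(K - S_T, 0) under the given density."""
    payoff = np.maximum(strike - grid, 0.0)
    return np.exp(-r * tau) * _integrate(payoff * density, grid)
```

Pricing the same strike once with $f^*$ and once with $g^*$ shows in which moneyness region an option appears over- or underpriced relative to the time series density.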
Since we hold all options until expiration and since options for all strikes
are not always available in markets, we are going to investigate the
payoff profile at expiration of this strategy for two compositions of the portfolio.
To get an idea about the exposure at maturity let us begin with a simplified
portfolio consisting of one short position in a put option with moneyness of
0.95 and one long position in a call option with moneyness of 1.05. To further
simplify, we assume that the future price $F$ is equal to 100 EUR. Thus, the
portfolio has a payoff which is increasing in $S_T$, the price of the underlying at
maturity. For $S_T$ < 95 EUR the payoff is negative and for $S_T$ > 105 EUR it is
positive.
However, in the application we encounter portfolios containing several
long/short calls/puts with increasing/decreasing strikes as indicated in Table
[Figure 9.5: panel title "Payoff of S1 Trade: OTM"; horizontal axis from 85 to 115.]

Figure 9.5. S1 trade payoff at maturity of portfolio detailed in Table
Figure 9.5 shows the payoff of a portfolio of 10 short puts with strikes ranging
from 86 EUR to 95 EUR and of 10 long calls striking at 105 EUR to 114 EUR;
the future price is still assumed to be 100 EUR. The payoff is still increasing
in $S_T$ but it is concave in the left tail and convex in the right tail. This is due
to the fact that our portfolio contains, for example, at $S_T$ = 106 EUR two call
options which are in the money instead of only one compared to the portfolio
considered above. These options generate a payoff which is twice as much. At
$S_T$ = 107 EUR the payoff is influenced by three ITM calls, producing a payoff
which is three times as high as in the situation before, etc. In a similar way we
can explain the slower increase in the left tail. To sum up, we can state
that this trading rule has a favorable payoff profile in a bull market where the
underlying is increasing. But in bear markets it possibly generates negative
cash flows. Buying (selling) two or more calls (puts) at the same strike would
change the payoff profile in a similar way, leading to a faster increase (slower
decrease) with every call (put) bought (sold).
The S2 strategy payoff behaves in the opposite way. The same reasoning can
be applied to explain its payoff profile. In contrast to the S1 trade, the S2
trade is favorable in a falling market.
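
The S1 payoff at maturity is the sum of the intrinsic values of the long calls minus those of the short puts. The Python sketch below (hypothetical function name, one option per strike, future price of 100 EUR as in the text) reproduces both portfolio compositions discussed above.

```python
import numpy as np

def s1_payoff(s_T, put_strikes, call_strikes):
    """Payoff at maturity of short puts plus long calls, one option per strike."""
    s_T = np.asarray(s_T, dtype=float)[..., None]  # broadcast against strikes
    puts = np.asarray(put_strikes, dtype=float)
    calls = np.asarray(call_strikes, dtype=float)
    short_puts = -np.maximum(puts - s_T, 0.0).sum(axis=-1)
    long_calls = np.maximum(s_T - calls, 0.0).sum(axis=-1)
    return short_puts + long_calls

# Simplified portfolio: one short put at 95, one long call at 105.
simple = s1_payoff([90.0, 100.0, 110.0], [95.0], [105.0])

# Portfolio of Figure 9.5: short puts at 86..95, long calls at 105..114.
full = s1_payoff(np.arange(85.0, 116.0), np.arange(86, 96), np.arange(105, 115))
```

Reversing the sign of the result gives the payoff of the corresponding S2 portfolio.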

              Simplified portfolio   Whole range of OTM strikes
              Moneyness              Moneyness

short put     0.95                   0.86 - 0.95
long call     1.05                   1.05 - 1.14

Table 9.2. Portfolios of skewness trades.

9.5.1 Performance

Given the skewness values for the implied SPD and the time series SPD, we
now look at the performance of the skewness trades. Performance is
measured in net EUR cash flows, i.e. the sum of the cash flows generated
at initiation in t = 0 and at expiration in t = T. We ignore any interest
accruing between these two dates. Using EUREX settlement prices of 3 month DAX
puts and calls, we initiated the S1 strategy on the Monday immediately following
the 3rd Friday of each month, beginning in April 1997 and ending in September
1999. January, February and March 1997 drop out due to the time series density
estimation for the 3rd Friday of April 1997. October, November and December
1999 drop out since we look 3 months forward. The cash flow at initiation stems
from the inflow generated by the written options, the outflow generated by
the bought options and hypothetical 5% transaction costs on prices of bought
and sold options. Since all options are kept in the portfolio until maturity (time
to expiration is approximately 3 months, more precisely $\tau$ = TTM/360), the
cash flow in t = T is composed of the sum of the intrinsic values of the options
in the portfolio.
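
The cash flow bookkeeping described above can be sketched as follows. The function name and the tuple layout are hypothetical; the 5% cost is charged on the price of every option traded, whether bought or sold, and interest between t = 0 and t = T is ignored, as stated in the text.

```python
def cash_flows(positions, s_T, cost_rate=0.05):
    """Cash flows of a buy-and-hold option portfolio.

    positions: iterable of (kind, side, strike, price) with kind "call" or
    "put" and side +1 for a bought, -1 for a written option.
    Returns (cash flow at t = 0, cash flow at t = T, net cash flow).
    """
    cf_0 = 0.0
    cf_T = 0.0
    for kind, side, strike, price in positions:
        # Written options bring in the premium, bought options cost it;
        # transaction costs are paid on both.
        cf_0 += -side * price - cost_rate * price
        if kind == "call":
            intrinsic = max(s_T - strike, 0.0)
        else:
            intrinsic = max(strike - s_T, 0.0)
        cf_T += side * intrinsic
    return cf_0, cf_T, cf_0 + cf_T
```

For example, writing a put struck at 95 for 2 EUR and buying a call struck at 105 for 1 EUR gives an initial cash flow of 2 - 1 - 0.05 * 3 = 0.85 EUR; the option prices here are made up for illustration.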
Figure 9.6 shows the EUR cash flows at initiation, at expiration and the
resulting net cash flow for each portfolio. The sum of all cash flows, the total net
cash flow, is strongly positive (9855.50 EUR). Note that the net cash flow (blue
bar) is always positive except for the portfolios initiated in June 1998 and in
September 1998, where we incur heavy losses compared to the gains in the other
periods. In other words, this strategy would have produced moderate gains 28
times and large negative cash flows twice. As Figure 9.5 suggests, this
strategy is exposed to a directional risk, a feature that appears in December
1997 and June 1998, where large payoffs at expiration (positive and negative)
occur. Indeed, the period of November and December 1997 was a turning point
of the DAX and the beginning of an 8 month bull market, explaining the large
payoff in March 1998 of the portfolio initiated in December 1997. The same
argument explains the large negative payoff of the portfolio set up in June 1998
and expiring in September 1998 (refer to Figure 9.11). Another point to note is
that there is a zero cash flow at expiration in 24 periods. Periods with a zero
cash flow at initiation and at expiration are due to the fact that no portfolio
was set up (there was no OTM option in the database).

[Figure 9.6: bar chart "Performance S1"; vertical axis: CashFlow in EUR; horizontal axis: Time, 07/97 to 10/99.]

Figure 9.6. Performance of S1 trade with 5% transaction costs. The
first (red), second (magenta) and the third bar (blue) show for each
period the cash flow in t = 0, in t = T and the net cash flow, respectively.
Cash flows are measured in EUR. XFGSpdTradeSkew.xpl

Since there is only one period (June 1999) in which the implied SPD is more
skewed than the time series SPD, a comparison of the S1 trade with and without
knowledge of the two SPDs is not useful. A comparison
of the skewness measures would have filtered out exactly one positive net cash
flow, more precisely the cash flow generated by a portfolio set up in June
1999. But to what extent this may be significant is uncertain. For the same
reason the S2 trade has no great informational content. Applied to real data
it would have produced a negative total net cash flow. Actually, a portfolio
would have been set up only in June 1999. While the S1 trade performance was
independent of the knowledge of the implied and the time series SPDs, the
S2 trade performance changed significantly when it was applied in each period
(without knowing both SPDs). The cash flow profile seemed to be the inverse
of Figure 9.6, indicating that, should there be an options mispricing, it would
probably be in the sense that the implied SPD is more negatively skewed than
the time series SPD.

9.6 Kurtosis Trades
A kurtosis trading strategy is supposed to exploit differences in kurtosis of two
distributions by buying options in the range of strike prices where they are
underpriced and selling options in the range of strike prices where they are
overpriced. More specifically, if the implied SPD $f^*$ has more kurtosis than
the time series SPD $g^*$, i.e. kurt($f^*$) > kurt($g^*$), we sell the whole range of
strikes of FOTM puts, buy the whole range of strikes of NOTM puts, sell the
whole range of strikes of ATM puts and calls, buy the whole range of strikes
of NOTM calls and sell the whole range of strikes of FOTM calls (K1 trade).
Conversely, if the implied SPD has less kurtosis than the time series density $g^*$,
i.e. kurt($f^*$) < kurt($g^*$), we initiate the K2 trade by buying the whole range of
strikes of FOTM puts, selling the whole range of strikes of NOTM puts, buying
the whole range of strikes of ATM puts and calls, selling the whole range of
strikes of NOTM calls and buying the whole range of strikes of FOTM calls.
In both cases we keep the options until expiration.
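
The trading rule amounts to a sign pattern over the six moneyness regions, which flips between K1 and K2. A minimal Python sketch with a hypothetical function name, where -1 means sell and +1 means buy:

```python
def kurtosis_trade(kurt_f, kurt_g):
    """Return the trade name and the buy/sell sign for each moneyness region.

    kurt_f: kurtosis of the implied SPD, kurt_g: kurtosis of the time series SPD.
    """
    regions = ["FOTM Put", "NOTM Put", "ATM Put",
               "ATM Call", "NOTM Call", "FOTM Call"]
    k1_signs = [-1, +1, -1, -1, +1, -1]
    if kurt_f > kurt_g:
        return "K1", dict(zip(regions, k1_signs))
    if kurt_f < kurt_g:
        return "K2", dict(zip(regions, [-s for s in k1_signs]))
    return None, {}  # equal kurtosis: no trade signal
```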
Kurtosis κ measures the fatness of the tails of a distribution. For a normal
distribution we have κ = 3. A distribution with κ > 3 is said to be leptokurtic
and has fatter tails than the normal distribution. In general, the bigger κ is,
the fatter the tails are. Again we consider the option pricing formulae (9.6)
and (9.7) and reason as above using the probability mass to determine the
moneyness regions where we buy or sell options. Look at Figure 7.14 for a
situation in which the implied density has more kurtosis than the time series
density triggering a K1 trade.
To form an idea of the K1 strategy's exposure at maturity we start once again
with a simplified portfolio containing two short puts with moneyness 0.90 and
1.00, one long put with moneyness 0.95, two short calls with moneyness 1.00
and 1.10 and one long call with moneyness 1.05. Figure 9.7 reveals that this
portfolio inevitably leads to a negative payoff at maturity regardless of the
movement of the underlying.
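
The payoff of such a portfolio can be checked numerically. The Python sketch below assumes a future price of 100 EUR, so that moneyness translates directly into strikes, and places the long put at 0.95 as listed in Table 9.3; the function and variable names are hypothetical.

```python
import numpy as np

def portfolio_payoff(s_T, legs):
    """Payoff at maturity; legs are (kind, quantity, strike), quantity < 0 is short."""
    s_T = np.asarray(s_T, dtype=float)
    total = np.zeros_like(s_T)
    for kind, qty, strike in legs:
        if kind == "call":
            intrinsic = np.maximum(s_T - strike, 0.0)
        else:
            intrinsic = np.maximum(strike - s_T, 0.0)
        total = total + qty * intrinsic
    return total

K1_SIMPLIFIED = [
    ("put", -1, 90.0), ("put", +1, 95.0), ("put", -1, 100.0),
    ("call", -1, 100.0), ("call", +1, 105.0), ("call", -1, 110.0),
]

grid = np.linspace(80.0, 120.0, 401)
payoff = portfolio_payoff(grid, K1_SIMPLIFIED)
```

On this grid the payoff is never positive; it reaches zero only at $S_T$ = 100 EUR, where every option expires worthless.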
Should we be able to buy the whole range of strikes as the K1 trading rule
suggests (the portfolio is given in Table 9.3, FOTM-NOTM-ATM-K1), we get
a payoff profile (Figure 9.8) which is quite similar to the one from Figure 9.7.
In fact, the payoff function looks like the 'smooth' version of Figure 9.7.

[Figure 9.7: panel title "Payoff of K1 Trade"; horizontal axis from 85 to 115.]

Figure 9.7. Kurtosis trade 1 payoff at maturity of portfolio detailed in
Table 9.3.

[Figure 9.8: panel title "Payoff of K1 Trade: FOTM-NOTM-ATM"; horizontal axis from 85 to 115.]

Figure 9.8. K1 trade payoff at maturity of portfolio detailed in Table
Changing the number of long puts and calls in the NOTM regions can produce
a positive payoff. Setting up the portfolio given in Table 9.3, NOTM-K1,
results in the payoff function shown in Figure 9.9. It is quite intuitive that the
more long positions the portfolio contains, the more positive the payoff will be.
Conversely, if we added FOTM short puts and calls to that portfolio, the payoff
would decrease in the FOTM regions.
In conclusion, the payoff function can have quite different
shapes depending heavily on the specific options in the portfolio. If it is possible
to implement the K1 trading rule as proposed, the payoff is negative. But it may
happen that the payoff function is positive if more NOTM options
(long positions) are available than FOTM or ATM options (short positions).

[Figure 9.9: panel title "Payoff of K1 Trade: NOTM"; horizontal axis from 85 to 115.]

Figure 9.9. K1 trade payoff at maturity of portfolio detailed in Table

              Simplified    FOTM-NOTM-ATM-K1   NOTM-K1
              Moneyness     Moneyness          Moneyness

short put     0.90          0.86 - 0.90        0.90
long put      0.95          0.91 - 0.95        0.91 - 0.95
short put     1.00          0.96 - 1.00        1.00
short call    1.00          1.00 - 1.04        1.00
long call     1.05          1.05 - 1.09        1.05 - 1.09
short call    1.10          1.10 - 1.14        1.10

Table 9.3. Portfolios of kurtosis trades.

9.6.1 Performance

To investigate the performance of the kurtosis trades, K1 and K2, we proceed in
the same way as for the skewness trade. The total net EUR cash flow of the K1
trade, applied when kurt($f^*$) > kurt($g^*$), is strongly positive (10,915.77 EUR).
As the payoff profiles from Figures 9.7 and 9.8 already suggested, all portfolios
generate negative cash flows at expiration (see magenta bar in Figure 9.10). In
contrast, the cash flow at initiation in t = 0 is always positive. Given
the positive total net cash flow, we can state that the K1 trade earns its profit in
t = 0. Looking at the DAX evolution shown in Figure 9.11, we understand why
the payoff of the portfolios set up in the months of April 1997, May 1997 and in
the months from November 1997 to June 1998 is relatively more negative than
for the portfolios of June 1997 to October 1997 and November 1998 to June
1999. The reason is that the DAX is moving up or down for the former months
and stays within an almost horizontal range of quotes for the latter months
(see the payoff profile depicted in Figure 9.8). In July 1998 no portfolio was
set up since kurt($f^*$) < kurt($g^*$).

[Figure 9.10: bar chart "Performance K1"; vertical axis: CashFlow in EUR; horizontal axis: Time, 07/97 to 10/99.]

Figure 9.10. Performance of K1 trade with 5% transaction costs. The
first (red), second (magenta) and the third bar (blue) show for each
period the cash flow in t = 0, in t = T and the net cash flow, respectively.
Cash flows are measured in EUR. XFGSpdTradeKurt.xpl

What would have happened if we had implemented the K1 trade without knowing
both SPDs? Again, the answer to this question can only be indicative due
to the rare occurrence of periods in which kurt($f^*$) < kurt($g^*$). In contrast to
the S1 trade, the density comparison would have filtered out a strongly negative
net cash flow that would have been generated by a portfolio set up in July
1998. But the significance of this feature is again uncertain.
About the K2 trade it can only be said that without an SPD comparison it would
have produced heavy losses. The K2 trade applied as proposed cannot be
evaluated completely since there was only one period in which kurt($f^*$) <
kurt($g^*$).

[Figure 9.11: line plot "DAX 1997-1999"; horizontal axis: Time, 1/97 to 10/99.]

Figure 9.11. Evolution of DAX from January 1997 to December 1999

9.7 A Word of Caution
Interpreting the implied SPD as the SPD used by investors to price options and
the historical density as the 'real' SPD of the underlying, and assuming that
only one agent knows the underlying's SPD, one should expect this agent to make
higher profits than all others due to its superior knowledge. That is why exploiting
deviations of implied and historical density appears to be very promising at
first glance. Of course, if all market agents knew the underlying's SPD,
$f^*$ would be equal to $g^*$. In view of the high net cash flows generated by both
skewness and kurtosis trades of type 1, it seems that not all agents are aware
of discrepancies in the third and fourth moment of both densities. However,
the strategies seem to be exposed to a substantial directional risk. Even though
the dataset contained bearish and bullish market phases, both trades have to be
tested on more extensive data. Considering the current political and economic
developments, it is not clear how these trades will perform when exposed to
'peso risks'. Given that profits stem from highly positive cash flows at portfolio
initiation, i.e. profits result from possibly mispriced options, who knows how
the pricing behavior of agents will change and how agents will assign
probabilities to future values of the underlying?
We measured performance in net EUR cash flows. This approach does not
take risk into account as does, for example, the Sharpe ratio, which is a measure
of the risk adjusted return of an investment. But to compute a return an initial
investment has to be made. However, in the simulation above, some portfolios
generated positive payoffs both at initiation and at maturity. It is a challenge
for future research to find a way to adjust for risk in such situations.
The SPD comparison yielded the same result for each period but one. The
implied SPD $f^*$ was in all but one period more negatively skewed than the time
series SPD $g^*$. While $g^*$ was in all periods platykurtic, $f^*$ was in all but one
period leptokurtic. In this period the kurtosis of $g^*$ was slightly greater than
that of $f^*$. Therefore, there was no alternating use of type 1 and type 2 trades.
But in more turbulent market environments such an approach might prove
useful. The procedure could be extended and fine tuned by applying a density
distance measure as in Aït-Sahalia, Wang and Yared (2000) to give a signal
when to set up a portfolio of either type 1 or type 2. Furthermore, it is tempting
to modify the time series density estimation method such that the Monte Carlo
paths are simulated drawing random numbers not from a normal distribution
but from the distribution of the residuals resulting from the nonparametric
estimation of $\sigma_{FZ}(\cdot)$, Härdle and Yatchew (2001).

Aït-Sahalia, Y., Wang, Y. and Yared, F. (2001). Do Option Markets correctly
Price the Probabilities of Movement of the Underlying Asset?, Journal of
Econometrics 102: 67-110.
Barle, S. and Cakici, N. (1998). How to Grow a Smiling Tree, The Journal of
Financial Engineering 7: 127-146.
Black, F. and Scholes, M. (1973). The Pricing of Options and Corporate
Liabilities, Journal of Political Economy 81: 637-659.
Blaskowitz, O. (2001). Trading on Deviations of Implied and Historical Density,
Diploma Thesis, Humboldt-Universität zu Berlin.
Breeden, D. and Litzenberger, R. (1978). Prices of State Contingent Claims
Implicit in Option Prices, Journal of Business 51, 4: 621-651.
Cox, J., Ross, S. and Rubinstein, M. (1979). Option Pricing: A simplified
Approach, Journal of Financial Economics 7: 229-263.
Derman, E. and Kani, I. (1994). The Volatility Smile and Its Implied Tree,
Dupire, B. (1994). Pricing with a Smile, Risk 7: 18-20.
Florens-Zmirou, D. (1993). On Estimating the Diffusion Coefficient from
Discrete Observations, Journal of Applied Probability 30: 790-804.
Franke, J., Härdle, W. and Hafner, C. (2001). Einführung in die Statistik der
Finanzmärkte, Springer Verlag, Heidelberg.
Härdle, W. and Simar, L. (2002). Applied Multivariate Statistical Analysis,
Springer Verlag, Heidelberg.
Härdle, W. and Tsybakov, A. (1995). Local Polynomial Estimators of the
Volatility Function in Nonparametric Autoregression,
Sonderforschungsbereich 373 Discussion Paper, Humboldt-Universität zu Berlin.
Härdle, W. and Yatchew, A. (2001). Dynamic Nonparametric State Price
Density Estimation using Constrained Least Squares and the Bootstrap,
Sonderforschungsbereich 373 Discussion Paper, Humboldt-Universität zu
Härdle, W. and Zheng, J. (2001). How Precise Are Price Distributions Predicted
by Implied Binomial Trees?, Sonderforschungsbereich 373 Discussion
Paper, Humboldt-Universität zu Berlin.
Jackwerth, J.C. (1999). Option Implied Risk Neutral Distributions and
Implied Binomial Trees: A Literature Review, The Journal of Derivatives
Winter: 66-82.
Kloeden, P., Platen, E. and Schurz, H. (1994). Numerical Solution of SDE
Through Computer Experiments, Springer Verlag, Heidelberg.
Rubinstein, M. (1994). Implied Binomial Trees, Journal of Finance 49: 771-
Part IV

10 Multivariate Volatility Models
Matthias R. Fengler and Helmut Herwartz

Multivariate volatility models are widely used in Finance to capture both
volatility clustering and contemporaneous correlation of asset return vectors.
Here we focus on multivariate GARCH models. In this common model class
it is assumed that the covariance of the error distribution follows a time de-
pendent process conditional on information which is generated by the history
of the process. To provide a particular example, we consider a system of ex-
change rates of two currencies measured against the US Dollar (USD), namely
the Deutsche Mark (DEM) and the British Pound Sterling (GBP). For this
process we compare the dynamic properties of the bivariate model with
univariate GARCH specifications where cross sectional dependencies are ignored.
Moreover, we illustrate the scope of the bivariate model by ex-ante forecasts of
bivariate exchange rate densities.

10.1 Introduction
Volatility clustering, i.e. positive correlation of price variations observed on
speculative markets, motivated the introduction of autoregressive conditionally
heteroskedastic (ARCH) processes by Engle (1982) and its popular generaliza-
tions by Bollerslev (1986) (Generalized ARCH, GARCH) and Nelson (1991)
(exponential GARCH, EGARCH). Being univariate in nature, however, such
models neglect a further stylized fact of empirical price variations, namely con-
temporaneous cross correlation e.g. over a set of assets, stock market indices,
or exchange rates.
Cross section relationships are often implied by economic theory. Interest rate
parities, for instance, provide a close relation between domestic and foreign
bond rates. Assuming absence of arbitrage, the so-called triangular equation
formalizes the equality of an exchange rate between two currencies on the one
hand and an implied rate constructed via exchange rates measured towards a
third currency on the other. Furthermore, stock prices of firms acting on the
same market often show similar patterns in the wake of news that is important
for the entire market (Hafner and Herwartz, 1998). Similarly, analyzing global
volatility transmission, Engle, Ito and Lin (1990) and Hamao, Masulis and Ng
(1990) found evidence in favor of volatility spillovers between the world's major
trading areas occurring in the sequence of floor trading hours. From this point
of view, when modeling time varying volatilities, a multivariate model appears to
be a natural framework to take cross sectional information into account. Moreover,
the covariance between financial assets is of essential importance in finance.
Effectively, many problems in financial practice like portfolio optimization,
hedging strategies, or Value-at-Risk evaluation require multivariate volatility
measures (Bollerslev et al., 1988; Cecchetti, Cumby and Figlewski, 1988).

10.1.1 Model specifications

Let $\mu_t = (\mu_{1t}, \mu_{2t}, \ldots, \mu_{Nt})^\top$ denote an $N$-dimensional error process, which is
either directly observed or estimated from a multivariate regression model. The
process $\mu_t$ follows a multivariate GARCH process if it has the representation

$$\mu_t = \Sigma_t^{1/2} \xi_t, \tag{10.1}$$

where $\Sigma_t$ is measurable with respect to information generated up to time $t-1$,
denoted by the filtration $\mathcal{F}_{t-1}$. By assumption the $N$ components of $\xi_t$ follow a
multivariate Gaussian distribution with mean zero and covariance matrix equal
to the identity matrix.
The conditional covariance matrix, $\Sigma_t = E[\mu_t \mu_t^\top | \mathcal{F}_{t-1}]$, has typical elements
$\sigma_{ij}$ with $\sigma_{ii}$, $i = 1, \ldots, N$, denoting conditional variances and off-diagonal
elements $\sigma_{ij}$, $i, j = 1, \ldots, N$, $i \neq j$, denoting conditional covariances. To make the
specification in (10.1) feasible, a parametric description relating $\Sigma_t$ to $\mathcal{F}_{t-1}$ is
necessary. In a multivariate setting, however, dependencies of the second order
moments in $\Sigma_t$ on $\mathcal{F}_{t-1}$ easily become computationally intractable for practical
purposes.
Let vech(A) denote the half-vectorization operator stacking the elements of a
quadratic $(N \times N)$-matrix $A$ from the main diagonal downwards in a $\frac{1}{2}N(N+1)$
dimensional column vector. Within the so-called vec-representation of the
GARCH(p, q) model $\Sigma_t$ is specified as follows:

$$\mathrm{vech}(\Sigma_t) = c + \sum_{i=1}^{q} \tilde{A}_i\, \mathrm{vech}(\mu_{t-i}\mu_{t-i}^\top) + \sum_{i=1}^{p} \tilde{G}_i\, \mathrm{vech}(\Sigma_{t-i}). \tag{10.2}$$

In (10.2) the matrices $\tilde{A}_i$ and $\tilde{G}_i$ each contain $\{N(N+1)/2\}^2$ elements.
Deterministic covariance components are collected in $c$, a column vector of dimension
$N(N+1)/2$. We consider in the following the case $p = q = 1$ since in applied
work the GARCH(1,1) model has turned out to be particularly useful to
describe a wide variety of financial market data (Bollerslev, Engle and Nelson,
On the one hand the vec-model in (10.2) allows for a very general dynamic
structure of the multivariate volatility process. On the other hand this
specification suffers from high dimensionality of the relevant parameter space, which
makes it almost intractable for empirical work. In addition, it might be
cumbersome in applied work to restrict the admissible parameter space such that the
implied matrices $\Sigma_t$, $t = 1, \ldots, T$, are positive definite. These issues motivated
a considerable variety of competing multivariate GARCH specifications.
Prominent proposals reducing the dimensionality of (10.2) are the constant
correlation model (Bollerslev, 1990) and the diagonal model (Bollerslev et al.,
1988). Specifying diagonal elements of $\Sigma_t$, both of these approaches assume the
absence of cross equation dynamics, i.e. the only dynamics are

$$\sigma_{ii,t} = c_{ii} + a_i \mu_{i,t-1}^2 + g_i \sigma_{ii,t-1}, \quad i = 1, \ldots, N. \tag{10.3}$$

To determine off-diagonal elements of $\Sigma_t$, Bollerslev (1990) proposes a constant
contemporaneous correlation,

$$\sigma_{ij,t} = \rho_{ij} \sqrt{\sigma_{ii,t}\,\sigma_{jj,t}}, \quad i, j = 1, \ldots, N, \tag{10.4}$$

whereas Bollerslev et al. (1988) introduce an ARMA-type dynamic structure
as in (10.3) for $\sigma_{ij,t}$ as well, i.e.

$$\sigma_{ij,t} = c_{ij} + a_{ij} \mu_{i,t-1} \mu_{j,t-1} + g_{ij} \sigma_{ij,t-1}, \quad i, j = 1, \ldots, N. \tag{10.5}$$

For the bivariate case (N = 2) with p = q = 1 the constant correlation model
contains only 7 parameters compared to 21 parameters encountered in the full
model (10.2). The diagonal model is specified with 9 parameters. The price
that both models pay for parsimony is in ruling out cross-equation dynamics as
allowed in the general vec-model. Positive definiteness of $\Sigma_t$ is easily
guaranteed for the constant correlation model ($|\rho_{ij}| < 1$), whereas the
diagonal model requires more complicated restrictions to provide positive
definite covariance matrices.
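The parameter counts quoted for N = 2 can be checked by simple bookkeeping. The sketch below counts free parameters for p = q = 1; the formulas for general N are our own extrapolation from the model definitions (10.2) to (10.5) and should be read as an assumption, since the text only states the bivariate numbers.

```python
def n_params_vec(n):
    """Full vec-model: c has m elements, the two coefficient matrices m*m each."""
    m = n * (n + 1) // 2
    return m + 2 * m * m


def n_params_constant_correlation(n):
    """n variance equations (c_ii, a_i, g_i) plus n(n-1)/2 correlations."""
    return 3 * n + n * (n - 1) // 2


def n_params_diagonal(n):
    """One triple (c_ij, a_ij, g_ij) per distinct element of Sigma_t."""
    m = n * (n + 1) // 2
    return 3 * m


for n in (2, 3, 5):
    print(n, n_params_vec(n), n_params_constant_correlation(n),
          n_params_diagonal(n))
```

The first printed row reproduces the counts 21, 7 and 9 for the bivariate case; the later rows illustrate how quickly the unrestricted vec-model grows.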
The so-called BEKK-model (named after Baba, Engle, Kraft and Kroner, 1990)
provides a richer dynamic structure compared to both restricted processes
mentioned before. Defining $N \times N$ matrices $A_{ik}$ and $G_{ik}$ and an upper triangular
matrix $C_0$ the BEKK-model reads in a general version as follows:

$$\Sigma_t = C_0^\top C_0 + \sum_{k=1}^{K}\sum_{i=1}^{q} A_{ik}^\top \mu_{t-i}\mu_{t-i}^\top A_{ik} + \sum_{k=1}^{K}\sum_{i=1}^{p} G_{ik}^\top \Sigma_{t-i} G_{ik}. \tag{10.6}$$

If K = q = p = 1 and N = 2, the model in (10.6) contains 11 parameters and
implies the following dynamic model for typical elements of $\Sigma_t$:

$$\begin{aligned}
\sigma_{11,t} &= c_{11} + a_{11}^2 \mu_{1,t-1}^2 + 2a_{11}a_{21}\mu_{1,t-1}\mu_{2,t-1} + a_{21}^2 \mu_{2,t-1}^2 \\
&\quad + g_{11}^2 \sigma_{11,t-1} + 2g_{11}g_{21}\sigma_{21,t-1} + g_{21}^2 \sigma_{22,t-1}, \\
\sigma_{21,t} &= c_{21} + a_{11}a_{12}\mu_{1,t-1}^2 + (a_{21}a_{12} + a_{11}a_{22})\mu_{1,t-1}\mu_{2,t-1} + a_{21}a_{22}\mu_{2,t-1}^2 \\
&\quad + g_{11}g_{12}\sigma_{11,t-1} + (g_{21}g_{12} + g_{11}g_{22})\sigma_{21,t-1} + g_{21}g_{22}\sigma_{22,t-1}, \\
\sigma_{22,t} &= c_{22} + a_{12}^2 \mu_{1,t-1}^2 + 2a_{12}a_{22}\mu_{1,t-1}\mu_{2,t-1} + a_{22}^2 \mu_{2,t-1}^2 \\
&\quad + g_{12}^2 \sigma_{11,t-1} + 2g_{12}g_{22}\sigma_{21,t-1} + g_{22}^2 \sigma_{22,t-1}.
\end{aligned}$$
Compared to the diagonal model the BEKK-specification economizes on the
number of parameters by restricting the vec-model within and across equations.
Since $A_{ik}$ and $G_{ik}$ are not required to be diagonal, the BEKK-model
is convenient to allow for cross dynamics of conditional covariances. The
parameter $K$ governs to which extent the general representation in (10.2) can be
approximated by a BEKK-type model. In the following we assume K = 1.
Note that in the bivariate case with K = p = q = 1 the BEKK-model contains
11 parameters. If K = 1 the matrices $A_{11}$ and $-A_{11}$ imply the same
conditional covariances. Thus, for uniqueness of the BEKK-representation $a_{11} > 0$
and $g_{11} > 0$ are assumed. Note that the right hand side of (10.6) involves only
quadratic terms and, hence, given convenient initial conditions, $\Sigma_t$ is positive
definite under the weak (sufficient) condition that at least one of the matrices
$C_0$ or $G_{ik}$ has full rank (Engle and Kroner, 1995).
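As a small numerical check of the matrix form (10.6) with K = p = q = 1, the sketch below performs one update step and compares the off-diagonal element with the scalar expression obtained by expanding the matrix products. All parameter values are hypothetical and the helper names (`transpose`, `matmul`, `bekk_update`) are ours.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def bekk_update(c0, a, g, mu_prev, sigma_prev):
    """One step of (10.6): C0'C0 + A' mu mu' A + G' Sigma G."""
    outer = [[x * y for y in mu_prev] for x in mu_prev]
    const = matmul(transpose(c0), c0)
    arch = matmul(matmul(transpose(a), outer), a)
    garch = matmul(matmul(transpose(g), sigma_prev), g)
    n = len(const)
    return [[const[i][j] + arch[i][j] + garch[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical inputs, chosen only for illustration.
c0 = [[0.5, 0.1], [0.0, 0.4]]
a = [[0.30, -0.05], [-0.06, 0.29]]
g = [[0.90, 0.03], [0.02, 0.92]]
mu_prev = [0.2, -0.1]
sigma_prev = [[1.0, 0.3], [0.3, 2.0]]

sigma_t = bekk_update(c0, a, g, mu_prev, sigma_prev)

# Scalar expansion of the (2,1) element of the same update.
(a11, a12), (a21, a22) = a
(g11, g12), (g21, g22) = g
m1, m2 = mu_prev
c21 = matmul(transpose(c0), c0)[1][0]
sigma21_scalar = (c21 + a11 * a12 * m1 ** 2
                  + (a21 * a12 + a11 * a22) * m1 * m2
                  + a21 * a22 * m2 ** 2
                  + g11 * g12 * sigma_prev[0][0]
                  + (g21 * g12 + g11 * g22) * sigma_prev[1][0]
                  + g21 * g22 * sigma_prev[1][1])

print(sigma_t)
print(abs(sigma_t[1][0] - sigma21_scalar) < 1e-12)
```

Because every term on the right hand side is a quadratic form, the updated matrix stays symmetric by construction.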

10.1.2 Estimation of the BEKK-model

As in the univariate case the parameters of a multivariate GARCH model are
estimated by maximum likelihood (ML) optimizing numerically the Gaussian
log-likelihood function.
With $f$ denoting the multivariate normal density, the contribution of a single
observation, $l_t$, to the log-likelihood of a sample is given as:

$$l_t = \ln\{f(\mu_t \mid \mathcal{F}_{t-1})\} = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln(|\Sigma_t|) - \frac{1}{2}\mu_t^\top \Sigma_t^{-1}\mu_t.$$

Maximizing the log-likelihood, $l = \sum_{t=1}^{T} l_t$, requires nonlinear maximization
methods. Involving only first order derivatives the algorithm introduced by
Berndt, Hall, Hall, and Hausman (1974) is easily implemented and particularly
useful for the estimation of multivariate GARCH processes.
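The log-likelihood contribution above is straightforward to evaluate in the bivariate case, where the determinant and the inverse of $\Sigma_t$ have closed forms. The following sketch is an illustration under that N = 2 assumption; the function names are ours and no optimizer is included.

```python
import math


def loglik_contribution(mu, sigma):
    """Gaussian log-likelihood contribution l_t for N = 2."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    if s11 <= 0 or det <= 0:
        raise ValueError("sigma must be positive definite")
    # mu' Sigma^{-1} mu via the closed-form inverse of a 2 x 2 matrix.
    quad = (s22 * mu[0] ** 2 - (s12 + s21) * mu[0] * mu[1]
            + s11 * mu[1] ** 2) / det
    n = 2
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def negative_loglik(residuals, sigmas):
    """Objective to be minimized: minus the sum of the contributions."""
    return -sum(loglik_contribution(m, s) for m, s in zip(residuals, sigmas))


# With an identity covariance and a zero residual, l_t = -ln(2*pi).
print(loglik_contribution([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

A BHHH-type routine would additionally need the first order derivatives of each contribution with respect to the parameters, which are not sketched here.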
If the actual error distribution differs from the multivariate normal, maximizing
the Gaussian log-likelihood has become popular as Quasi ML (QML) estimation.
In the multivariate framework, results for the asymptotic properties of
the (Q)ML-estimator have been derived recently. Jeantheau (1998) proves the
QML-estimator to be consistent under the main assumption that the considered
multivariate process is strictly stationary and ergodic. Further assuming
finiteness of moments of $\mu_t$ up to order eight, Comte and Lieberman (2000)
derive asymptotic normality of the QML-estimator. The asymptotic distribution
of the rescaled QML-estimator is analogous to the univariate case and
discussed in Bollerslev and Wooldridge (1992).

10.2 An empirical illustration

10.2.1 Data description

We analyze daily quotes of two European currencies measured against the USD,
namely the DEM and the GBP. The sample period is December 31, 1979 to
April 1, 1994, covering T = 3720 observations. Note that a subperiod of our
sample has already been investigated by Bollerslev and Engle (1993) discussing
common features of volatility processes.
The data is provided in fx. The first column contains DEM/USD and
the second GBP/USD. In XploRe a preliminary statistical analysis is easily
done by the summarize command. Before inspecting the summary statistics,
we load the data, $R_t$, and take log differences, $\mu_t = \ln(R_t) - \ln(R_{t-1})$.
XFGmvol01.xpl produces the following table:

[2,] " Minimum Maximum Mean Median Std.Error"
[3,] "-----------------------------------------------------------"
[4,] "DEM/USD -0.040125 0.031874 -4.7184e-06 0 0.0070936"
[5,] "GBP/USD -0.046682 0.038665 0.00011003 0 0.0069721"


Evidently, the empirical means of both processes are very close to zero
(-4.72e-06 and 1.10e-04, respectively). Also minimum, maximum and standard errors
are of similar size. First differences of the respective log exchange rates are
shown in Figure 10.1. As is apparent from Figure 10.1, variations of exchange
rate returns exhibit an autoregressive pattern: Large returns in foreign
exchange markets are followed by large returns of either sign. This is most obvious
in periods of excessive returns. Note that these volatility clusters tend to
coincide in both series. It is precisely this observation that justifies a multivariate
GARCH specification.
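The log-difference transformation and the summary statistics described above can be sketched outside XploRe as follows. The exchange rate quotes are hypothetical placeholders (the fx data set is not reproduced here), and we use the sample standard deviation; whether the summarize command uses the same normalization is not verified here.

```python
import math
import statistics


def log_returns(rates):
    """Log differences ln(R_t) - ln(R_{t-1}) of a series of quotes."""
    return [math.log(b) - math.log(a) for a, b in zip(rates, rates[1:])]


def summarize(x):
    """Minimum, maximum, mean, median and sample standard deviation."""
    return {
        "min": min(x),
        "max": max(x),
        "mean": statistics.fmean(x),
        "median": statistics.median(x),
        "std": statistics.stdev(x),
    }


# Hypothetical DEM/USD quotes, for illustration only.
dem_usd = [1.7250, 1.7310, 1.7198, 1.7405, 1.7391]
returns = log_returns(dem_usd)
for name, value in summarize(returns).items():
    print(f"{name:>6}: {value: .6f}")
```

Note that a series of T quotes yields T - 1 returns.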

10.2.2 Estimating bivariate GARCH

{coeff, likest} = bigarch(theta,et)
estimates a bivariate GARCH model

The quantlet bigarch provides a fast algorithm to estimate the BEKK
representation of a bivariate GARCH(1,1) model. QML-estimation is implemented
by means of the BHHH-algorithm which minimizes the negative Gaussian
log-likelihood function. The algorithm employs analytical first order derivatives of
the log-likelihood function (Lütkepohl, 1996) with respect to the 11-dimensional
vector of parameters containing the elements of $C_0$, $A_{11}$ and $G_{11}$ as given in
(10.6).

Figure 10.1. Foreign exchange rate data: returns.

The standard call is

{coeff, likest}=bigarch(theta, et),

where as input parameters we have initial values theta for the iteration
algorithm and the data set, e.g. financial returns, stored in et. The estimation
output is the vector coeff containing the stacked elements of the parameter
matrices $C_0$, $A_{11}$ and $G_{11}$ in (10.6) after numerical optimization of the Gaussian
log-likelihood function. Being an iterative procedure, the algorithm requires
suitable initial parameters theta. For the diagonal elements of the
matrices $A_{11}$ and $G_{11}$ values around 0.3 and 0.9 appear reasonable, since in
univariate GARCH(1,1) models parameter estimates for $a_1$ and $g_1$ in (10.3) often
take values around $0.3^2 = 0.09$ and $0.81 = 0.9^2$. There is no clear guidance on how
to determine initial values for the off-diagonal elements of $A_{11}$ or $G_{11}$. Therefore
it might be reasonable to try alternative initializations of these parameters.
Given an initialization of $A_{11}$ and $G_{11}$ the starting values for the elements in
$C_0$ are immediately determined by the algorithm assuming the unconditional
covariance of $\mu_t$ to exist (Engle and Kroner, 1995).
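One plausible reading of the last step is the following: if the unconditional covariance $\Sigma$ exists, taking expectations in the BEKK recursion gives $\Sigma = C_0^\top C_0 + A_{11}^\top \Sigma A_{11} + G_{11}^\top \Sigma G_{11}$, so that $C_0^\top C_0$ can be backed out from a sample covariance and factorized. The sketch below implements this idea for N = 2; it is an assumption about how such starting values could be obtained, not a description of the internals of bigarch, and the sample covariance is hypothetical.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def initial_c0(sample_cov, a, g):
    """Upper triangular C0 with C0'C0 = S - A'SA - G'SG (bivariate case)."""
    s = sample_cov
    asa = matmul(matmul(transpose(a), s), a)
    gsg = matmul(matmul(transpose(g), s), g)
    m = [[s[i][j] - asa[i][j] - gsg[i][j] for j in range(2)] for i in range(2)]
    if m[0][0] <= 0:
        raise ValueError("implied C0'C0 is not positive definite")
    c11 = math.sqrt(m[0][0])
    c12 = m[0][1] / c11
    rest = m[1][1] - c12 ** 2
    if rest <= 0:
        raise ValueError("implied C0'C0 is not positive definite")
    return [[c11, c12], [0.0, math.sqrt(rest)]]


# Starting values suggested in the text: diagonals 0.3 and 0.9, zeros elsewhere.
a0 = [[0.3, 0.0], [0.0, 0.3]]
g0 = [[0.9, 0.0], [0.0, 0.9]]
# Hypothetical sample covariance of daily returns.
s_hat = [[5.0e-05, 2.0e-05], [2.0e-05, 4.9e-05]]

print(initial_c0(s_hat, a0, g0))
```

The function raises an error when the chosen starting values for $A_{11}$ and $G_{11}$ are incompatible with a finite unconditional covariance, which is a useful early warning before starting the iteration.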
Given our example under investigation the bivariate GARCH estimation yields
as output:

Contents of coeff

[ 1,] 0.0011516
[ 2,] 0.00031009
[ 3,] 0.00075685
[ 4,] 0.28185
[ 5,] -0.057194
[ 6,] -0.050449
[ 7,] 0.29344
[ 8,] 0.93878
[ 9,] 0.025117
[10,] 0.027503
[11,] 0.9391

Contents of likest

[1,] -28599

The last number is the obtained minimum of the negative log-likelihood
function. The vector coeff given first contains as its first three elements the
parameters of the upper triangular matrix $C_0$, the following four belong to the ARCH
($A_{11}$) and the last four to the GARCH parameters ($G_{11}$), i.e. for our model

$$\Sigma_t = C_0^\top C_0 + A_{11}^\top \mu_{t-1}\mu_{t-1}^\top A_{11} + G_{11}^\top \Sigma_{t-1} G_{11} \tag{10.7}$$

stated again for convenience, we find the matrices $C_0$, $A_{11}$, $G_{11}$ to be:

$$C_0 = 10^{-3}\begin{pmatrix} 1.15 & 0.31 \\ 0 & 0.76 \end{pmatrix}, \quad
A_{11} = \begin{pmatrix} 0.282 & -0.050 \\ -0.057 & 0.293 \end{pmatrix}, \quad
G_{11} = \begin{pmatrix} 0.939 & 0.028 \\ 0.025 & 0.939 \end{pmatrix}. \tag{10.8}$$
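Comparing the printed vector coeff with the matrices in (10.8) shows that $A_{11}$ and $G_{11}$ are stacked column by column. The following sketch unpacks the 11 numbers accordingly; the helper name `unpack_bekk` is ours, and the ordering is inferred from the output shown above rather than from documentation of bigarch.

```python
def unpack_bekk(coeff):
    """Split the 11-vector into C0 (upper triangular), A11 and G11."""
    if len(coeff) != 11:
        raise ValueError("expected 11 parameters")
    c11, c12, c22 = coeff[0:3]
    a11, a21, a12, a22 = coeff[3:7]    # column-wise stacking of A11
    g11, g21, g12, g22 = coeff[7:11]   # column-wise stacking of G11
    c0 = [[c11, c12], [0.0, c22]]
    a = [[a11, a12], [a21, a22]]
    g = [[g11, g12], [g21, g22]]
    return c0, a, g


coeff = [0.0011516, 0.00031009, 0.00075685,
         0.28185, -0.057194, -0.050449, 0.29344,
         0.93878, 0.025117, 0.027503, 0.9391]

c0, a11_mat, g11_mat = unpack_bekk(coeff)
for name, mat in (("C0", c0), ("A11", a11_mat), ("G11", g11_mat)):
    print(name, mat)
```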

10.2.3 Estimating the (co)variance processes

The (co)variance is obtained by sequentially calculating the difference equation
(10.7) where we use the estimator for the unconditional covariance matrix as
initial value ($\Sigma_0 = E^\top E / T$). Here, the $T \times 2$ matrix $E$ contains log-differences
of our foreign exchange rate data. Estimating the covariance process is also
accomplished in the quantlet XFGmvol02.xpl and additionally provided in
We display the estimated variance and covariance processes in Figure 10.2. The
upper and the lower panel of Figure 10.2 show the variances of the DEM/USD
and GBP/USD returns respectively, whereas in the middle panel we see the co-
variance process. Except for a very short period in the beginning of our sample
the covariance is positive and of non-negligible size throughout. This is evi-
dence for cross sectional dependencies in currency markets which we mentioned
earlier to motivate multivariate GARCH models.
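The sequential calculation of the (co)variance path can be sketched as a simple loop over the observed returns. The parameter matrices are the rounded estimates reported above; the four return pairs are hypothetical, and the timing convention (the initial value is attached to the first observation) is our assumption.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def unconditional_cov(e):
    """Initial value E'E / T for a list of T return pairs."""
    n_obs = len(e)
    return [[sum(r[i] * r[j] for r in e) / n_obs for j in range(2)]
            for i in range(2)]


def filter_covariances(e, c0, a, g):
    """Covariance path: path[0] is the initial value, then one update per step."""
    const = matmul(transpose(c0), c0)
    sigma = unconditional_cov(e)
    path = [sigma]
    for row in e[:-1]:
        outer = [[x * y for y in row] for x in row]
        arch = matmul(matmul(transpose(a), outer), a)
        garch = matmul(matmul(transpose(g), sigma), g)
        sigma = [[const[i][j] + arch[i][j] + garch[i][j] for j in range(2)]
                 for i in range(2)]
        path.append(sigma)
    return path


c0 = [[1.15e-3, 0.31e-3], [0.0, 0.76e-3]]
a = [[0.282, -0.050], [-0.057, 0.293]]
g = [[0.939, 0.028], [0.025, 0.939]]
# Hypothetical return pairs (DEM/USD, GBP/USD).
e = [[0.004, -0.002], [-0.011, -0.006], [0.003, 0.005], [0.000, 0.001]]

for sigma in filter_covariances(e, c0, a, g):
    print([[round(1e5 * v, 3) for v in row] for row in sigma])
```

The printed values are scaled by $10^5$, the same scaling used for the plotted processes.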
Instead of estimating the realized path of variances as shown above,
we could also use the estimated parameters to simulate volatility paths
( XFGmvol03.xpl).

Figure 10.2. Estimated variance and covariance processes, $10^5\,\Sigma_t$.

Figure 10.3. Simulated variance and covariance processes, both bivariate
(blue) and univariate case (green), $10^5\,\Sigma_t$. Recoverable panel titles:
DEM/USD - Simulation, GBP/USD - Simulation.

For this, at each point in time an observation $\mu_t$ is drawn from a multivariate
normal distribution with variance $\Sigma_t$. Given these observations, $\Sigma_t$ is updated
according to (10.7). Then, a new residual is drawn with covariance $\Sigma_{t+1}$. We
apply this procedure for T = 3000. The results, displayed in the upper three
panels of Figure 10.3, show a similar pattern as the original process given in
Figure 10.2. For the lower two panels we generate two variance processes from
the same residuals $\xi_t$. In this case, however, we set off-diagonal parameters in
$A_{11}$ and $G_{11}$ to zero to illustrate how the unrestricted BEKK model
incorporates cross-equation dynamics. As can be seen, both approaches are convenient
to capture volatility clustering. Depending on the particular state of the
system, spillover effects operating through conditional covariances, however, have
a considerable impact on the magnitude of conditional volatility.
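The simulation scheme just described (draw, update, draw again) can be sketched as follows. Standard normal residuals are transformed with a Cholesky factor of the current covariance; the starting covariance, the seed and the function names are hypothetical choices for illustration, and the parameter matrices are the rounded estimates from above.

```python
import math
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def draw(sigma, rng):
    """Draw mu = L xi with L L' = sigma and xi standard normal (N = 2)."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    xi1, xi2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [l11 * xi1, l21 * xi1 + l22 * xi2]


def simulate_bekk(c0, a, g, sigma0, n_steps, seed=0):
    """Return simulated observations and the covariance used for each draw."""
    rng = random.Random(seed)
    const = matmul(transpose(c0), c0)
    sigma, returns, sigmas = sigma0, [], []
    for _ in range(n_steps):
        mu = draw(sigma, rng)
        returns.append(mu)
        sigmas.append(sigma)
        outer = [[x * y for y in mu] for x in mu]
        arch = matmul(matmul(transpose(a), outer), a)
        garch = matmul(matmul(transpose(g), sigma), g)
        sigma = [[const[i][j] + arch[i][j] + garch[i][j] for j in range(2)]
                 for i in range(2)]
    return returns, sigmas


c0 = [[1.15e-3, 0.31e-3], [0.0, 0.76e-3]]
a = [[0.282, -0.050], [-0.057, 0.293]]
g = [[0.939, 0.028], [0.025, 0.939]]
sigma0 = [[5.0e-05, 2.0e-05], [2.0e-05, 4.9e-05]]  # hypothetical start value

returns, sigmas = simulate_bekk(c0, a, g, sigma0, n_steps=3000, seed=42)
print(len(returns), sigmas[-1])
```

Setting the off-diagonal entries of `a` and `g` to zero while reusing the same seed gives a restricted path driven by the same standard normal draws, in the spirit of the comparison shown in the lower panels.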

10.3 Forecasting exchange rate densities
The preceding section illustrated how the GARCH model may be employed
effectively to describe empirical price variations of foreign exchange rates. For
practical purposes, as for instance scenario analysis, VaR estimation
(Chapter 1), option pricing (Chapter 16), one is often interested in the future joint
density of a set of asset prices. Continuing the comparison of the univariate
and bivariate approach to model volatility dynamics of exchange rates it is
thus natural to investigate the properties of these specifications in terms of
forecasting performance.
We implement an iterative forecasting scheme along the following lines: Given
the estimated univariate and bivariate volatility models and the corresponding
information sets $\mathcal{F}_{t-1}$, $t = 1, \ldots, T-5$ (Figure 10.2), we employ the
identified data generating processes to simulate one-week-ahead forecasts of both
exchange rates. To get a reliable estimate of the future density we set the
number of simulations to 50000 for each initial scenario. This procedure yields
two bivariate samples of future exchange rates, one simulated under bivariate,
the other one simulated under univariate GARCH assumptions.
A review of the current state of evaluating competing density forecasts is
offered by Tay and Wallis (2000). Adopting a Bayesian perspective the common
approach is to compare the expected loss of actions evaluated under alternative
density forecasts. In our pure time series framework, however, a particular
action is hardly available for forecast density comparisons. Alternatively one
could concentrate on statistics directly derived from the simulated densities,
such as first and second order moments or even quantiles. Due to the
multivariate nature of the time series under consideration it is a nontrivial issue
to rank alternative density forecasts in terms of these statistics. Therefore,
we regard a particular volatility model to be superior to another if it provides
a higher simulated density estimate of the actual bivariate future exchange
rate. This is accomplished by evaluating both densities, obtained from a
bivariate kernel estimation, at the actually realized exchange rate. Since the latter
comparison might suffer from different unconditional variances under
univariate and multivariate volatility, the two simulated densities were rescaled to
have identical variance. Performing the latter forecasting exercises iteratively
over 3714 time points we can test if the bivariate volatility model outperforms
the univariate one.

Time window J     Success ratio SR_J
1980-1981         0.744
1982-1983         0.757
1984-1985         0.793
1986-1987         0.788
1988-1989         0.806
1990-1991         0.807
1992-1994/4       0.856

Table 10.1. Time varying frequencies of the bivariate GARCH model
outperforming the univariate one in terms of one-week-ahead forecasts
(success ratio)
To formalize the latter ideas we define a success ratio $SR_J$ as

$$SR_J = \frac{1}{|J|} \sum_{t \in J} \mathbf{1}\{\hat{f}_{biv}(R_{t+5}) > \hat{f}_{uni}(R_{t+5})\}, \tag{10.9}$$

where $J$ denotes a time window containing $|J|$ observations and $\mathbf{1}$ an indicator
function. $\hat{f}_{biv}(R_{t+5})$ and $\hat{f}_{uni}(R_{t+5})$ are the estimated densities of future
exchange rates, which are simulated by the bivariate and univariate GARCH
processes, respectively, and which are evaluated at the actual exchange rate
levels $R_{t+5}$. The simulations are performed in XFGmvol04.xpl.
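The success ratio in (10.9) is simply the share of time points at which the bivariate density estimate exceeds the univariate one. The sketch below pairs it with a bivariate kernel density estimate evaluated at a single point; the product Gaussian kernel and the fixed bandwidths are our assumptions, since the text does not specify the kernel, and all numbers are hypothetical.

```python
import math


def kernel_density_at(point, sample, h):
    """Product-Gaussian kernel density estimate at `point` (bivariate)."""
    total = 0.0
    for x, y in sample:
        u = (point[0] - x) / h[0]
        v = (point[1] - y) / h[1]
        total += math.exp(-0.5 * (u * u + v * v))
    return total / (len(sample) * 2.0 * math.pi * h[0] * h[1])


def success_ratio(f_biv, f_uni):
    """Share of cases in which the bivariate density value is larger."""
    if len(f_biv) != len(f_uni) or not f_biv:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(b > u for b, u in zip(f_biv, f_uni)) / len(f_biv)


# Hypothetical density values at four realized exchange rate pairs.
f_biv = [0.50, 0.20, 0.90, 0.40]
f_uni = [0.30, 0.25, 0.10, 0.10]
print(success_ratio(f_biv, f_uni))  # 0.75
```

In an application, `kernel_density_at` would be called once per time point and model, with the simulated one-week-ahead exchange rates as `sample` and the realized pair as `point`.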
Our results show that the bivariate model indeed outperforms the univariate
one when both likelihoods are compared under the actual realizations of the
exchange rate process. In 81.6% of all cases across the sample period, $SR_J = 0.816$,
$J = \{t : t = 1, \ldots, T-5\}$, the bivariate model provides a better forecast.

Figure 10.4. Covariance and success ratio: estimated covariance process from
the bivariate GARCH model ($10^4\,\hat{\sigma}_{12}$, blue) and success ratio over overlapping
time intervals with window length 80 days (red).

This is highly significant. In Table 10.1 we show that the overall superiority of
the bivariate volatility approach is confirmed when considering subsamples of
two years' length. A priori one may expect the bivariate model to outperform
the univariate one more often, the larger (in absolute value) the covariance
between both return processes is. To verify this argument we display in Figure 10.4 the
empirical covariance estimates from Figure 10.2 jointly with the success ratio
evaluated over overlapping time intervals of length $|J| = 80$.
As is apparent from Figure 10.4 there is a close co-movement between the
success ratio and the general trend of the covariance process, which confirms
our expectations: the forecasting power of the bivariate GARCH model is
particularly strong in periods where the DEM/USD and GBP/USD exchange
rate returns exhibit a high covariance. For completeness it is worthwhile to
mention that similar results are obtained if the window width is varied over
reasonable choices of $|J|$ ranging from 40 to 150.
With respect to financial practice and research we take our results as strong
support for a multivariate approach towards asset price modeling. Whenever
contemporaneous correlation across markets matters, the system approach
offers essential advantages. To name a few areas of interest, multivariate
volatility models are supposed to yield useful insights for risk management, scenario
analysis and option pricing.

Baba, Y., Engle, R.F., Kraft, D.F., and Kroner, K.F. (1990). Multivariate
Simultaneous Generalized ARCH, mimeo, Department of Economics,
University of California, San Diego.
Berndt, E.K., Hall, B.H., Hall, R.E., and Hausman, J.A. (1974). Estimation
and Inference in Nonlinear Structural Models, Annals of Economic and
Social Measurement 3/4: 653-665.
Bollerslev, T. (1986). Generalized Autoregressive Conditional
Heteroscedasticity, Journal of Econometrics 31: 307-327.
Bollerslev, T. (1990). Modeling the Coherence in Short-Run Nominal Exchange
Rates: A Multivariate Generalized ARCH Approach, Review of Economics
and Statistics 72: 498-505.
Bollerslev, T. and Engle, R.F. (1993). Common Persistence in Conditional
Variances, Econometrica 61: 167-186.
Bollerslev, T., Engle, R.F. and Nelson, D.B. (1994). GARCH Models, in:
Engle, R.F., and McFadden, D.L. (eds.) Handbook of Econometrics, Vol. 4,
Elsevier, Amsterdam, 2961-3038.
Bollerslev, T., Engle, R.F. and Wooldridge, J.M. (1988). A Capital Asset
Pricing Model with Time-Varying Covariances, Journal of Political Economy
96: 116-131.
Bollerslev, T. and Wooldridge, J.M. (1992). Quasi-Maximum Likelihood
Estimation and Inference in Dynamic Models with Time-Varying Covariances,
Econometric Reviews 11: 143-172.
Cecchetti, S.G., Cumby, R.E. and Figlewski, S. (1988). Estimation of the
Optimal Futures Hedge, Review of Economics and Statistics 70: 623-630.
Comte, F. and Lieberman, O. (2000). Asymptotic Theory for Multivariate
GARCH Processes, Manuscript, Universities Paris 6 and Paris 7.
Engle, R.F. (1982). Autoregressive Conditional Heteroscedasticity with
Estimates of the Variance of UK Inflation, Econometrica 50: 987-1008.
Engle, R.F., Ito, T. and Lin, W.L. (1990). Meteor Showers or Heat Waves?
Heteroskedastic Intra-Daily Volatility in the Foreign Exchange Market,
Econometrica 58: 525-542.
Engle, R.F. and Kroner, K.F. (1995). Multivariate Simultaneous Generalized
ARCH, Econometric Theory 11: 122-150.
Hafner, C.M. and Herwartz, H. (1998). Structural Analysis of Portfolio Risk
using Beta Impulse Response Functions, Statistica Neerlandica 52: 336-
Hamao, Y., Masulis, R.W. and Ng, V.K. (1990). Correlations in Price Changes
and Volatility across International Stock Markets, Review of Financial
Studies 3: 281-307.
Jeantheau, T. (1998). Strong Consistency of Estimators for Multivariate ARCH
Models, Econometric Theory 14: 70-86.
Lütkepohl, H. (1996). Handbook of Matrices, Wiley, Chichester.
Nelson, D.B. (1991). Conditional Heteroskedasticity in Asset Returns: A New
Approach, Econometrica 59: 347-370.
Tay, A. and Wallis, K. (2000). Density forecasting: A Survey, Journal of
Forecasting 19: 235-254.
11 Statistical Process Control
Sven Knoth

Statistical Process Control (SPC) is the misleading title of the area of statistics
which is concerned with the statistical monitoring of sequentially observed data.
Together with the theory of sampling plans, capability analysis and similar
topics it forms the field of Statistical Quality Control. SPC started in the
1930s with the pioneering work of Shewhart (1931). Then, SPC became very
popular with the introduction of new quality policies in the industries of Japan
and of the USA. Nowadays, SPC methods are considered not only in industrial
statistics. In finance, medicine, environmental statistics, and in other fields of
applications practitioners and statisticians use and investigate SPC methods.
An SPC scheme (in industry mostly called a control chart) is a sequential scheme
for detecting the so-called change point in the sequence of observed data. Here,
we consider the simplest case. All observations $X_1, X_2, \ldots$ are independent,
normally distributed with known variance $\sigma^2$. Up to an unknown time point
$m - 1$ the expectation of the $X_i$ is equal to $\mu_0$, starting with the change point
$m$ the expectation is switched to $\mu_1 \neq \mu_0$. While both expectation values
are known, the change point $m$ is unknown. Now, based on the sequentially
observed data the SPC scheme has to detect whether a change occurred.
SPC schemes can be described by a stopping time $L$, known as run length,
which is adapted to the sequence of sigma algebras $\mathcal{F}_n = \mathcal{F}(X_1, X_2, \ldots, X_n)$.
The performance or power of these schemes is usually measured by the Average
Run Length (ARL), the expectation of $L$. The ARL denotes the average
number of observations until the SPC scheme signals. We distinguish false alarms
(the scheme signals before $m$, i.e. before the change actually took place) and
right ones. A suitable scheme provides large ARLs for $m = \infty$ and small ARLs
for $m = 1$. In case of $1 < m < \infty$ one has to consider further performance
measures. In the case of the oldest schemes, the Shewhart charts, the typical
inference characteristics like the error probabilities were used first.

The chapter is organized as follows. In Section 11.1 the charts under
consideration are introduced and their graphical representation is demonstrated. In
Section 11.2 the most popular chart characteristics are described. First,
characteristics such as the ARL and the Average Delay (AD) are defined. These
performance measures are used for the setup of the applied SPC scheme. Then,
the three subsections of Section 11.2 are concerned with the usage of the SPC
routines for determination of the ARL, the AD, and the probability mass
function (PMF) of the run length. In Section 11.3 some results of two papers are
reproduced with the corresponding XploRe quantlets.

11.1 Control Charts
Recall that the data $X_1, X_2, \ldots$ follow the change point model

$$X_t \sim \begin{cases} N(\mu_0, \sigma^2), & t = 1, 2, \ldots, m-1, \\ N(\mu_1 \neq \mu_0, \sigma^2), & t = m, m+1, \ldots \end{cases} \tag{11.1}$$

The observations are independent and the time point $m$ is unknown. The
control chart (the SPC scheme) corresponds to a stopping time $L$. Here we
consider three different schemes: the Shewhart chart, EWMA and CUSUM
schemes. There are one- and two-sided versions. The related stopping times in
the one-sided upper versions are:

1. The Shewhart chart introduced by Shewhart (1931)

   $$L_{\text{Shewhart}} = \inf\left\{ t \in \mathbb{N} : Z_t = \frac{X_t - \mu_0}{\sigma} > c_1 \right\} \tag{11.2}$$

   with the design parameter $c_1$ called critical value.
2. The EWMA scheme (exponentially weighted moving average) initially
   presented by Roberts (1959)

   $$L_{\text{EWMA}} = \inf\left\{ t \in \mathbb{N} : Z_t > c_2 \sqrt{\lambda/(2-\lambda)} \right\}, \tag{11.3}$$

   $$Z_0 = z_0 = 0, \qquad Z_t = (1-\lambda) Z_{t-1} + \lambda\,\frac{X_t - \mu_0}{\sigma}, \quad t = 1, 2, \ldots \tag{11.4}$$

   with the smoothing value $\lambda$ and the critical value $c_2$. The smaller $\lambda$ the
   faster EWMA detects small $\mu_1 - \mu_0 > 0$.
3. The CUSUM scheme (cumulative sum) introduced by Page (1954)

   $$L_{\text{CUSUM}} = \inf\left\{ t \in \mathbb{N} : Z_t > c_3 \right\}, \tag{11.5}$$

   $$Z_0 = z_0 = 0, \qquad Z_t = \max\left\{ 0,\; Z_{t-1} + \frac{X_t - \mu_0}{\sigma} - k \right\}, \quad t = 1, 2, \ldots \tag{11.6}$$

   with the reference value $k$ and the critical value $c_3$ (known as decision
   interval). For fastest detection of $\mu_1 - \mu_0$ CUSUM has to be set up with
   $k = (\mu_1 - \mu_0)/(2\sigma)$.
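The three one-sided upper stopping rules can be sketched as short loops over the normalized observations $(X_t - \mu_0)/\sigma$. The function names and the toy data are ours; each function returns the first time point at which the chart signals, or None if no signal occurs within the data.

```python
def shewhart_run_length(z, c1):
    """First t with z_t > c1, where z holds the normalized observations."""
    for t, zt in enumerate(z, start=1):
        if zt > c1:
            return t
    return None


def ewma_run_length(z, lam, c2, z0=0.0):
    """EWMA recursion with limit c2 * sqrt(lam / (2 - lam))."""
    limit = c2 * (lam / (2.0 - lam)) ** 0.5
    stat = z0
    for t, zt in enumerate(z, start=1):
        stat = (1.0 - lam) * stat + lam * zt
        if stat > limit:
            return t
    return None


def cusum_run_length(z, k, c3, z0=0.0):
    """CUSUM recursion max(0, Z_{t-1} + z_t - k) with decision interval c3."""
    stat = z0
    for t, zt in enumerate(z, start=1):
        stat = max(0.0, stat + zt - k)
        if stat > c3:
            return t
    return None


# Toy normalized data with an upward shift of size 1 at time point m = 3.
z = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(shewhart_run_length(z, c1=0.9))       # 3
print(ewma_run_length(z, lam=0.5, c2=1.0))  # 4
print(cusum_run_length(z, k=0.5, c3=1.2))   # 5
```

Averaging such run lengths over many simulated sequences would give a Monte Carlo estimate of the ARL.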

The above notation uses normalized data. Thus, it is not important whether
$X_t$ is a single observation or a sample statistic such as the empirical mean.
Note that for using one-sided lower schemes one has to apply the upper
schemes to the data multiplied by -1. A slight modification of the one-sided
Shewhart and EWMA charts leads to their two-sided versions. In the
comparison of chart statistic and threshold, one has to replace the original statistic
$Z_t$ of the respective scheme by its absolute value. The two-sided versions of these schemes

