Central Bank of Denmark repo rate increase by 0.6 percentage points (from 4.1% to 4.7% on 9 June 2000), following the ECB decision to increase the refi rate to 4.25% from 3.75%.

After the referendum, the Central Bank of Denmark increased the repo rate to 5.6%,

from 5.1%. This move was followed by a sharp decrease of the 1-year minimum width

of the perfectly credible band, from 1.9% to 1%. Thus, the market seemed to expect that, notwithstanding the referendum result, Danish monetary policy would remain anchored to the ECB policy, within the ERM II target zone.

The violation of the full credibility condition contrasts with the evolution of the

EUR/DKK exchange rate, whose spot rate kept moving in a quite narrow band, while the

forward rate was always far from the band ceiling, thereby passing the Svensson (1991) simple credibility test (Figure 12.23).57

The files used for the estimations illustrated in Figures 12.20–12.22 are “credibility tests PTE 3m.xls”, “credibility tests DKK 1m.xls” and “credibility tests DKK 1y.xls” and are also included on the CD-Rom. All these files have a similar structure: the first three columns contain the data on the interest rates and the spot exchange rates; the forward exchange rates are computed in the fourth column, while the bounds of the ERM-EMS are in the two following columns; the implied (offer) volatilities and the respective call-option prices are presented next; in the following column the minimum intensity of realignment is computed, as in equation (12.40); the credibility tests in equations (12.34) and (12.36) are performed next; finally, the minimum width of the fully credible band in equation (12.37) is calculated.
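The forward-rate and simple-test columns described above can be sketched in code: the forward rate follows from covered interest parity (one common convention, with simple annual compounding), and Svensson's (1991) test merely checks that it stays inside the band. The function names and the illustrative EUR/DKK figures below are hypothetical, not taken from the chapter's data files.

```python
# Hypothetical sketch of the spreadsheet's forward-rate column and of
# Svensson's (1991) simple credibility test.
def forward_rate(spot, i_domestic, i_foreign, maturity_years=1.0):
    """Covered-interest-parity forward: F = S * (1 + i_d)^T / (1 + i_f)^T."""
    return spot * (1.0 + i_domestic) ** maturity_years / (1.0 + i_foreign) ** maturity_years

def simple_credibility_test(forward, band_floor, band_ceiling):
    """The band passes the simple test as long as the forward stays inside it."""
    return band_floor <= forward <= band_ceiling

# Illustrative EUR/DKK inputs (not the chapter's actual data); the band limits
# are the ERM II +/-2.25% band around the DKK central parity of 7.46038.
spot, i_dkk, i_eur = 7.45, 0.056, 0.0425
fwd = forward_rate(spot, i_dkk, i_eur)
print(round(fwd, 4), simple_credibility_test(fwd, 7.29252, 7.62824))  # 7.5465 True
```

A forward above the ceiling (or below the floor) would instead signal that the market prices in a realignment, which is the rejection case of the simple test.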

[Chart: monthly observations from January 1999 to September 2000, EUR/DKK scale 7.20 to 7.65, showing the spot rate, the 1-month and 1-year forward rates, and the band ceiling and floor.]

Figure 12.23 EUR/DKK Spot and forward rates

57 This test corresponds to assessing whether the forward rate exceeds the target zone limits.

378 Applied Quantitative Methods for Trading and Investment

12.7 CONCLUSIONS

In the OTC currency option market, financial institutions regularly quote at-the-money forward volatilities, as well as risk-reversals and strangles, which provide information respectively on the skewness and kurtosis of the distribution of the underlying asset. For currency options, the informational content of these prices is arguably higher than that of exchange-traded option prices, since the over-the-counter market involves a significantly higher transaction volume. Besides, using only this information, it is possible to estimate the whole RND function, as well as to test the credibility of a target zone.

Currency option prices also provide relevant information on implied correlations between exchange rates, which can be necessary for portfolio management or risk assessment. Compared to the usual conditional correlations, the former offer the advantage of being forward looking, allowing one to obtain correlation measures from market data in periods in which regime shifts are expected. In the context of EMU, implied correlations allow market expectations about the degree of convergence between potential participating currencies to be assessed.

Three empirical applications of the literature were presented: firstly on the EUR/USD, followed by the analysis of implied correlations and the credibility of the Portuguese exchange rate policy during the transition to the EMU, and of the Danish exchange rate policy around the euro referendum in September 2000.

As in many other cases found in the literature, the euro depreciation vis-à-vis the US dollar has occurred against the UIP predictions. Simultaneously, market expectations about the future behaviour of the exchange rate also changed markedly, which is reflected in OTC quotes for EUR/USD options. The results obtained confirm the general assertion that expectations on the euro's evolution deteriorated from January 1999 to October 2000, though some improvement occurred after the joint exchange rate intervention in September 2000.

Regarding the credibility of the Portuguese exchange rate policy, the results are consistent with the idea that the probability of the escudo's participation in the euro area from the beginning increased persistently over the course of 1997. Additionally, the market attributed a high degree of credibility to the bilateral central parity announced at the 1–3 May 1998 European summit.

On the eve of the summit, market participants were expecting a perfect correlation between the movements of the Deutsche Mark and the Portuguese escudo vis-à-vis the US dollar. Furthermore, the central moments of the PTE/DEM exchange rate RND function converged to the central parity and the uncertainty regarding the future values of PTE/DEM decreased strongly.

Though the Danish crown and the euro remained almost perfectly correlated during the first three quarters of 2000, the credibility of the ERM target zone was challenged by option market participants before the referendum, mainly in the beginning of July. This change in market expectations followed the widening of the gap between the key interest rates of the Danish crown and the euro. After the referendum, though euro membership was rejected, the credibility of the Danish monetary policy was enhanced, as reflected in the decrease of the minimum width of the fully credible band.

Portfolio Management and Information 379

REFERENCES

Abken, P. A. (1995), “Using Eurodollar Futures and Options: Gauging the Market's View of Interest Rate Movements”, Federal Reserve Bank of Atlanta Economic Review, Mar/Apr, 10–30.
Bahra, B. (1996), “Implied Risk-Neutral Probability Density Functions from Option Prices: Theory and Application”, Bank of England Economic Bulletin, Aug, 299–309.
Ball, C. and W. N. Torous (1983), “A Simplified Jump Process for Common Stock Returns”, Journal of Financial and Quantitative Analysis, 18, March, 53–65.
Ball, C. and W. N. Torous (1985), “On Jumps in Common Stock Prices and Their Impact on Call Option Pricing”, Journal of Finance, XL(1), March, 155–173.
Bates, D. S. (1991), “The Crash of '87: Was It Expected? The Evidence from Options Markets”, Journal of Finance, XLVI(3), July, 1009–1044.
BIS (2001), “The Global OTC Derivatives Market at end-June 2001: Second Part of the Triennial Central Bank Survey of Foreign Exchange and Derivatives Market Activity”.
Black, F. and M. Scholes (1973), “The Pricing of Options and Corporate Liabilities”, Journal of Political Economy, 81 (May/Jun), 637–654.
Bliss, R. R. and N. Panigirtzoglou (2000), “Testing the Stability of Implied Probability Density Functions”, Bank of England Working Paper, No. 114.
Breeden, D. T. and R. H. Litzenberger (1978), “Prices of State-Contingent Claims Implicit in Option Prices”, Journal of Business, 51(4), October, 621–651.
Butler, C. and H. Davies (1998), “Assessing Market Views on Monetary Policy: The Use of Implied Risk-Neutral Probability Distributions”, paper presented to the BIS/CEPR Conference “Asset Prices and Monetary Policy”, January.
Campa, J. M. and P. H. K. Chang (1995), “Testing the Expectations Hypothesis on the Term Structure of Volatilities in Foreign Exchange Options”, Journal of Finance, 50, 529–547.
Campa, J. M. and P. H. K. Chang (1996), “Arbitrage-based Tests of Target-Zone Credibility: Evidence from ERM Cross-Rate Options”, American Economic Review, 86(4), September.
Campa, J. M., P. H. K. Chang and R. L. Reider (1997), “ERM Bandwidths for EMU and After: Evidence from Foreign Exchange Options”, Economic Policy, 24, April, 55–89.
Clews, R., N. Panigirtzoglou and J. Proudman (2000), “Recent Developments in Extracting Information from Option Markets”, Bank of England Quarterly Bulletin, Feb, 50–60.
Deutsche Bundesbank (1995), “The Information Content of Derivatives for Monetary Policy”, Deutsche Bundesbank Monthly Report, Nov.
Dunis, C. and P. Lequeux (2001), “The Information Content of Risk-Reversals”, Derivatives Uses, Trading and Regulation, 2, 98–117.
Garman, M. M. and S. W. Kohlhagen (1983), “Foreign Currency Option Values”, Journal of International Money and Finance, 2, December, 231–237.
Loretan, M. and W. B. English (2000), “Evaluating Correlation Breakdowns during Periods of Market Volatility”, BIS Conference Papers, Vol. 8, Bank for International Settlements, March.
Luís, J. B. (2001), “Essays on Extracting Information from Financial Asset Prices”, PhD thesis, University of York.
Malz, A. M. (1996), “Using Option Prices to Estimate Realignment Probabilities in the European Monetary System: the Case of Sterling–Mark”, Journal of International Money and Finance, 15(5), 717–748.
Malz, A. M. (1997), “Estimating the Probability Distribution of the Future Exchange Rate from Option Prices”, Journal of Derivatives, Winter, 18–36.
McCauley, R. and W. Melick (1996), “Risk Reversal Risk”, Risk, 9(11), November, 54–57.
Melick, W. R. and C. P. Thomas (1997), “Recovering an Asset's Implied PDF from Option Prices: An Application to Crude Oil during the Gulf Crisis”, Journal of Financial and Quantitative Analysis, 32(1), March, 91–115.
Melick, W. R. and C. P. Thomas (1998), “Confidence Intervals and Constant-Maturity Series for Probability Measures Extracted from Options Prices”, paper presented to the Bank of Canada Conference “Information in Financial Asset Prices”, May.


Merton, R. C. (1976), “Option Pricing when Underlying Stock Returns are Discontinuous”, Journal of Financial Economics, Jan/Mar(3), 125–144.
Nelson, C. R. and A. F. Siegel (1987), “Parsimonious Modelling of Yield Curves”, Journal of Business, 60(4), 473–489.
Ritchey, R. R. (1990), “Call Option Valuation for Discrete Normal Mixtures”, The Journal of Financial Research, XIII(4), Winter.
Söderlind, P. and L. E. O. Svensson (1997), “New Techniques to Extract Market Expectations from Financial Instruments”, Journal of Monetary Economics, 40, 383–429.
Svensson, L. E. O. (1991), “The Simplest Test of Target Zone Credibility”, International Monetary Fund Staff Papers, 38, 655–665.
Svensson, L. E. O. (1994), “Estimating and Interpreting Forward Interest Rates: Sweden 1992–4”, CEPR Discussion Paper Series No. 1051.
Xu, X. and S. J. Taylor (1994), “The Term Structure of Volatility Implied by Foreign Exchange Options”, Journal of Financial and Quantitative Analysis, 29(1), March.

13

Filling Analysis for Missing Data: An Application to Weather Risk Management*

CHRISTIAN L. DUNIS AND VASSILIOS KARALIS

ABSTRACT

This chapter examines and analyses the use of alternative methods when confronted with

missing data, a common problem when not enough historical data or clean historical data

exist: this will typically be the case when trying to develop a decision tool either for a

new asset in a given asset class (say a recently issued stock in a given company sector)

or for a new asset class as such (for instance weather derivatives).

Weather derivatives are used to illustrate this analysis because most weather derivatives pricing methodologies rely heavily on “clean” weather temperature data. The methodology adopted is to discard certain observed values and treat them as missing data. We then examine and analyse the imputation accuracy of different interpolation techniques and filling methods for missing historical records of temperature data: the Expectation Maximisation (EM) algorithm, the Data Augmentation (DA) algorithm, the Kalman filter (KF), Neural Networks Regression (NNR) models and, finally, Principal Component Analysis (PCA).

These methodologies are benchmarked against simpler techniques like the fallback methodology and a naïve approach whereby missing temperature observations are imputed with the same day value of the previous year. Their performance is evaluated against the actual values of the “missing data” using standard measures of forecasting accuracy widely used in the economic literature.

Overall, it is found that, for the periods and the data series concerned, the results of

PCA outperformed the other methodologies in all cases of missing observations analysed.

13.1 INTRODUCTION

This entire book has been devoted to answering this apparently simple question: with the help of different methods, what can we learn about the future from the financial data available from the past? In other words, we have taken data availability for granted.

* The opinions expressed herein are those of the authors, and not necessarily those of Girobank.

Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Naïm
© 2003 John Wiley & Sons, Ltd ISBN: 0-470-84885-5


But the absence of data, or of clean data, is a rather common problem facing users of quantitative methods in financial markets.

Admittedly, in recent years, the increasing availability of powerful computers has made it ever easier to collect, store and process higher frequency databanks, at a time when the use of advanced technologies for trading, risk and asset management has grown exponentially among financial institutions. Nevertheless, in a multivariate context, taking into account the fact that most financial markets do not move independently from each other, the nonsynchronicity of such high frequency data arrival clearly poses major practical problems. Similarly, even at daily frequencies, a risk manager may face a dilemma over which data to retain for the marking-to-market and Value-at-Risk (VaR) computation of a portfolio containing assets from different markets when some of these are not trading, either because of the time difference or a public holiday.

Such problems will be exacerbated when not enough historical data or clean historical

data exist: this will typically be the case when trying to develop a decision tool either for

a new asset in a given asset class (say a recently issued stock in a given company sector)

or for a new asset class as such (for instance weather derivatives).

The motivation for this chapter is to consider and examine the problem of missing data. Weather data are used to illustrate this analysis because most weather derivatives pricing methodologies rely heavily on “clean” weather temperature data. The methodology adopted is to discard certain observed values and treat them as “missing data”. We then examine and analyse the imputation accuracy of different interpolation techniques and filling methods for missing historical records of temperature data. These are the Expectation Maximisation (EM) algorithm, which is a common technique for determining maximum-likelihood estimates; the Data Augmentation (DA) algorithm, an iterative simulation very similar to the EM algorithm which may be regarded as a stochastic edition of the EM; the Kalman filter (KF), a smoothing algorithm which can be used to estimate missing observations once a model has been represented in state space form; Principal Component Analysis (PCA), an approach adapted to fill in appropriate values for missing observations from data that are available from correlated variables; and, finally, Neural Networks Regression (NNR) models, which have been used here to identify and possibly benefit from nonlinear correlations between temperature series.

These methodologies are benchmarked against simpler and widely used techniques like the fallback methodology and a naïve approach whereby missing temperature observations are imputed with the same day value of the previous year. This benchmarking is carried out to gauge the potential added value of the different methods retained in the imputation process. The performance of these techniques as predictors of the “pseudo missing” values is evaluated using standard measures of forecasting accuracy widely used in the economic literature, such as the mean absolute error, the root mean squared error, etc.

Overall, we conclude that, for the periods and the data series concerned, the results of

PCA outperformed the other methodologies in all cases of missing observations analysed.

The rest of the chapter is organised as follows. Section 13.2 gives a brief overview of the importance of clean weather data for weather derivatives pricing and discusses the weather data used in this research. Section 13.3 documents the different filling methods retained, while Section 13.4 presents our empirical results. Finally, Section 13.5 concludes this chapter.

Filling Analysis for Missing Data 383

13.2 WEATHER DATA AND WEATHER DERIVATIVES

13.2.1 The importance of weather data for weather derivatives pricing

The history of the weather derivatives market dates back to 1996, when electricity deregulation in the USA caused the power market to begin changing from a series of local monopolies to competitive regional wholesale markets. Energy companies, realising the impact of weather on their operations, took control of their weather risk and created a new market around it. While a number of pioneering trades were made as early as 1996, the market did not really take off until after September 1999, when the Chicago Mercantile Exchange (CME) embarked on listing and trading standard futures and options contracts on US temperature indexes. Since 2000, the weather risk market has grown significantly and become progressively more diversified across industries, even if this market originally evolved from the energy sector. According to the Second Annual Weather Risk Industry survey commissioned by the Weather Risk Management Association (WRMA), the total market size for weather derivatives increased to an impressive $11.5 billion at the end of 2001 (see PricewaterhouseCoopers (2002)).

Most weather derivatives contracts traded were temperature related (over 82%). These are “heating degree days” (HDDs) and “cooling degree days” (CDDs) contracts. A degree day is the deviation of the average daily temperature (ADT) from a predefined base temperature, denoted K in the following equations. HDDs and CDDs are the most common degree day measurements. The base temperature K represents the temperature below which heating devices are expected to turn on (HDDs), and above which air conditioners are expected to turn on (CDDs). Since HDDs and CDDs measure heating and cooling needs relative to the base temperature, they are calculated according to the following equations:

$$\text{Daily HDDs} = \max(0, K - \text{ADT}) = \max\left(0,\; K - \frac{T_{\max} + T_{\min}}{2}\right)$$

$$\text{Daily CDDs} = \max(0, \text{ADT} - K) = \max\left(0,\; \frac{T_{\max} + T_{\min}}{2} - K\right)$$

where Tmax and Tmin are, respectively, the maximum and minimum recorded temperatures during the day.1 The calculation of degree days for longer periods (week, season, year) is based on straightforward summation:

$$\text{HDDs} = \sum_{i=1}^{n} \max\left(0,\; K - \frac{T_{\max,i} + T_{\min,i}}{2}\right)$$

$$\text{CDDs} = \sum_{i=1}^{n} \max\left(0,\; \frac{T_{\max,i} + T_{\min,i}}{2} - K\right)$$
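The degree-day formulas above translate directly into code. The sketch below is a minimal illustration: the 65°F base and the sample temperatures are made up, not taken from the chapter's databank.

```python
# Minimal sketch of the daily and cumulative degree-day formulas.
def daily_hdd(t_max, t_min, base=65.0):
    """Daily HDDs = max(0, K - (Tmax + Tmin) / 2)."""
    return max(0.0, base - (t_max + t_min) / 2.0)

def daily_cdd(t_max, t_min, base=65.0):
    """Daily CDDs = max(0, (Tmax + Tmin) / 2 - K)."""
    return max(0.0, (t_max + t_min) / 2.0 - base)

def cumulative_hdd(days, base=65.0):
    """Seasonal HDDs: straightforward summation over (Tmax, Tmin) pairs."""
    return sum(daily_hdd(tmax, tmin, base) for tmax, tmin in days)

# Three illustrative days with ADTs of 40, 42 and 65 degrees F:
november_sample = [(50.0, 30.0), (48.0, 36.0), (70.0, 60.0)]
print(cumulative_hdd(november_sample))  # 25 + 23 + 0 = 48.0
```

Note that a day whose ADT equals the base contributes zero to both indexes, which is why only one of the two measures is typically relevant in a given season.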

In order to demonstrate how a degree day index may be structured to moderate risk, suppose that an energy company through the past 10 years' supply and temperature

1 In the USA, the standard baseline temperature (K) is 65°F (18.3°C), but the 65°F base temperature can fluctuate from one region to another.


analysis has determined the expected cumulative number of HDDs during the heating season. The company expects to sell a given amount of energy units according to this number and sets its budget at this level. Its analysis also suggests that, for each HDD below the 10-year average, demand will decrease by a set number of energy units, creating a marginal loss of $55 000 versus budget. In such a case the company may purchase an HDD index derivative to achieve protection at a price of $55 000 for each HDD lower than a given strike (see Corbally and Dang (2001)).
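The payout of such a protection structure can be sketched as a put-style function of the realised seasonal index. The figures and function name below are illustrative assumptions, not contract terms from the chapter.

```python
# Hypothetical sketch of the HDD protection payout: $55,000 ("tick") per HDD
# that the realised seasonal index falls below the strike, optionally capped.
def hdd_protection_payout(realised_hdds, strike_hdds, tick=55_000.0, cap=None):
    payout = max(0.0, strike_hdds - realised_hdds) * tick
    return min(payout, cap) if cap is not None else payout

# A mild winter 40 HDDs short of an assumed 4500-HDD strike pays 40 * $55,000:
print(hdd_protection_payout(realised_hdds=4460, strike_hdds=4500))  # 2200000.0
```

In a cold winter the index exceeds the strike, the payout is zero and the company simply keeps its higher sales revenue, which is the sense in which the derivative moderates rather than eliminates risk.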

Traditionally, financial derivatives such as options on equities, bonds, forex or commodities are priced using no-arbitrage models such as the Black–Scholes pricing model. In the case of weather derivatives, the underlying asset is a physical quantity (temperature, rain, wind or snow) rather than a traded asset. Given that the underlying weather indexes are not traded, a no-arbitrage model cannot be directly applied to price weather derivatives.

Another approach is used in the insurance industry, known as Historical Burn Analysis (HBA). The central assumption of the method is that the historical record of weather contract payouts gives a precise illustration of the distribution of potential payouts. As noted by Henderson (2001), if weather risk is measured by the standard deviation of payouts, then the price of the contract will be given by the equation:

$$\text{Price}_{\text{bid/offer}}(t) = D(t, T) \times (\mu \pm \alpha\sigma)$$

where D(t, T) is the discount factor from contract maturity T to the pricing time t, µ is the historical average payout, σ is the historical standard deviation of payouts and α is a positive number denoting the protection seller's risk tolerance.
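The HBA equation above can be sketched in a few lines. The payout history, the risk-tolerance value and the discount factor below are all illustrative assumptions.

```python
# Sketch of Historical Burn Analysis pricing: mu and sigma come from the
# historical payout record, alpha is the seller's risk tolerance, and
# "discount" stands in for D(t, T).
import statistics

def hba_price(payouts, alpha, discount, side="offer"):
    mu = statistics.mean(payouts)
    sigma = statistics.pstdev(payouts)       # historical std. dev. of payouts
    sign = 1.0 if side == "offer" else -1.0  # offer = mu + alpha*sigma, bid = mu - alpha*sigma
    return discount * (mu + sign * alpha * sigma)

# Five illustrative seasonal payouts of a weather contract:
historical_payouts = [0.0, 120_000.0, 55_000.0, 0.0, 275_000.0]
print(round(hba_price(historical_payouts, alpha=0.5, discount=0.97), 2))
```

With α = 0 the bid and offer collapse to the discounted mean payout; a larger α widens the spread around it, reflecting a more risk-averse protection seller.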

HBA therefore crucially depends on historical temperature data. Collecting historical data may be somewhat difficult and costly but, even when the data is available, there are several types of errors like missing data or unreasonable readings. Consequently, the data must be “cleaned” – that is, the errors and omissions must be fixed – in order to be used for pricing and risk management purposes.

13.2.2 The weather databank

For this research, we use daily temperature data (cleaned data) for the Philadelphia International (WMO: 724080) weather station (the index station) and for its “fallback station” according to the WRMA, Allentown-Bethlehem (WMO: 725170).2 This data was obtained from QuantWeather (www.quantweather.com). Furthermore, to perform PCA and NNR, we use temperature data for all the neighbouring stations for which cleaned data are available. In order to create a series of consecutive daily temperature observations for Tmax (maximum daily temperature), Tmin (minimum daily temperature) and the corresponding Tavg ((Tmax + Tmin)/2) for all the available stations, the databank spans from 1 July 1960 to 31 December 1993 (i.e. 33 years and 12 237 observations). Moreover, to examine the impact of seasonality, this databank was divided into two separate subsets. The first one includes only the “Autumn” (September, October and November) temperature observations and the second one only November temperature observations, totalling

2 Weather stations are identified in several ways. The most common is a six-digit WMO ID assigned by the World Meteorological Organisation. WMO IDs are the primary means of identifying weather stations outside the USA, while in the USA the five-digit Weather Bureau Army Navy identities (WBAN IDs) are widely used.


3094 and 1020 observations respectively. Standard tests like the serial correlation LM test, ADF, Phillips–Perron, ARCH-LM and Jarque–Bera tests (not reported here in order to conserve space) showed that all the series are autocorrelated, stationary, heteroskedastic and not normally distributed.

In order to assess the imputation accuracy of the methodologies, several “holes” were created in the datasets. Arbitrarily, it is assumed that all the missing observations occurred in November 1993. We also supposed that the missing data were missing at random. The fallback methodology employs different procedures to handle missing data according to the number of consecutive missing days, with 12 days or more an important breakpoint.3 Therefore, five different gaps were created, including 1, 7, 10 (<12), 20 and 30 (>12) consecutive missing days. Since HDDs and CDDs are calculated as the deviation of the average daily temperature from the standard baseline temperature and, again, in order to conserve space, only the imputed values of the average temperature dataset (Tavg) are presented and compared later in this chapter.

13.3 ALTERNATIVE FILLING METHODS FOR MISSING DATA

In order to ensure that more complex methodologies do indeed add value in the imputation

process, it is essential to benchmark them against simpler and more widely used methods

like the naïve approach and the fallback method. We first present these benchmarks before turning to more sophisticated methods.

13.3.1 The naïve approach

The naïve hypothesis retained is to impute the missing temperature observation with the

same day value of the previous year. This simple model is given by:

$$Y_t = Y^*$$

where Yt is the missing temperature data at period t and Y* the corresponding imputed value, i.e. the temperature that prevailed on the same day the previous year.
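The naïve rule is trivial to implement for a daily series keyed by date. The sketch below assumes the series is stored as a date-to-temperature mapping; the dates and value are illustrative.

```python
# The naive benchmark: impute Y_t with the same-calendar-day value one year
# earlier. (A real implementation would also handle 29 February, for which
# date.replace(year=...) raises ValueError.)
import datetime

def naive_impute(series, missing_day):
    """series: dict mapping datetime.date -> temperature; returns Y* from last year."""
    return series[missing_day.replace(year=missing_day.year - 1)]

history = {datetime.date(1992, 11, 15): 41.0, datetime.date(1993, 11, 15): None}
print(naive_impute(history, datetime.date(1993, 11, 15)))  # 41.0
```

Despite its crudeness, this benchmark is informative: any sophisticated method that cannot beat it adds no value in the imputation process.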

13.3.2 The fallback methodology

The second part of the WRMA confirmation4 establishes the weather industry's standard for determining and adjusting weather data. It presents an estimation of temperature based on a weather index station, which is usually an agreed region designated in the confirmation by an airport location, WBAN or WMO ID. In the event that weather data is not available for the index station, the WRMA confirmation sets forth the interpolation procedures to be used to determine the missing data under the provision entitled “fallback methodology” (see Raspé (2001), p. 233). The fallback method provides an adjustment based on temperature at a selected alternative weather index station referred to as the “fallback station”.

3 See Section 13.3.2 below.
4 Standard forms of documents and other confirming evidence exchanged between the parties in a weather contract.


13.3.2.1 Fallback terminology

It is important at this point to define some basic terminology of the fallback methodology:

• Adjustment indicates the mean of the arithmetic difference of the daily maximum and/or minimum temperature (if the unavailable temperature is a daily maximum and/or minimum respectively) at the fallback station subtracted from the corresponding temperature at the index station during the period in which temperature data is unavailable.
• Missing data day denotes any day during the calculation period for which data for the index station is unavailable.
• Missing data calculation day means the same day and same month as the relevant missing data day.
• Leap year exception stipulates that, in the event that the missing data day is 29 February, the missing data calculation day shall be considered as 1 March.

13.3.2.2 Data missing for less than 12 consecutive days

If temperature data is unavailable for less than 12 consecutive days during the calculation period, the “adjustment” will be computed for each “missing data day” using a period of the 15 days immediately prior to, and the 15 days immediately following, the relevant “missing data day”. If there are unavailable temperature data at the fallback station and/or the index station within the 15-day periods prior to and following a “missing data day”, then an adjustment period of the first 15 days on which the relevant temperature values are available (out of a maximum of 25 days) on either side of each “missing data day” shall be used to calculate the “adjustment”. To illustrate this process, consider the case of one missing observation, say Y0. Let Z be the temperature at the fallback station and Y the temperature being measured. Then the gap between Y and Z is measured over the periods t−1 to t−15 and t1 to t15, and the average gap over this period is designated as “avg”. Thus the interpolated value is Y0 = Z0 + avg. Similar methods would apply to the computation of missing data for longer periods.
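The single-missing-day case just described can be sketched as follows. The sketch assumes the two stations' series are already aligned on the same daily index, and ignores the search for alternative adjustment days when neighbours are also missing; names and data are illustrative.

```python
# Sketch of the fallback adjustment for one missing day (< 12 consecutive
# days): the mean index-minus-fallback gap over the 15 days on each side of
# the hole is added to the fallback station's reading for that day.
def fallback_impute(index_temps, fallback_temps, missing_idx, window=15):
    neighbours = (list(range(missing_idx - window, missing_idx))
                  + list(range(missing_idx + 1, missing_idx + window + 1)))
    gaps = [index_temps[i] - fallback_temps[i] for i in neighbours
            if index_temps[i] is not None and fallback_temps[i] is not None]
    avg_gap = sum(gaps) / len(gaps)               # "avg" in the text
    return fallback_temps[missing_idx] + avg_gap  # Y0 = Z0 + avg

index_station = [40.0] * 15 + [None] + [40.0] * 15  # Y, one missing reading
fallback_station = [38.0] * 31                      # Z, persistently 2 degrees cooler
print(fallback_impute(index_station, fallback_station, missing_idx=15))  # 40.0
```

The constant 2-degree gap in this toy example is recovered exactly; with real data the adjustment is only an average, so the imputed value carries estimation error.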

13.3.2.3 Data missing for 12 or more consecutive days

If there are missing temperature observations for 12 or more consecutive days, the “adjustment” for each “missing data day” will be computed using temperature values from a period of the 15 days prior to (including the relevant “missing data calculation day”) and the 15 days after the applicable “missing data calculation day” for the three prior years. If there is no data for the index station and/or the fallback station in the immediately preceding three years, the adjustment period will be extended until three years of temperature values can be obtained. In the event that there are unavailable temperatures at the fallback station and/or the index station within the periods of the 15 days prior to a “missing data calculation day” or the 15 following days, then the adjustment period shall be the first 15 days on which the relevant temperature data are available (out of a maximum of 25 days) on either side of each “missing data calculation day”.

13.3.3 The expectation maximisation algorithm

The EM algorithm is a common technique for determining maximum-likelihood estimates

for parametric models when data are not entirely available. Developed by Dempster et al.


(1977), the EM algorithm generated an innovation in the analysis of incomplete data, making it possible to calculate efficient parameter estimates in a wide variety of statistical problems, under the assumption that the data are normally distributed.

13.3.3.1 Maximum-likelihood estimation

The principle of maximum likelihood provides a means of choosing an asymptotically efficient estimator for a parameter or a set of parameters. Consider a random sample of n observations from a normal distribution with mean µ and variance σ². The probability density function for each observation is (see e.g. Greene (2000), p. 125):

$$f(x_i) = (2\pi)^{-\frac{1}{2}} \, (\sigma^2)^{-\frac{1}{2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

Since the observations are independent, their joint density is:

$$f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} f(x_i) = (2\pi)^{-\frac{n}{2}} \, (\sigma^2)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right)$$

This equation gives the probability of observing this specific sample. We are now interested in the values of µ and σ² that make this sample most probable, in other words, the maximum-likelihood estimates (MLE) θMLE of µ and σ². Since the log function is monotonically increasing and easier to work with, we usually calculate θMLE by maximising the natural logarithm of the likelihood function:

$$\ln L(\mu, \sigma^2 \mid x_1, x_2, \ldots, x_n) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$

This translates into finding solutions to the following first-order conditions:

$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 \qquad\qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0$$

Finally, the MLE estimates are:

$$\hat\mu_{\text{ML}} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n \qquad \text{and} \qquad \hat\sigma^2_{\text{ML}} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$$
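These closed-form estimates can be checked numerically; the sketch below uses an arbitrary sample, and the function name is ours, not the chapter's.

```python
# The normal MLE estimates in closed form: the sample mean and the
# biased (1/n) sample variance.
def normal_mle(xs):
    n = len(xs)
    mu = sum(xs) / n                             # mean
    sigma2 = sum((x - mu) ** 2 for x in xs) / n  # 1/n variance, as in the MLE
    return mu, sigma2

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(normal_mle(xs))  # (5.0, 4.0)
```

Note the 1/n divisor: the MLE variance is biased downward in small samples, unlike the usual 1/(n − 1) sample variance.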

13.3.3.2 The E-step and M-step of the algorithm

When some observations of the sample are missing, the MLE estimates θMLE are not obtainable, because the likelihood function is not defined at the missing data points. In order to overcome this problem we apply the EM algorithm, which exploits the interdependence between the missing data Ymis and the parameters θMLE. The Ymis enclose information relevant to estimating θMLE, and θMLE in turn helps us to compute probable values of Ymis. This relation suggests the following scheme for estimating θMLE in the presence of


Yobs alone: “fill in” the missing data Ymis based on an initial estimate of θMLE, re-estimate θMLE based on Yobs and the filled-in Ymis, and iterate until the estimates converge (see Schafer (1997)). In any missing data case, the distribution of the complete dataset Y can be factored as:

$$P(Y \mid \theta) = P(Y_{\text{obs}} \mid \theta) \times P(Y_{\text{mis}} \mid Y_{\text{obs}}, \theta) \qquad (13.1)$$

Considering each term in the above equation as a function of θ , we have:

l(θ |Y ) = l(θ |Yobs ) + log P (Ymis |Yobs , θ ) + c (13.2)

where l(θ|Y) = log P(Y|θ) indicates the complete data log-likelihood, l(θ|Yobs) = log L(θ|Yobs) the observed data log-likelihood, and c an arbitrary constant. Averaging

equation (13.2) over the distribution P (Ymis |Yobs , θ (t) ), where θ (t) is an initial estimate

of the unknown parameter, we get:

Q(θ |θ (t) ) = l(θ |Yobs ) + H (θ |θ (t) ) + c (13.3)

where

Q(θ|θ(t)) = ∫ l(θ|Y) × P(Ymis|Yobs, θ(t)) dYmis

and

H(θ|θ(t)) = ∫ log P(Ymis|Yobs, θ) × P(Ymis|Yobs, θ(t)) dYmis

A basic conclusion of Dempster et al. (1977) is that if we consider θ (t+1) as the value of

θ that maximises Q(θ |θ (t) ), then θ (t+1) is an improved estimation compared to θ (t) :

l(θ (t+1) |Yobs ) ≥ l(θ (t) |Yobs )

In summary, it is convenient to think of the EM as an iterative algorithm that operates in

two steps as follows:

• E-step: The expectation step, in which the function Q(θ |θ (t) ) is calculated by averaging

the complete data log-likelihood l(θ |Y ) over P (Ymis |Yobs , θ (t) ).

• M-step: The maximisation step, in which θ (t+1) is calculated by maximising Q(θ |θ (t) ).5
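For a univariate normal sample whose values are missing completely at random, both steps have closed forms: the E-step replaces each missing x by E[x] = µ and each missing x² by E[x²] = µ² + σ² under the current parameter value, and the M-step re-applies the MLE formulas of Section 13.3.3.1 to the completed sufficient statistics. A sketch on simulated data (not the chapter's SPSS implementation; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(10.0, 3.0, size=500)
data[rng.random(500) < 0.2] = np.nan      # ~20% of the sample goes missing

obs = data[~np.isnan(data)]
n, n_mis = data.size, int(np.isnan(data).sum())

mu, sigma2 = 0.0, 1.0                     # arbitrary initial estimate theta(0)
for _ in range(200):
    # E-step: expected complete-data sufficient statistics
    s1 = obs.sum() + n_mis * mu
    s2 = (obs ** 2).sum() + n_mis * (mu ** 2 + sigma2)
    # M-step: maximise the expected complete-data log-likelihood
    mu_new, sigma2_new = s1 / n, s2 / n - (s1 / n) ** 2
    if abs(mu_new - mu) < 1e-12 and abs(sigma2_new - sigma2) < 1e-12:
        break
    mu, sigma2 = mu_new, sigma2_new
```

In this univariate case the iterations simply converge to the observed-data MLE (the mean and variance of the observed values); the algorithm earns its keep in multivariate settings, where the missing values borrow strength from correlated observed variables.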

13.3.4 The data augmentation algorithm

Data augmentation is an iterative simulation algorithm, a special kind of Markov chain Monte Carlo (MCMC). As underlined by Schafer (1997), DA is very similar to the EM algorithm, and may be regarded as a stochastic version of EM. In many cases with missing values, the observed data distribution P(θ|Yobs) is difficult to define. However, if Yobs is “augmented” by a preliminary value of Ymis, the complete data distribution P(θ|Yobs, Ymis) can be handled.

5 The EM algorithm was implemented with the SPSS software.

Filling Analysis for Missing Data 389

For an initial guess θ(t) of the parameter, we select a value of the missing data from the conditional distribution of Ymis:

Ymis^(t+1) ∼ P(Ymis | Yobs, θ(t))     (13.4)

Conditioning on Ymis^(t+1), we select a new value of θ from its complete data distribution:

θ(t+1) ∼ P(θ | Yobs, Ymis^(t+1))     (13.5)

Alternating (13.4) and (13.5) from an initial value θ(0), we obtain a stochastic sequence {(θ(t), Ymis^(t)): t = 1, 2, . . .} that converges to a stationary distribution P(θ, Ymis | Yobs), the joint distribution of the missing data and parameters given the observed data; the subsequences {θ(t): t = 1, 2, . . .} and {Ymis^(t): t = 1, 2, . . .} have P(θ | Yobs) and P(Ymis | Yobs) as their stationary distributions respectively. For a large value of t we can consider θ(t) as an approximate draw from P(θ | Yobs); alternatively, we can regard Ymis^(t) as an approximate draw from P(Ymis | Yobs).

In summary, DA is an iterative algorithm that operates in two steps as follows:

• I-step: The imputation step, in which the missing data are imputed by drawing them

from their conditional distribution given the observed data and assumed initial values

for the parameters θ (t) .

• P-step: The posterior step, in which the values for the parameters are simulated by drawing them from their complete data distribution given the most recently imputed values Ymis^(t+1) for the missing data.
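For the same simple univariate normal setting as above, one cycle of the two steps can be written down directly (a sketch with simulated data and a flat prior, under which the P-step draws σ² from a scaled inverse chi-square distribution and µ from a conditional normal; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(10.0, 3.0, size=500)
mask = rng.random(500) < 0.2              # True where the value is missing
obs = data[~mask]
n, n_mis = data.size, int(mask.sum())

mu, sigma2 = obs.mean(), obs.var()        # initial parameter value theta(0)
mu_draws = []
for t in range(2000):
    # I-step: impute the missing data from P(Ymis | Yobs, theta(t))
    y_mis = rng.normal(mu, np.sqrt(sigma2), size=n_mis)
    full = np.concatenate([obs, y_mis])
    # P-step: draw theta from its complete-data posterior
    sigma2 = ((full - full.mean()) ** 2).sum() / rng.chisquare(n - 1)
    mu = rng.normal(full.mean(), np.sqrt(sigma2 / n))
    if t >= 500:                          # keep draws after a burn-in period
        mu_draws.append(mu)

posterior_mean_mu = float(np.mean(mu_draws))
```

With only about 20% missing information the chain mixes quickly, and posterior_mean_mu settles near the observed-data mean, illustrating the convergence remarks below.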

The convergence performance of the DA algorithm depends on the amount of missing

information (how much information about the parameters is contained in the missing

part of the data relative to the observed part). High rates of missing information cause

successive iterations to be highly correlated, and a large number of cycles will be needed

for the algorithm to converge. Low rates of missing information produce low correlation

and rapid convergence.6

The EM and DA algorithms provide optimal solutions under the assumption that the

data are normally distributed. However, weather temperature data deviate from normality.

Nevertheless, in many cases the normal model is useful even when the actual data are non-

normal (see Schafer (1997), p. 147). The EM algorithm is used to calculate the missing

values on the level of the series. Additionally, it is almost always better to run the

EM algorithm before using the DA to impute missing data because running the EM

first will provide good starting values for the DA and will help to predict its likely

convergence behaviour.

13.3.5 The Kalman filter models

The seminal works of Harvey (1989) and Hamilton (1994) have underlined the advan-

tages of using state space modelling for representing dynamic systems where unobserved

6 The DA algorithm was implemented using Schafer's stand-alone NORM software, which can be downloaded at http://www.stat.psu.edu/~jls/misoftwa.html.


variables (the so-called “state” variables) can be integrated within an “observable” model.

The advantage of handling problems of missing observations with the state space form

is that the missing values can be estimated by a smoothing algorithm like the Kalman

filter. The filtering is used to estimate the expected value of the state vector according

to the information available at time t, while the intention of smoothing is to include the

information made available after time t.

Harvey (1981) has shown that, if the observations are normally distributed, and the

present estimator of the state vector is the most accurate, the predictor and the updated

estimator will also be the most accurate. In the absence of the normality assumption a

similar result holds, but only within the class of estimators and predictors which are linear

in the observations.

Several chapters in this book have documented in detail state space models and the

Kalman filter so we will avoid giving a new presentation.7 Let it suffice to say that, based
on the principle of parsimony, our initial Kalman filter model was an AR(1) process with a

constant mean, implying that extreme temperatures would show some persistence, but they

would eventually return to their mean level for the period under review. Nevertheless,

comparing the imputation accuracy of alternative models that we tried, we concluded

that the best model for the Tavg series of the entire dataset is an ARMA(1,2) process,

while for the “Autumn” dataset it is an ARMA(2,1) and for the “November” dataset an

ARMA(1,1).8
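The chapter's models were estimated in EViews (see footnote 8). Purely as an illustration of how the filter treats a gap in the data, a minimal Kalman filter for the simpler AR(1)-around-a-constant specification might look as follows: when an observation is missing, the update step is skipped and the one-step-ahead prediction serves as the imputation. All names and parameter values are ours, and a full treatment would also run the smoother backwards over the gap:

```python
import numpy as np

def kalman_ar1_impute(y, c, phi, q, r):
    """Kalman filter for y_t = c + s_t + eps_t (Var eps = r), with state
    s_t = phi * s_{t-1} + eta_t (Var eta = q).  NaNs in y are treated as
    missing: the filter predicts but skips the update at those dates."""
    s, p = 0.0, q / (1.0 - phi ** 2)      # stationary initialisation
    filled = np.array(y, dtype=float)
    for t in range(len(y)):
        s, p = phi * s, phi ** 2 * p + q  # prediction step
        if np.isnan(y[t]):
            filled[t] = c + s             # impute with the prediction
        else:
            k = p / (p + r)               # Kalman gain
            s = s + k * (y[t] - c - s)    # update step
            p = (1.0 - k) * p
    return filled
```

Observed dates pass through unchanged; only the NaN dates receive the model-based estimate.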

13.3.6 The neural networks regression models

Here again, two chapters in this book have documented the use of NNR models for

prediction purposes.9 In the circumstances, let us just say that the starting point for the

NNR models was the linear correlations between the Tmax and Tmin of the index station

considered and the explanatory temperature series. While NNR models endeavour to

identify nonlinearities, linear correlation analysis can give an indication of which variables

should be included. Variable selection was achieved using a backward stepwise neural

regression procedure: starting with lagged historical values of the dependent variable

and observations for correlated weather stations, we progressively reduced the number of

inputs, keeping the network architecture constant. If omitting a variable did not reduce the level of explained variance over the previous best model, the pool of explanatory variables was updated by removing this input (see also Dunis and Jalilov (2002)).

The chosen model was then kept for further tests and improvements.
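The selection loop just described can be sketched as follows, using an ordinary least-squares fit as a cheap stand-in for the neural regression (the real procedure re-trains a network at each step; the tolerance, names and simulated data are ours):

```python
import numpy as np

def r_squared(X, y):
    """In-sample explained variance of an OLS fit (stand-in for the NNR)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return 1.0 - (y - X1 @ beta).var() / y.var()

def backward_stepwise(X, y, names, tol=0.01):
    """Drop one input at a time, keeping the model form constant; a
    variable is discarded whenever omitting it costs less than `tol`
    of explained variance over the current best model."""
    keep = list(range(X.shape[1]))
    best = r_squared(X[:, keep], y)
    dropped_something = True
    while dropped_something and len(keep) > 1:
        dropped_something = False
        for j in list(keep):
            trial = [k for k in keep if k != j]
            score = r_squared(X[:, trial], y)
            if best - score < tol:        # omitting j costs (almost) nothing
                keep, best, dropped_something = trial, max(best, score), True
                break
    return [names[k] for k in keep]
```

On simulated data where only the first two of five candidate inputs drive the target, the loop strips out the three noise inputs and stops.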

7 See Bentz (2003), Billio and Sartore (2003) and Cassola and Luís (2003).

8 The Kalman filter models were implemented with EViews 4.0. The model for the entire dataset is an ARMA(1,2) process giving a system of four equations:

@signal allavg = c(1) + sv1 + c(2)*sv2 + c(3)*sv3

as the observation equation, plus the three state equations:

@state sv1 = c(5)*sv1(-1) + [var = exp(c(4))]
@state sv2 = sv1(-1)
@state sv3 = sv2(-1)

9 See Dunis and Williams (2003) and Dunis and Huang (2003).


Finally, each of the datasets was partitioned into three subsets, using approximately 2/3 of the data for training the model, 1/6 for testing and the remaining 1/6 for validation. The intention of this partition is to control the error and reduce the risk of overfitting. For instance, the final models for the “November” Tmax and Tmin series have one hidden layer with five hidden nodes.10

13.3.7 Principal component analysis

PCA is a standard method for extracting the most significant uncorrelated sources of variation in a multivariate system. The objective of PCA is to reduce dimensionality, so that only the most significant sources of information are used. This approach is very useful in highly correlated systems, like weather temperatures from neighbouring stations, because there will be a small number of independent sources of variation and most of the variation can be described by just a few principal components. PCA has numerous applications in financial markets modelling: the main ones concern factor modelling within the Arbitrage Pricing Theory framework (see Campbell et al. (1997)), robust regression analysis in the presence of multicollinearity (again a problem often affecting factor modelling) and yield curve modelling (see Alexander (2001)), although, in this latter case, other techniques seem preferable.11 It may also be used to overcome the problem of missing data, as we shall see next.

13.3.7.1 Principal components computation12

Assume that the data for which the PCA is to be carried out consist of M variables indexed j = 1, 2, . . . , M and N observations on each variable, i = 1, 2, . . . , N, generating an N × M matrix X. The input data must be stationary. Furthermore, these stationary data will need to be normalised before the analysis, otherwise the first principal component will be dominated by the input variable with the greatest volatility. Thus, we also assume that each of the M columns of the stationary data matrix X has mean µ = 0 and variance σ² = 1. This can be achieved by subtracting the sample mean and dividing by the sample standard deviation for each element xij of matrix X. Consequently we have created a matrix X of standardised mean deviations. We will transform this matrix to a new set of random variables, which are pairwise uncorrelated. Let z1 be the new variable with the maximum variance; then the first column vector α1 of M elements is defined as:

z1 = X × α1     (13.6)
(N × 1) = (N × M) × (M × 1)

The new variable z1 (N × 1) is a linear combination of the elements in vector α1. The product z1T × z1 is the sum of the squares of the elements of z1.13 Substituting equation (13.6) for z1 we have:

z1T × z1 = (X × α1)T × (X × α1) = α1T × (XT × X) × α1     (13.7)

10 The NNR models were implemented using the PREVIA software.

11 See Cassola and Luís (2003).

12 Appendix A gives a brief technical reminder on eigenvectors, eigenvalues and PCA.

13 T indicates matrix transposition.


The unbiased covariance matrix of the data which produced matrix X is:

(1/(N − 1)) × XT × X

where XT × X is (N − 1) times the covariance matrix. Consequently, the maximum variance is found by selecting as vector α1 the one that maximises α1T × (XT × X) × α1. In addition, vector α1 is normalised by the condition that its length equals 1, that is α1T × α1 = 1. In summary, the conditions that determine z1 are:

• max z1T × z1
• α1T × α1 = 1 (vector α1 normalised)

This optimisation problem can be solved analytically with the use of the Lagrange multiplier (λ1) and vector differentiation. But it can also be solved with Excel Solver.

The second vector z2 = X × α2 is determined by the following conditions:

• max z2T × z2
• α2T × α2 = 1 (vector α2 normalised)
• α2T × α1 = 0 (z2 uncorrelated with z1)

The last constraint derives from:

Cov(z1, z2) = 0 ⇒ α1T × (XT × X) × α2 = 0 ⇒ (XT × X × α2)T × α1 = 0
⇒ λ2 × α2T × α1 = 0 ⇒ α2T × α1 = 0 (λ2 ≠ 0)

In highly correlated systems, such as weather temperatures, the first two eigenvalues explain more than 90% of the total variance. There is therefore no need to retain more principal components.
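Outside Excel, the whole construction above amounts to an eigendecomposition of the correlation matrix of the standardised data. A sketch on simulated “station” data (names and numbers are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
factor = rng.normal(size=300)                       # one common driver
X = np.column_stack([factor + 0.1 * rng.normal(size=300) for _ in range(5)])

Z = (X - X.mean(axis=0)) / X.std(axis=0)            # standardise each column
V = Z.T @ Z / len(Z)                                # correlation matrix
eigval, eigvec = np.linalg.eigh(V)                  # eigh: ascending order
order = np.argsort(eigval)[::-1]
eigval, W = eigval[order], eigvec[:, order]

P = Z @ W[:, :2]                                    # first two components
explained = eigval[:2].sum() / eigval.sum()         # share of total variance
```

For a system this highly correlated, explained comes out well above 0.9, and the two retained components are uncorrelated by construction.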

13.3.7.2 PCA and missing temperature data

Suppose that the weather station with missing observations is St 1, and assume that St 2, . . . , St 5 are the correlated weather stations. In order to use PCA to fill in the missing observations we execute the following steps.

• Step 1 : Perform a PCA on St 1 and St 2 , . . . , St 5 using all the available data on St 1

to obtain principal components and factor weights (w11 , w12 , w13 , w14 , w15 ). The

selection of the principal components will depend on how highly correlated the system

is. However, they should always be less than the number of variables (<5).

• Step 2 : Perform one more PCA on St 2 , . . . , St 5 using all the available data on these

variables and the same number of principal components (P1 , . . . , P4 ).

• Step 3 : Using the factor weights from step 1 and the principal components from step 2

we rebuild a simulated data history on St 1 according to the equation:

St 1* = w11 × P1 + · · · + w14 × P4

• Step 4 : The final step is the calibration of our model. The actual data on St 1 that are available should be compared with the ones obtained from step 3 to choose the appropriate mean and standard deviation to “reconstruct” the missing weather station data.14

14 Appendix B documents the Excel file “WD PCA.xls” on the accompanying CD-Rom, which shows the computation of 10 days of missing Tmax temperature data.


The accuracy of the imputation will depend on the strength of the correlation within the system. As weather temperature data are highly correlated, PCA is intuitively an appropriate methodology for filling missing observations.
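The four steps can be sketched end to end on simulated data (a toy version with five artificial stations and a 10-day gap in St 1; the sign convention on the eigenvectors and the simple step-4 calibration — St 1's own complete-period mean and standard deviation — are our simplifications of the chapter's procedure):

```python
import numpy as np

def pca_weights(Z, k):
    """First k eigenvectors of the correlation matrix of standardised Z;
    each eigenvector's arbitrary sign is fixed so its weights sum positive."""
    eigval, eigvec = np.linalg.eigh(Z.T @ Z / len(Z))
    W = eigvec[:, np.argsort(eigval)[::-1][:k]]
    return W * np.where(W.sum(axis=0) >= 0, 1.0, -1.0)

def standardise(A):
    return (A - A.mean(axis=0)) / A.std(axis=0)

rng = np.random.default_rng(5)
common = rng.normal(size=122)                        # shared weather driver
stations = np.column_stack([common + 0.1 * rng.normal(size=122)
                            for _ in range(5)])      # St1..St5 in columns 0..4
gap = slice(61, 71)                                  # St1 unobserved here

# Step 1: PCA on St1..St5 over the complete rows -> St1's factor weights
w_st1 = pca_weights(standardise(stations[:61]), 2)[0]

# Step 2: PCA on St2..St5 over the full history -> principal components
Z2 = standardise(stations[:, 1:])
P = Z2 @ pca_weights(Z2, 2)

# Step 3: rebuild a simulated, standardised history for St1
st1_star = P @ w_st1
st1_star = (st1_star - st1_star.mean()) / st1_star.std()

# Step 4: calibrate back into temperature units
mu, sd = stations[:61, 0].mean(), stations[:61, 0].std()
st1_filled = mu + sd * st1_star[gap]
```

With stations this strongly correlated, the reconstructed values track the withheld actuals closely, which is the point made in the paragraph above.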

For our weather observations, all the datasets are stationary, so PCA can be performed.

The correlations between temperature data from neighbouring stations are extremely high,

so the proportion of total variation that can be explained by only two eigenvalues is

about 96%. For example, the linear model with just two principal components for the “November” Tmax is:

Xmax = 0.452 × P1 + 0.1435 × P2

The values obtained by the above equation are standardised and therefore need to be

multiplied by their standard deviation and added to their mean to transform them back

into temperatures. We obtain the values for the “November” Tmin data in the same way

and calculate the consequent Tavg temperatures accordingly.

13.4 EMPIRICAL RESULTS

13.4.1 The imputation accuracy measures

We use five standard error statistics to gauge the imputation accuracy of the alternative filling methods retained. These measures are documented in Table 13.1 (for more details, see for instance Makridakis et al. (1983)).

Table 13.1 Statistical accuracy measures (y_t: actual value, ỹ_t: imputed value, T: number of observations)

Mean absolute error (MAE):                 MAE = (1/T) × Σ_{t=1}^{T} |y_t − ỹ_t|

Mean absolute percentage error (MAPE):     MAPE = (100/T) × Σ_{t=1}^{T} |(y_t − ỹ_t)/y_t|

Root mean squared error (RMSE):            RMSE = sqrt[ (1/T) × Σ_{t=1}^{T} (y_t − ỹ_t)² ]

Theil's inequality coefficient (Theil-U):  U = sqrt[ (1/T) × Σ_{t=1}^{T} (y_t − ỹ_t)² ] / ( sqrt[ (1/T) × Σ_{t=1}^{T} ỹ_t² ] + sqrt[ (1/T) × Σ_{t=1}^{T} y_t² ] )

Mean error (ME):                           ME = (1/T) × Σ_{t=1}^{T} (y_t − ỹ_t)
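The five statistics translate directly into code (a sketch; y is the actual series and y_hat the imputed one, both names ours):

```python
import numpy as np

def accuracy_measures(y, y_hat):
    """The five error statistics of Table 13.1."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    e = y - y_hat
    rmse = float(np.sqrt((e ** 2).mean()))
    return {
        "MAE": float(np.abs(e).mean()),
        "MAPE": float(100.0 * np.abs(e / y).mean()),
        "RMSE": rmse,
        "Theil-U": rmse / (np.sqrt((y_hat ** 2).mean())
                           + np.sqrt((y ** 2).mean())),
        "ME": float(e.mean()),
    }
```

For example, actual values (50, 60) against imputations (48, 63) give MAE = 2.5, ME = −0.5 and MAPE = 4.5%.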


13.4.2 The imputation results

In order to conserve space, only the imputed values of the average temperature dataset

(Tavg) are presented and compared. It is worth noting however that, throughout the calculation procedures and for most of the techniques, the imputations of Tavg missing values

obtained as the average of the Tmax and Tmin imputations for the same missing values

are more accurate than those obtained by employing the methodologies directly with

the Tavg series. Daily temperatures are reported in degrees Fahrenheit. The comparison

of the imputation accuracy results for the “November” Tavg series of the Philadelphia

International index station (724080) is presented in Table 13.2.

These results show that, on all the criteria adopted, PCA substantially outperformed the

other methodologies in all cases of missing observations. This means that PCA may be

used to avoid mispricing of weather derivatives and to reduce the error in the settlement

process. The EM, the Kalman filter and the NNR methods outperform the naïve approach, but are less accurate than the fallback method. In fact, the fallback methodology appears as the second most accurate method. Finally, the worst methodology in terms of imputation accuracy appears to be the DA, whereas the naïve approach is validated as a difficult, and

Table 13.2 Imputation accuracy measures for the “November” dataset

Performance Measure                 Naïve   Fallback     PCA   Kalman     NNR      EM      DA

One Missing Month Tavg (Average Performance Measures)
Mean Absolute Error:                 8.02      2.45     1.13     6.13    2.63    6.22    8.48
Root Mean Squared Error:            11.30      2.96     1.37     7.90    2.97    7.90   11.58
Theil's Inequality Coefficient:      0.12      0.03     0.01     0.08    0.03    0.08    0.12
Mean Absolute Percentage Error:     15.93%     4.99%    2.35%   12.04%   5.49%  12.24%  17.50%
Mean Error:                         −0.98     −1.68    −0.03    −2.73   −2.57   −2.45   −1.88

20 Missing Days Tavg (Average Performance Measures)
Mean Absolute Error:                 8.13      2.63     1.00     6.13    2.85    6.25    7.45
Root Mean Squared Error:            11.81      3.22     1.27     8.49    3.18    8.49   11.01
Theil's Inequality Coefficient:      0.12      0.03     0.01     0.09    0.03    0.09    0.11
Mean Absolute Percentage Error:     14.73%     5.14%    1.99%   10.99%   5.71%  11.29%  13.74%
Mean Error:                         −4.53     −2.03    −0.25    −4.58   −2.75   −4.15   −4.00

10 Missing Days Tavg (Average Performance Measures)
Mean Absolute Error:                 4.55      1.95     0.65     3.20    2.60    3.45    5.30
Root Mean Squared Error:             5.69      2.55     0.88     4.27    2.90    4.26    6.74
Theil's Inequality Coefficient:      0.06      0.03     0.01     0.05    0.03    0.05    0.07
Mean Absolute Percentage Error:      9.85%     4.08%    1.41%    6.68%   5.73%   7.28%  11.56%
Mean Error:                          0.75      1.55     0.15    −0.70    0.15    0.50   −2.40

One Missing Week Tavg (Average Performance Measures)
Mean Absolute Error:                 5.14      2.57     0.71     3.36    2.71    3.71    4.79
Root Mean Squared Error:             6.36      3.02     1.00     4.59    3.09    4.58    6.07
Theil's Inequality Coefficient:      0.07      0.03     0.01     0.05    0.03    0.05    0.07
Mean Absolute Percentage Error:     10.93%     5.31%    1.53%    6.69%   5.90%   7.54%   9.94%
Mean Error:                          2.43      2.29     0.29    −2.07   −2.43   −0.86   −2.07

One Missing Day (Tavg)
Mean Absolute Error:                 3.00      1.00     0.50     1.50    3.50    3.00    7.50
Root Mean Squared Error:             3.00      1.00     0.50     1.50    3.50    3.00    7.50
Theil's Inequality Coefficient:      0.03      0.01     0.01     0.02    0.04    0.03    0.08
Mean Absolute Percentage Error:      6.90%     2.27%    1.15%    3.45%   8.05%   6.90%  17.24%
Mean Error:                          3.00      1.00     0.50    −1.50   −3.50    3.00    7.50


Table 13.3 Deviation of computed from actual HDDs

Missing Days                            Actual  Fallback     PCA  Kalman F.      EM      DA   Naive     NNR
                                          HDDs      HDDs    HDDs       HDDs    HDDs    HDDs    HDDs    HDDs
A Missing Day (11/01/93)                 21.50     20.00   21.50      23.00   18.50   14.00   18.50   20.00
A Missing Week (11/01/93–11/07/93)      123.50    107.50  122.00     138.00  129.50  138.00  106.50  131.50
10 Missing Days (11/01/93–11/10/93)     186.50    171.00  185.50     193.50  185.00  181.50  179.00  188.50
20 Missing Days (11/01/93–11/20/93)     293.00    331.00  297.50     378.50  370.00  367.00  377.50  359.50
A Missing Month (11/01/93–11/30/93)     487.50    535.50  488.00     563.50  555.00  538.00  511.00  548.00

Deviation from actual HDDs for a Missing Day:       1.5       0        1.5      3       7.5     3       1.5
Deviation from actual HDDs for a Missing Week:     16         1.5     14.5      6      14.5    17       8
Deviation from actual HDDs for 10 Missing Days:    15.5       1        7        1.5     5       7.5     2
Deviation from actual HDDs for 20 Missing Days:    38         4.5     85.5     77      74      84.5    66.5
Deviation from actual HDDs for a Missing Month:    48         0.5     76       67.5    50.5    23.5    60.5

easy to “compute”, benchmark to beat.15 For the two other datasets, the “Entire” dataset

and the “Autumn” temperature observations, the results also show that PCA outperformed

all the other methodologies in all the cases of missing data.16

Furthermore, using the imputed values of the missing temperature data obtained from

these techniques, the HDDs for November 1993 have been calculated and compared with

actual HDDs. Table 13.3 shows the deviation between computed and actual HDDs. The

baseline temperature for the computation of the HDDs is 65°F.
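With a 65°F baseline, a day's HDDs are max(0, 65 − Tavg), summed over the settlement period; a two-line sketch:

```python
def heating_degree_days(tavg, base=65.0):
    """Sum of daily heating degree days over a period (temperatures in °F)."""
    return sum(max(0.0, base - t) for t in tavg)

# A day averaging 43.5°F contributes 21.5 HDDs, matching the
# single-day "Actual HDDs" figure in Table 13.3.
assert heating_degree_days([43.5]) == 21.5
```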

It is again obvious that PCA outperforms all the other methodologies. The deviation of the PCA results from actual HDDs for a missing day is 0, while for 10 missing days it is 1 degree (against 15.5 degrees for the fallback method); for one missing month, it is 0.5 degrees, against 48 degrees for the fallback method! On average, the deviation of PCA-based results from actual HDD data is 1.5 degrees, while the second best methodology, the fallback method, deviates on average by 23.8 degrees.

13.5 CONCLUDING REMARKS

In this chapter, we set out to examine and analyse the use of alternative methods when

confronted with missing data, a common problem when not enough historical data or

clean historical data exist.

Because of the need for “clean” weather temperature data in the fast growing weather derivatives markets, as pricing methodologies rely heavily on them, we examined the imputation accuracy of different interpolation techniques and filling methods for missing historical temperature records. We would argue that the conclusions based on this analysis would serve as a good guide to the general problem of dealing with missing values.

Overall, for the periods and the data series concerned, the results of PCA outperformed

all the other methodologies in all cases of missing observations and consequently in the

calculation of HDDs. The only drawback of PCA compared with the second most accurate

method, the fallback method, is that PCA requires more correlated weather temperature

“clean” data. Nevertheless, if the necessary data are available, PCA should be preferred

in replacing missing temperature observations. More generally, PCA provides a very

efficient and simple method for filling missing data in the presence of a correlated system

of variables. As has been shown, it is also easy to implement, as this can be done in Excel.

15 The poor performance of the EM and DA methods may be linked to the violation of the assumption that the data are normally distributed (see Section 13.3.4 above), which does not hold here.

16 Complete results are available from the authors.


APPENDIX A

A.1 Eigenvalue and eigenvector

Let A be a linear transformation symbolised by a matrix A. If there is a vector X ∈ Rⁿ, X ≠ 0, such that:

A × X = λ × X

for some scalar λ, then λ is called the eigenvalue of A with corresponding (right) eigenvector X. Eigenvalues are also known as characteristic roots, proper values or latent roots (see Marcus and Minc (1988)).

Eigenvalues are given by the solutions of the characteristic equation of the given matrix:

(A − λ × I) × X = 0

A.2 An introduction to PCA

Assume that the data for which the PCA is to be carried out consist of M variables indexed j = 1, 2, . . . , M and N observations on each variable, i = 1, 2, . . . , N, generating an N × M matrix X. As mentioned in Section 13.3.7.1, we also assume that the input data are stationary and that each of the M columns of the stationary data matrix X has mean µ = 0 and variance σ² = 1, which can be achieved by subtracting the sample mean and dividing by the sample standard deviation for each element xij of matrix X. Consequently we have created a matrix X of standardised mean deviations.

PCA is founded on an eigenvalue and eigenvector analysis of:

V = XT × X / N

the M × M symmetric matrix of correlations among the M variables. The principal components are linear combinations of these columns, where the weights are selected

according to the following technique:

• The first principal component describes the majority of the total variation in matrix X, the second component describes the majority of the remaining variation, and so on.

• The principal components are uncorrelated with each other.

This can be attained by selecting the weights from the cluster of eigenvectors of the correlation matrix. If W is the M × M matrix of eigenvectors of V, we have:

V × W = W × Λ

where Λ is the M × M diagonal matrix of eigenvalues of V. If matrix W = (wij) for i, j = 1, 2, . . . , M, then the kth column of W, indicated wk = (w1k, w2k, . . . , wMk)T, is the M × 1 eigenvector corresponding to the eigenvalue λk. The kth principal component of the system is given by:

Pk = w1k × X1 + w2k × X2 + · · · + wMk × XM


where Xj indicates the jth column of matrix X, in other words the standardised historical input data of the jth variable in the initial correlated system. In matrix notation, the kth principal component of the system is given by:

Pk = X × wk

APPENDIX B

The file WD PCA.xls on the accompanying CD-Rom is an illustration of the PCA method. In the “Data” sheet of the workbook, and for the sake of simplicity, we retain 122 temperature records in degrees Celsius from 01/09/93 to 31/12/93 for five weather stations. A 10-day period of missing data is artificially created between 01/11/93 and 10/11/93. The common dataset for step 1 of the PCA goes until 31/10/93.

In the “Step 1” sheet of the workbook, to obtain principal components and factor weights, we perform a PCA on St 1, St 2, St 3, St 4 and St 5 using all the data available on St 1 up to the missing data period (we have therefore M = 5 and N = 61). We first standardise the common dataset up to 31/10/93 (cells H3:L63) by subtracting the sample mean and dividing by the sample standard deviation for each temperature record.

To obtain the factor weights, we create two column vectors a1 and a2 with five cells, one per weather station (initially for a1, we set the top cell O5 = 1 and cells O6:O9 = 0). We proceed to compute the first eigenvector using Excel matrix algebra (cells S4:S64 and cell P14) and the Excel Solver: in the Excel Solver window (Figure B13.1), we maximise cell P14 by changing cells O5:O9 subject to cell P15 = 1.

Figure B13.1 Excel Solver window for the second principal component


We compute the second eigenvector exactly in the same way: only this time we maximise cell P18 by changing cells Q5:Q9 subject to cell P19 = 1 and cell P20 = 0. The total variance explained by the first two components is computed in cell Q33: at 98.4%, there is obviously no need to look for more principal components. We shall retain the factor weights of St 1 for further use (cells O5:Q5).

In the “Step 2” sheet of the workbook, we perform a second PCA, this time only on

St 2 , St 3 , St 4 and St 5 but using the entire databank for these variables and the same

two principal components (we have therefore M = 4 and N = 122). We shall retain the

principal components z1 and z2 for further use (cells Q4:Q125 and S4:S125).

In the “Steps 3–4” sheet of the workbook, using the factor weights from step 1 (cells B7:B8) and the useful part of the principal components from step 2 (cells A13:A73 and C13:C73), we rebuild a simulated data history of St 1 according to the equation:

St 1* = a1 × z1 + a2 × z2

where a1 and a2 are the factor weights for St 1 from step 1 and z1 and z2 the principal components from step 2.

In fact, we obtain a standardised series (cells K2:K62) which must be multiplied by the original series' standard deviation and added to its mean to get the “reconstructed” temperature data (cells M2:M62).

The choice of the mean and standard deviation is the final and crucial calibration step of the model: here, we take the standard deviation from step 1 (cell E8); for the mean (cell E7), considering both the time dependency of temperature data and the high correlation between weather stations, we decided to take the average of the mean temperature for all stations 15 days before and 15 days after the missing data period with the mean temperature for St 2, St 3, St 4 and St 5 during the missing data period.

Finally, the “Chart” sheet of the workbook plots our results: as can be seen, both actual and PCA-based estimates of the Tavg temperature data for St 1 during the 10-day period from 01/11/93 to 10/11/93 are very close, a view confirmed by the average difference of 0.07°C over that period (see cell O63 in the “Steps 3–4” sheet of the workbook).

REFERENCES

Alexander, C. (2001), Market Models: A Guide to Financial Data Analysis, John Wiley, Chichester.

Bentz, Y. (2003), “Quantitative Equity Investment Management with Time-Varying Factor Sensi-

tivities”, Chapter 7 of this book.

Billio, M. and D. Sartore (2003), “Stochastic Volatility Models: A Survey with Applications to

Option Pricing and Value at Risk”, Chapter 8 of this book.

Campbell, J. Y., A. W. Lo and A. C. MacKinlay (1997), The Econometrics of Financial Markets,

Princeton University Press, Princeton, NJ.

Cassola, N. and J. B. Luís (2003), “Modelling the Term Structure of Interest Rates: An Application of Gaussian Affine Models to the German Yield Curve”, Chapter 3 of this book.
Corbally, M. and P. Dang (2001), “Risk Products”, in E. Banks (ed.), Weather Risk Management: Market, Products and Applications, Palgrave, New York, pp. 105–134.
Dempster, A. P., N. M. Laird and D. B. Rubin (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, B39, 1, 1–38.
Dunis, C. and X. Huang (2003), “Forecasting and Trading Currency Volatility: An Application of Recurrent Neural Regression and Model Combination”, Chapter 4 of this book.
Dunis, C. and J. Jalilov (2002), “Neural Network Regression and Alternative Forecasting Techniques for Predicting Financial Variables”, Neural Network World, 2, 113–139.


Dunis, C. and M. Williams (2003), “Applications of Advanced Regression Analysis for Trading

and Investment”, Chapter 1 of this book.

Greene, W. H. (2000), Econometric Analysis, 4th edition, Prentice Hall, Englewood Cliffs, NJ.

Hamilton, J. (1994), Time Series Analysis, Princeton University Press, Princeton, NJ.

Harvey, A. C. (1981), Time Series Models, Philip Allan, Oxford.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge.

Henderson, R. (2001), “Pricing Weather Risk”, in E. Banks (ed.), Weather Risk Management: Market, Products and Applications, Palgrave, New York, pp. 167–199.

Makridakis, S., S. C. Wheelwright and V. E. McGee (1983), Forecasting: Methods and Applica-

tions, John Wiley & Sons, New York.

Marcus, M. and H. Minc (1988), Introduction to Linear Algebra, Dover, New York.

PricewaterhouseCoopers (2002), “The Weather Risk Management Industry: Survey Findings for April 2001 to March 2002”, Weather Risk Management Association, available at: http://wrma.cyberserv.com/library/public/file346.ppt.
Raspé, A. (2001), “Legal and Regulatory Issues”, in E. Banks (ed.), Weather Risk Management: Market, Products and Applications, Palgrave, New York, pp. 224–245.

Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, Series Monographs on Statistics

and Applied Probability, Vol. 72, Chapman and Hall, London.


Applied Quantitative Methods for Trading and Investment. Edited by C.L. Dunis, J. Laws and P. Na¨m

±

™ 2003 John Wiley & Sons, Ltd ISBN: 0-470-84885-5

402 Index

difference stationary time series 43 orthogonal 313

Duf¬e“Kan (exponential-af¬ne) af¬ne models scalar, with variance targeting 328“9

of the term structure 78“83 GARCH (1,1) model 4, 144, 350

choice of benchmark model 135

GARCH (1,1) volatility forecasts 135“7

Ef¬cient Method of Moments (EMM) 256,

cf NNR/RNN in trading simulation 147“9

257“8

Garman“Kohlhagen pricing formula 130, 353

EGARCH 258

Gauss code for maximum likelihood for

error-correction model (ECM) 41

variance switching models 208“11

estimation bias in rolling estimation 221

Gaussian kernel regression 131

EUR/USD exchange rate 5“10

Expectation Maximisation (EM) algorithm, Generalised Method of Moments (GMM)

255“6

missing data and 381, 382

estimation of long-memory stochastic

expectation maximisation algorithm 386“8

E-step and M-step 387“8 volatility model 262

estimator 262“3

imputation accuracy 394

geometric Brownian motion 357

maximum-likelihood estimation 387

expectations theory German Yield curve, af¬ne modelling of

71“126

forward rate test 83“4

Gibbs algorithm 259

non-pure version 83

explained variance (EV) 26“7 Gini Index 174

Glosten, Jagannathan and Runkle (GJR) model

explicit factor modelling 66“7

320“2

exponential-af¬ne models of the term structure

78“83 Granger causality test 86, 124

Granger“Ramanathan optimal weighting

exponential smoothing 3

regression-based approach 38, 131,

exponentially weighted moving averages

(EWMA) 202, 203“4, 360“1 142“3

extrapolation, de¬nition 42

Hamilton ¬lter 251, 252, 255, 264, 265, 275,

282

factor sensitivities, de¬nition 215“16

fallback methodology to missing weather data for maximum likelihood estimation of

discrete stochastic volatility models

385“6

253

imputation accuracy 394

feedforward neural network 22 heating degree days (HDDs) 383“4

hedging, implicit, of unknown common risk

feedforward neural network, three-layer 167

factors, cointegration and 45“7

Fisher hypothesis 75

forecasting accuracy Hinton graph 28

Historical Burn Analysis (HBA) 384

out-of-sample accuracy measures 31

historical volatility 132“4, 243

out-of-sample forecasting accuracy results

32“4 homoskedastic model, one-factor 74

hyperbolic tangent 138

out-of-sample trading performance measures

31“2

out-of sample trading performance results implicit factor modelling 66, 67

34“6 implied standard deviation 130

forecasting and trading currency volatility implied volatility 134“5, 243

exchange rate series databank and 132“4 importance sampling 258“9

historical volatility 132“4 impulse-response analysis 86

implied volatility series databank 134“5 “impurity” index 173“4

foreign exchange volatility trading models independent components analysis (ICA) 67

145“9 index arbitrage 47“8

currency volatility trading models 145“7 indirect inference method 256, 257“8

trading simulation results 147“9 in¬‚ation measure 84

volatility trading strategies 145 Information Index 174

forward rate regression 75 interpolation, de¬nition 42

F -test 14

Jarque“Bera statistic 6, 9, 199, 200, 202,

GARCH model 130, 131, 193, 258, 318“20 241, 242, 385

diagonal 326“8 Jensen inequality 81

Index 403

jump models 261 alternative optimisation targets 308“10

jump process 357“8 matrix algebra in Excel with increasing

assets 303“8

Kalman ¬lter 76, 223“36, 251, 255, 275 matrix approach 301“3

simple model 294“300

applied to German yield 87“106

Max Minority 174

imputation accuracy 394

missing data 381, 382, 389“90 maximum drawdown 31

maximum likelihood estimation 195“7, 387

for quasi-maximum likelihood estimation of

maximum likelihood procedure applied to

continuous stochastic volatility models

252“3 German yield 87“106

Kalman gain matrix 22 Mean Absolute Error (MAE) 31, 32, 130,

143, 144, 393

Kalman smoother 269

for continuous stochastic volatility models mean absolute percentage error (MAPE) 29,

254“5 31, 32, 130, 393

mean error (ME) 393

Kim smoother 272

for discrete stochastic volatility models mean squared error (MSE) 216“17

255 method of moments 255“6

Kolmogorov“Smirnov test 199, 200, 202 Metropolis“Hastings steps 259, 260, 261

missing data, ¬lling analysis for 381“98

Monte Carlo simulation 244, 281

latent factor models 84

likelihood ratio (LR) test 14 Monte Carlo variance 258“9

Moving Average Convergence/Divergence

linear cross-correlation analysis 26

(MACD) technical models 32, 34,

Ljung“Box test 199, 202

log-normal stochastic volatility model 245 36, 37

in EUR/USD forecasting and trading models

log-periodogram estimator, semi parametric

10“11

262, 263

logit estimation 10, 17“20 long-term (LMA) 11

short-term (SMA) 11

logit1 model 18, 19“20, 21, 22, 23, 32“6

multilayer feedforward neural networks

long-memory stochastic volatility model 262

(MLNNs) 166

multilayer perceptron 22, 137

market risk, VaR applied to 244

multivariate continuous stochastic volatility

Markov chain 246“7

models 264

ergodic 248

multivariate discrete stochastic volatility

Markov Chain Monte Carlo (MCMC) 256

models 264“5