
This resulted from an interest rate and volatility increase, almost one month after the Central Bank of Denmark's repo rate increase of 0.6 percentage points (from 4.1% to 4.7% on 9 June 2000), which followed the ECB decision to increase the refi rate to 4.25% from 3.75%.
After the referendum, the Central Bank of Denmark increased the repo rate from 5.1% to 5.6%. This move was followed by a sharp decrease in the 1-year minimum width of the perfectly credible band, from 1.9% to 1%. The market thus seemed to expect that, notwithstanding the referendum result, Danish monetary policy would continue to be anchored to ECB policy, within the ERM II target zone.
The violation of the full credibility condition contrasts with the evolution of the EUR/DKK exchange rate, whose spot rate kept moving within a quite narrow band, while the forward rate always remained far from the band ceiling, thus satisfying the Svensson (1991) simple credibility test (Figure 12.23).57
The files used for the estimations illustrated in Figures 12.20–12.22 are "credibility tests PTE 3m.xls", "credibility tests DKK 1m.xls" and "credibility tests DKK 1y.xls" and are also included on the CD-Rom. All these files have a similar structure: the first three columns contain the data on the interest rates and the spot exchange rates; the forward exchange rates are computed in the fourth column, while the bounds of the ERM-EMS are in the two following columns; the implied (offer) volatilities and the respective call-option prices are presented next; in the following column the minimum intensity of realignment is computed, as in equation (12.40); the credibility tests in equations (12.34) and (12.36) are performed next; finally, the minimum width of the fully credible band in equation (12.37) is calculated.


[Figure: EUR/DKK spot rate, 1-month forward rate, 1-year forward rate, band ceiling and band floor, January 1999 to September 2000.]

Figure 12.23 EUR/DKK Spot and forward rates


57 This test corresponds to assessing whether the forward rate exceeds the target zone limits.

12.7 CONCLUSIONS
In the OTC currency option market, financial institutions regularly quote at-the-money forward volatilities, as well as risk-reversals and strangles, which provide information respectively on the skewness and kurtosis of the distribution of the underlying asset. For currency options, the informational content of these prices is presumably higher than that of exchange-traded option prices, since the over-the-counter market involves a significantly higher transaction volume. Besides, using only this information, it is possible to estimate the whole RND function, as well as to test the credibility of a target zone.
Currency option prices also provide relevant information on implied correlations between exchange rates, which may be needed for portfolio management or risk assessment. Compared to the usual conditional correlations, the former offer the advantage of being forward looking, allowing correlation measures to be obtained from market data in periods in which regime shifts are expected. In the context of EMU, implied correlations allow market expectations about the degree of convergence between potential participating currencies to be assessed.
Three empirical applications of the literature were presented: firstly on the EUR/USD, followed by the analysis of implied correlations and the credibility of the Portuguese exchange rate policy during the transition to EMU, and of the Danish exchange rate policy around the euro referendum in September 2000.
As in many other cases found in the literature, the euro depreciation vis-à-vis the US dollar occurred against the UIP predictions. Simultaneously, market expectations about the future behaviour of the exchange rate also changed markedly, which is reflected in OTC quotes for EUR/USD options. The results obtained confirm the general assertion that expectations on the evolution of the euro deteriorated from January 1999 to October 2000, though some improvement occurred after the joint exchange rate intervention in September 2000.
Regarding the credibility of the Portuguese exchange rate policy, the results are consistent with the idea that the probability of the escudo's participation in the euro area from the beginning increased persistently over the course of 1997. Additionally, the market attributed a high degree of credibility to the bilateral central parity announced at the 1–3 May 1998 European summit.
On the eve of the summit, market participants were expecting a perfect correlation between the movements of the Deutsche Mark and the Portuguese escudo vis-à-vis the US dollar. Furthermore, the central moments of the PTE/DEM exchange rate RND function converged to the central parity and the uncertainty regarding the future values of PTE/DEM decreased strongly.
Though the Danish crown and the euro remained almost perfectly correlated during the first three quarters of 2000, the credibility of the ERM target zone was challenged by option market participants before the referendum, mainly in the beginning of July. This change in market expectations followed the widening of the gap between the key interest rates of the Danish crown and the euro. After the referendum, although euro membership was rejected, the credibility of the Danish monetary policy was enhanced, as reflected in the decrease of the minimum width of the fully credible band.

REFERENCES
Abken, P. A. (1995), "Using Eurodollar Futures and Options: Gauging the Market's View of Interest Rate Movements", Federal Reserve Bank of Atlanta Economic Review, Mar/Apr, 10–30.
Bahra, B. (1996), "Implied Risk-Neutral Probability Density Functions from Option Prices: Theory and Application", Bank of England Economic Bulletin, Aug, 299–309.
Ball, C. and W. N. Torous (1983), "A Simplified Jump Process for Common Stock Returns", Journal of Financial and Quantitative Analysis, 18, March, 53–65.
Ball, C. and W. N. Torous (1985), "On Jumps in Common Stock Prices and Their Impact on Call Option Pricing", Journal of Finance, XL(1), March, 155–173.
Bates, D. S. (1991), "The Crash of '87: Was It Expected? The Evidence from Options Markets", Journal of Finance, XLVI(3), July, 1009–1044.
BIS (2001), "The Global OTC Derivatives Market at end-June 2001: Second Part of the Triennial Central Bank Survey of Foreign Exchange and Derivatives Market Activity".
Black, F. and M. Scholes (1973), "The Pricing of Options and Corporate Liabilities", Journal of Political Economy, 81 (May/Jun), 637–654.
Bliss, R. R. and N. Panigirtzoglou (2000), "Testing the Stability of Implied Probability Density Functions", Bank of England Working Paper, No. 114.
Breeden, D. T. and R. H. Litzenberger (1978), "Prices of State-Contingent Claims Implicit in Option Prices", Journal of Business, 51(4), October, 621–651.
Butler, C. and H. Davies (1998), "Assessing Market Views on Monetary Policy: The Use of Implied Risk-Neutral Probability Distributions", paper presented to the BIS/CEPR Conference "Asset Prices and Monetary Policy", January.
Campa, J. M. and P. H. K. Chang (1995), "Testing the Expectations Hypothesis on the Term Structure of Volatilities in Foreign Exchange Options", Journal of Finance, 50, 529–547.
Campa, J. M. and P. H. K. Chang (1996), "Arbitrage-based Tests of Target-Zone Credibility: Evidence from ERM Cross-Rate Options", American Economic Review, 86(4), September.
Campa, J. M., P. H. K. Chang and R. L. Reider (1997), "ERM Bandwidths for EMU and After: Evidence from Foreign Exchange Options", Economic Policy, 24, April, 55–89.
Clews, R., N. Panigirtzoglou and J. Proudman (2000), "Recent Developments in Extracting Information from Option Markets", Bank of England Quarterly Bulletin, Feb, 50–60.
Deutsche Bundesbank (1995), "The Information Content of Derivatives for Monetary Policy", Deutsche Bundesbank Monthly Report, Nov.
Dunis, C. and P. Lequeux (2001), "The Information Content of Risk-Reversals", Derivatives Uses, Trading and Regulation, 2, 98–117.
Garman, M. M. and S. W. Kohlhagen (1983), "Foreign Currency Option Values", Journal of International Money and Finance, 2, December, 231–237.
Loretan, M. and W. B. English (2000), "Evaluating Correlation Breakdowns during Periods of Market Volatility", BIS Conference Papers, Vol. 8, Bank for International Settlements, March.
Luís, J. B. (2001), "Essays on Extracting Information from Financial Asset Prices", PhD thesis, University of York.
Malz, A. M. (1996), "Using Option Prices to Estimate Realignment Probabilities in the European Monetary System: the Case of Sterling–Mark", Journal of International Money and Finance, 15(5), 717–748.
Malz, A. M. (1997), "Estimating the Probability Distribution of the Future Exchange Rate from Option Prices", Journal of Derivatives, Winter, 18–36.
McCauley, R. and W. Melick (1996), "Risk Reversal Risk", Risk, 9(11), November, 54–57.
Melick, W. R. and C. P. Thomas (1997), "Recovering an Asset's Implied PDF from Option Prices: An Application to Crude Oil during the Gulf Crisis", Journal of Financial and Quantitative Analysis, 32(1), March, 91–115.
Melick, W. R. and C. P. Thomas (1998), "Confidence Intervals and Constant-Maturity Series for Probability Measures Extracted from Options Prices", paper presented to the Bank of Canada Conference "Information in Financial Asset Prices", May.
Merton, R. C. (1976), "Option Pricing when Underlying Stock Returns are Discontinuous", Journal of Financial Economics, Jan/Mar(3), 125–144.
Nelson, C. R. and A. F. Siegel (1987), "Parsimonious Modelling of Yield Curves", Journal of Business, 60(4), 473–489.
Ritchey, R. R. (1990), "Call Option Valuation for Discrete Normal Mixtures", The Journal of Financial Research, XIII(4), Winter.
Söderlind, P. and L. E. O. Svensson (1997), "New Techniques to Extract Market Expectations from Financial Instruments", Journal of Monetary Economics, 40, 383–429.
Svensson, L. E. O. (1991), "The Simplest Test of Target Zone Credibility", International Monetary Fund Staff Papers, 38, 655–665.
Svensson, L. E. O. (1994), "Estimating and Interpreting Forward Interest Rates: Sweden 1992–4", CEPR Discussion Paper Series No. 1051.
Xu, X. and S. J. Taylor (1994), "The Term Structure of Volatility Implied by Foreign Exchange Options", Journal of Financial and Quantitative Analysis, 29(1), March.
13
Filling Analysis for Missing Data: An
Application to Weather Risk Management*

CHRISTIAN L. DUNIS AND VASSILIOS KARALIS


ABSTRACT
This chapter examines and analyses the use of alternative methods when confronted with
missing data, a common problem when not enough historical data or clean historical data
exist: this will typically be the case when trying to develop a decision tool either for a
new asset in a given asset class (say a recently issued stock in a given company sector)
or for a new asset class as such (for instance weather derivatives).
Weather derivatives are used to illustrate this analysis because most weather derivatives
pricing methodologies rely heavily on “clean” weather temperature data. The methodol-
ogy adopted is to discard certain observed values and treat them as missing data. We
then examine and analyse the imputation accuracy of different interpolation techniques
and filling methods for missing historical records of temperature data: the Expectation
Maximisation (EM) algorithm, the Data Augmentation (DA) algorithm, the Kalman filter (KF), Neural Networks Regression (NNR) models and, finally, Principal Component Analysis (PCA).
These methodologies are benchmarked against simpler techniques like the fallback
methodology and a naïve approach whereby missing temperature observations are imputed
with the same day value of the previous year. Their performance is evaluated against the
actual values of the “missing data” using standard measures of forecasting accuracy widely
used in the economic literature.
Overall, it is found that, for the periods and the data series concerned, the results of
PCA outperformed the other methodologies in all cases of missing observations analysed.


13.1 INTRODUCTION
This entire book has been devoted to answering this apparently simple question: with
the help of different methods, what can we learn about the future from the financial
data available from the past? In other words, we have taken data availability for granted.


* The opinions expressed herein are those of the authors, and not necessarily those of Girobank.



But the absence of data, or of clean data, is a rather common problem facing users of
quantitative methods in financial markets.
Admittedly, in recent years, the increasing availability of powerful computers has
made it ever easier to collect, store and process higher frequency databanks, at a time
when the use of advanced technologies for trading, risk and asset management has
grown exponentially among financial institutions. Nevertheless, in a multivariate con-
text, taking into account the fact that most financial markets do not move indepen-
dently from each other, the nonsynchronicity of such high frequency data arrival clearly
poses major practical problems. Similarly, even at daily frequencies, a risk manager
will possibly face a dilemma over which data to retain for the marking-to-market and
Value-at-Risk (VaR) computation of a portfolio containing assets from different mar-
kets when some of these are not trading either because of the time difference or a
public holiday.
Such problems will be exacerbated when not enough historical data or clean historical
data exist: this will typically be the case when trying to develop a decision tool either for
a new asset in a given asset class (say a recently issued stock in a given company sector)
or for a new asset class as such (for instance weather derivatives).
The motivation for this chapter is to consider and examine the problem of missing data.
Weather data are used to illustrate this analysis because most weather derivatives pric-
ing methodologies rely heavily on “clean” weather temperature data. The methodology
adopted is to discard certain observed values and treat them as “missing data”. We then
examine and analyse the imputation accuracy of different interpolation techniques and filling methods for missing historical records of temperature data. These are the Expectation Maximisation (EM) algorithm, which is a common technique for determining maximum-likelihood estimates, the Data Augmentation (DA) algorithm, an iterative simulation very similar to the EM algorithm and which may be regarded as a stochastic edition of the EM, the Kalman filter (KF), a smoothing algorithm which can be used to estimate missing observations once a model has been represented in state space form, Principal Component Analysis (PCA), an approach adapted to fill in appropriate values for missing observations from data that are available from correlated variables and, finally, Neural Networks Regression (NNR) models which have been used here to identify and possibly benefit from nonlinear correlations between temperature series.
These methodologies are benchmarked against simpler and widely used techniques like the fallback methodology and a naïve approach whereby missing temperature observations are imputed with the same day value of the previous year. This benchmarking
is carried out to gauge the potential added value of the different methods retained
in the imputation process. The performance of these techniques as predictors of the
“pseudo missing” values is evaluated using standard measures of forecasting accuracy
widely used in the economic literature, such as mean absolute error, root mean squared
error, etc.
Overall, we conclude that, for the periods and the data series concerned, the results of
PCA outperformed the other methodologies in all cases of missing observations analysed.
The rest of the chapter is organised as follows. Section 13.2 gives a brief overview of
the importance of clean weather data for weather derivatives pricing and discusses the
weather data used in this research. Section 13.3 documents the different filling methods
retained while Section 13.4 presents our empirical results. Finally, Section 13.5 concludes
this chapter.

13.2 WEATHER DATA AND WEATHER DERIVATIVES
13.2.1 The importance of weather data for weather derivatives pricing
The history of the weather derivatives market dates back to 1996, when electricity
deregulation in the USA caused the power market to begin changing from series of local
monopolies to competitive regional wholesale markets. Energy companies realising the
impact of weather on their operations took control of their weather risk and created a new
market around it. While a number of pioneering trades were made as early as 1996, the
market did not really take off until after September 1999, when the Chicago Mercantile
Exchange (CME) embarked on listing and trading standard futures and options contracts
on US temperature indexes. Since 2000, the weather risk market has grown significantly
and become progressively more diversified across industries, even if this market originally
evolved from the energy sector. According to the Second Annual Weather Risk Industry
survey commissioned by the Weather Risk Management Association (WRMA), the total
market size for weather derivatives increased to an impressive $11.5 billion at the end of
2001 (see PricewaterhouseCoopers (2002)).
Most weather derivatives contracts traded were temperature related (over 82%). These
are “heating degree days” (HDDs) and “cooling degree days” (CDDs) contracts. A degree
day is the deviation of the average daily temperature (ADT) from a predefined temperature defined as K in the following equations. HDDs and CDDs are the most common degree day measurements. The base temperature K represents the temperature below which heating devices are expected to turn on (HDDs), and above which air conditioners are expected to turn on (CDDs). Since HDDs and CDDs are measuring heating and cooling needs compared to the base temperature, they are calculated according to the following equations:

$$\text{Daily HDDs} = \max(0,\, (K - \text{ADT})) = \max\left(0,\; K - \frac{T_{\max} + T_{\min}}{2}\right)$$

$$\text{Daily CDDs} = \max(0,\, (\text{ADT} - K)) = \max\left(0,\; \frac{T_{\max} + T_{\min}}{2} - K\right)$$

where Tmax and Tmin are the maximum and minimum, respectively, recorded temperature during the day.1 The calculation of daily degree days for longer periods (week, season, year) is based on straightforward summation:

$$\text{HDDs} = \sum_{i=1}^{n} \max\left(0,\; K - \frac{T_{\max} + T_{\min}}{2}\right) \qquad \text{CDDs} = \sum_{i=1}^{n} \max\left(0,\; \frac{T_{\max} + T_{\min}}{2} - K\right)$$
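As a simple check on these formulas, the short Python sketch below (our own illustration, not taken from the chapter; the 65°F baseline mentioned in footnote 1 is assumed, and the three input days are hypothetical) computes daily and cumulative degree days from series of daily maximum and minimum temperatures.

K = 65.0  # assumed baseline temperature in degrees Fahrenheit (see footnote 1)

def daily_degree_days(t_max, t_min, base=K):
    """Return (HDD, CDD) for one day from its maximum and minimum temperature."""
    adt = (t_max + t_min) / 2.0          # average daily temperature
    hdd = max(0.0, base - adt)           # heating degree days
    cdd = max(0.0, adt - base)           # cooling degree days
    return hdd, cdd

def cumulative_degree_days(t_max_series, t_min_series, base=K):
    """Sum daily HDDs and CDDs over a longer period (week, season, year)."""
    hdds = cdds = 0.0
    for t_max, t_min in zip(t_max_series, t_min_series):
        hdd, cdd = daily_degree_days(t_max, t_min, base)
        hdds += hdd
        cdds += cdd
    return hdds, cdds

# Hypothetical example with three November days (degrees Fahrenheit):
print(cumulative_degree_days([48.0, 52.0, 61.0], [30.0, 35.0, 44.0]))  # -> (60.0, 0.0)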

In order to demonstrate how a degree day index may be structured to moderate risk,
suppose that an energy company through the past 10 years' supply and temperature

1 In the USA, the standard baseline temperature (K) is 65°F (18.3°C), but the 65°F base temperature can fluctuate from one region to another.

analysis has determined the expected cumulative number of HDDs during the heating
season. The company is expecting to sell a given amount of energy units according to
this number and sets its budgets to this level. Its analysis also suggests that, for each HDD
below the 10-year average, demand will decrease by a set number of energy units, creating
a marginal loss of $55 000 versus budget. In such a case the company may purchase a
HDD index derivative to achieve protection at a price of $55 000 for each HDD lower
than a given strike (see Corbally and Dang (2001)).
Traditionally, financial derivatives such as options on equities, bonds, forex or com-
modities are priced using no-arbitrage models such as the Black–Scholes pricing model.
In the case of weather derivatives, the underlying asset is a physical quantity (tempera-
ture, rain, wind or snow) rather than a traded asset. Given that the underlying weather
indexes are not traded, a no-arbitrage model cannot be directly applied to price weather
derivatives.
Another approach is used in the insurance industry, known as Historical Burn Analysis (HBA). The central assumption of the method is that the historical record of weather contract payouts gives a precise illustration of the distribution of potential payouts. As noted by Henderson (2001), if weather risk is measured as the standard deviation of payouts, then the price of the contract is given by the equation:

$$\text{Price}_{bid/offer}(t) = D(t, T) \times (\mu \pm \alpha\sigma)$$

where D(t, T) is the discount factor from contract maturity T to the pricing time t, µ is the historical average payout, σ is the historical standard deviation of payouts and α is a positive number denoting the protection seller's risk tolerance.
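A minimal sketch of this pricing rule might look as follows; the payout history, the flat discount factor and the risk-tolerance value are all hypothetical inputs of our own choosing.

import statistics

def hba_price(payouts, discount, alpha):
    """Historical Burn Analysis quote: D(t,T) * (mean +/- alpha * std of payouts)."""
    mu = statistics.mean(payouts)             # historical average payout
    sigma = statistics.stdev(payouts)         # historical standard deviation of payouts
    bid = discount * (mu - alpha * sigma)     # buyer side
    offer = discount * (mu + alpha * sigma)   # seller side
    return bid, offer

# Hypothetical example: 10 years of payouts (in $), a 0.95 discount factor, alpha = 0.3
payouts = [0, 125000, 40000, 0, 210000, 95000, 0, 60000, 180000, 30000]
print(hba_price(payouts, discount=0.95, alpha=0.3))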
HBA therefore crucially depends on historical temperature data. Collecting historical
data may be somewhat difficult and costly but, even when the data is available, there are
several types of errors, like missing data or unreasonable readings. Consequently, the data
must be "cleaned", that is, the errors and omissions must be fixed, in order to be used
for pricing and risk management purposes.

13.2.2 The weather databank
For this research, we use daily temperature data (cleaned data) for Philadelphia Inter-
national (WMO: 724080) weather station (the index station) and for its “fallback station”
according to the WRMA, the Allentown-Bethlehem (WMO: 725170).2 This data was
obtained from QuantWeather (www.quantweather.com). Furthermore, to perform PCA
and NNR, we use temperature data for all the neighbouring stations for which cleaned
data are available. In order to create a series of consequent daily temperature observations
for Tmax (maximum daily temperature), Tmin (minimum daily temperature) and the cor-
responding Tavg ((Tmax + Tmin )/2) for all the available stations, the databank spans from
1 July 1960 to 31 December 1993 (i.e. 33 years and 12 237 observations). Moreover, to
examine the impact of seasonality, this databank was divided into two separate subsets.
The first one includes only the "Autumn" (September, October and November) tempera-
ture observations and the second one only November temperature observations, totalling

2 Weather stations are identified in several ways. The most common is a six-digit WMO ID assigned by the World Meteorological Organisation. WMO IDs are the primary means of identifying weather stations outside the USA, while in the USA the five-digit Weather Bureau Army Navy identities (WBAN IDs) are widely used.

3094 and 1020 observations respectively. Standard tests like the serial correlation LM test,
ADF, Phillips–Perron, ARCH-LM and Jarque–Bera tests (not reported here in order to
conserve space) showed that all the series are autocorrelated, stationary, heteroskedastic
and not normally distributed.
In order to assess the imputation accuracy of the methodologies, several “holes” were
created in the datasets. Arbitrarily, it is assumed that all the missing observations occurred
in November 1993. We also supposed that the missing data were missing at random. The
fallback methodology employs different procedures to handle missing data according to
the number of consecutive missing days, with 12 days or more an important breakpoint.3
Therefore, five different gaps were created, including 1, 7, 10 (<12), 20 and 30 (>12) consecutive missing days. Since HDDs and CDDs are calculated from the deviation of the average daily temperature from the standard baseline temperature, and again in order to conserve space, only the imputed values of the average temperature dataset (Tavg) are presented and compared later in this chapter.

13.3 ALTERNATIVE FILLING METHODS FOR MISSING DATA
In order to ensure that more complex methodologies do indeed add value in the imputation
process, it is essential to benchmark them against simpler and more widely used methods
like the naïve approach and the fallback method. We first present these benchmarks before turning to more sophisticated methods.

13.3.1 The naïve approach

The naïve hypothesis retained is to impute the missing temperature observation with the same day value of the previous year. This simple model is formed by:

$$Y_t = Y^*$$

where Y_t is the missing temperature data at period t and Y* the corresponding imputed value, i.e. the temperature that prevailed on the same day the previous year.
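In code, this rule is a simple calendar lookup; the sketch below is our own illustration and assumes a pandas Series of daily temperatures indexed by date, with the gap days already set to NaN.

import pandas as pd

def naive_impute(temps: pd.Series) -> pd.Series:
    """Replace each missing day with the observation for the same day one year earlier."""
    filled = temps.copy()
    for day in filled[filled.isna()].index:
        # days with no record one year earlier simply remain missing
        filled[day] = temps.get(day - pd.DateOffset(years=1))
    return filled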

13.3.2 The fallback methodology
The second part of the WRMA confirmation4 establishes the weather industry's standard
for determining and adjusting weather data. It presents an estimation of temperature
based on a weather index station, which is usually an agreed region designated in the
confirmation by an airport location, WBAN or WMO ID. In the event that weather data
is not available for the index station, the WRMA confirmation sets forth the interpolation
procedures to be used to determine the missing data under the provision entitled "fallback
methodology" (see Raspé (2001), p. 233). The fallback method provides an adjustment
based on temperature at a selected alternative weather index station referred to as the
“fallback station”.

3 See Section 13.3.2 below.
4 Standard forms of documents and other confirming evidence exchanged between the parties in a weather contract.

13.3.2.1 Fallback terminology
It is important at this point to define some basic terminology of the fallback methodology:

• Adjustment indicates the mean of the arithmetic difference of the daily maximum and/or
minimum temperature (if the unavailable temperature is a daily maximum and/or mini-
mum respectively) at the fallback station subtracted from the corresponding temperature
at the index station during the period in which temperature data is unavailable.
• Missing data day denotes any day during the calculation period for which data for the
index station is unavailable.
• Missing data calculation day means the same day and same month as the relevant
missing data day.
• Leap year exception stipulates that, in the event that the missing data day is 29 February,
the missing data calculation day shall be considered as 1 March.

13.3.2.2 Data missing for less than 12 consecutive days
If temperature data is unavailable for less than 12 consecutive days during the calculation
period, the “adjustment” will be computed for each “missing data day” using a period of
the 15 days immediately prior to, and the 15 days immediately following, the relevant
“missing data day”. If there are unavailable temperature data at the fallback station and/or
the index station within the 15-day periods prior to and following a “missing data day”,
then an adjustment period of the first 15 days on which the relevant temperature values
are available (out of a maximum of 25 days) on either side of each “missing data day”
shall be used to calculate the “adjustment”. To illustrate this process consider the case
of one missing observation, say Y0 . Let Z be the temperature on the fallback station and
Y the temperature being measured. Then the gap between Y and Z is measured over
the period t−1 to t−15 and t1 to t15 and the average gap is calculated over this period
and designated as “avg”. Thus the interpolated value is Y0 = Z0 + avg. Similar methods
would apply to the computation of missing data for longer periods.
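The sketch below is our own illustration of this adjustment for a single missing day, covering only the simplest case where all 15 days on either side are available at both stations; Y and Z are assumed to be pandas Series of index- and fallback-station temperatures indexed by date. The full WRMA procedure would extend the window when observations are missing within it.

import pandas as pd

def fallback_fill(y: pd.Series, z: pd.Series, day, window: int = 15) -> float:
    """Fill Y on 'day' as Z[day] plus the average gap Y - Z over the 15 days either side."""
    before = pd.date_range(end=day - pd.Timedelta(days=1), periods=window)
    after = pd.date_range(start=day + pd.Timedelta(days=1), periods=window)
    idx = before.union(after)
    adjustment = (y.loc[idx] - z.loc[idx]).mean()   # the "adjustment" of the fallback method
    return z.loc[day] + adjustment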

13.3.2.3 Data missing for 12 or more consecutive days
If there are missing temperature observations for more than 12 consecutive days, the
“adjustment” for each “missing data day” will be computed using temperature values
from a period of the 15 days prior to (including the relevant “missing data calculation
day”) and the 15 days after the applicable “missing data calculation day” for the three
prior years. If there is no data for the index station and/or the fallback station in the
immediate three previous years, the adjustment period will be extended until three years
of temperature values can be obtained. In the event that there are unavailable temperatures
at the fallback station and/or the index station within the periods of the 15 days prior
to a “missing data calculation day” or the 15 following days, then an adjustment period
shall be the first 15 days on which the relevant temperature data are available (out of a
maximum of 25 days) on either side of each “missing data calculation day”.

13.3.3 The expectation maximisation algorithm
The EM algorithm is a common technique for determining maximum-likelihood estimates
for parametric models when data are not entirely available. Developed by Dempster et al.

(1977), the EM algorithm generated an innovation in the analysis of incomplete data,
making it possible to calculate efficient parameter estimates in a wide variety of statistical
problems, under the assumption that the data are normally distributed.

13.3.3.1 Maximum-likelihood estimation
The principle of maximum likelihood provides a means of choosing an asymptotically efficient estimator for a parameter or a set of parameters. Consider a random sample of n observations from a normal distribution with mean µ and variance σ². The probability density function for each observation is (see e.g. Greene (2000), p. 125):

$$f(x_i) = (2\pi)^{-\frac{1}{2}} (\sigma^2)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}\,\frac{(x_i - \mu)^2}{\sigma^2}\right)$$

Since the observations are independent, their joint density is:

$$f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} f(x_i) = (2\pi)^{-\frac{n}{2}} (\sigma^2)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right)$$

This equation gives the probability of observing this specific sample. We are now interested in the values of µ and σ² that make this sample most probable, in other words, the maximum-likelihood estimates (MLE) θMLE of µ and σ². Since the log function is monotonically increasing and easier to work with, we usually calculate θMLE by maximising the natural logarithm of the likelihood function:

$$\ln L(\mu, \sigma^2 \mid x_1, x_2, \ldots, x_n) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

This translates into finding solutions to the following first-order conditions:

$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$$

Finally, the MLE estimates are:

$$\hat{\mu}_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}_n \qquad \text{and} \qquad \hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2$$


13.3.3.2 The E-step and M-step of the algorithm
When some observations of the sample are missing, the MLE estimates θMLE are not
obtainable, because the likelihood function is not defined at the missing data points. In
order to overcome this problem we apply the EM algorithm which exploits the interdepen-
dence between the missing data Ymis and parameters θMLE . The Ymis enclose information
relevant to estimating θMLE , and θMLE in turn helps us to compute probable values of
Ymis . This relation suggests the following scheme for estimating θMLE in the presence of

Yobs alone: "fill in" the missing data Ymis based on an initial estimate of θMLE, re-estimate θMLE based on Yobs and the filled-in Ymis and iterate until the estimates converge (see Schafer (1997)). In any missing data case, the distribution of the complete dataset Y can be factored as:

$$P(Y \mid \theta) = P(Y_{obs} \mid \theta) \times P(Y_{mis} \mid Y_{obs}, \theta) \qquad (13.1)$$

Considering each term in the above equation as a function of θ, we have:

$$l(\theta \mid Y) = l(\theta \mid Y_{obs}) + \log P(Y_{mis} \mid Y_{obs}, \theta) + c \qquad (13.2)$$

where l(θ|Y) = log P(Y|θ) indicates the complete data log-likelihood, l(θ|Yobs) = log L(θ|Yobs) the observed data log-likelihood, and c a random constant. Averaging equation (13.2) over the distribution P(Ymis|Yobs, θ(t)), where θ(t) is an initial estimate of the unknown parameter, we get:

$$Q(\theta \mid \theta^{(t)}) = l(\theta \mid Y_{obs}) + H(\theta \mid \theta^{(t)}) + c \qquad (13.3)$$

where

$$Q(\theta \mid \theta^{(t)}) = \int l(\theta \mid Y)\, P(Y_{mis} \mid Y_{obs}, \theta^{(t)})\, dY_{mis}$$

and

$$H(\theta \mid \theta^{(t)}) = \int \log P(Y_{mis} \mid Y_{obs}, \theta)\, P(Y_{mis} \mid Y_{obs}, \theta^{(t)})\, dY_{mis}$$

A basic conclusion of Dempster et al. (1977) is that if we consider θ(t+1) as the value of θ that maximises Q(θ|θ(t)), then θ(t+1) is an improved estimation compared to θ(t):

$$l(\theta^{(t+1)} \mid Y_{obs}) \geq l(\theta^{(t)} \mid Y_{obs})$$

In summary, it is convenient to think of the EM as an iterative algorithm that operates in two steps as follows:

• E-step: The expectation step, in which the function Q(θ|θ(t)) is calculated by averaging the complete data log-likelihood l(θ|Y) over P(Ymis|Yobs, θ(t)).
• M-step: The maximisation step, in which θ(t+1) is calculated by maximising Q(θ|θ(t)).5
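For intuition, in the simplest case of a univariate normal sample with values missing at random, these two steps collapse to a few lines of code. The sketch below is a toy illustration of our own (the chapter's estimates were produced with SPSS, see footnote 5): the E-step replaces each missing value by its conditional expectation and the M-step updates the mean and variance.

import numpy as np

def em_normal(y, n_iter=100):
    """EM estimates of the mean and variance of a normal sample with NaN entries."""
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    n, m = len(y), int(miss.sum())
    mu, sigma2 = np.nanmean(y), np.nanvar(y)        # starting values from the observed data
    for _ in range(n_iter):
        filled = np.where(miss, mu, y)              # E-step: E[y_mis | y_obs, mu] = mu
        new_mu = filled.mean()                      # M-step: update the mean ...
        # ... and the variance, adding back the conditional variance of the missing points
        new_sigma2 = ((filled - new_mu) ** 2).sum() / n + m * sigma2 / n
        mu, sigma2 = new_mu, new_sigma2
    return mu, sigma2

In this scalar case the iteration simply converges to the observed-data mean and variance; in a multivariate normal model the E-step instead replaces each missing value by its conditional expectation given the correlated observed variables, which is what makes the approach useful for imputation.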


13.3.4 The data augmentation algorithm
Data augmentation is an iterative simulation algorithm, a special kind of Markov chain
Monte Carlo (MCMC). As underlined by Schafer (1997), DA is very similar to the EM
algorithm, and may be regarded as a stochastic edition of EM. In many cases with missing
values, the observed data distribution P(θ|Yobs) is difficult to define. However, if Yobs is
“augmented” by a preliminary value of Ymis , the complete data distribution P (θ |Yobs , Ymis )
can be handled.

5 The EM algorithm was implemented with the SPSS software.

For an initial guess θ(t) of the parameter, we select a value of the missing data from the conditional distribution of Ymis:

$$Y_{mis}^{(t+1)} \sim P(Y_{mis} \mid Y_{obs}, \theta^{(t)}) \qquad (13.4)$$

Conditioning on Ymis(t+1), we select a new value of θ from its complete data distribution:

$$\theta^{(t+1)} \sim P(\theta \mid Y_{obs}, Y_{mis}^{(t+1)}) \qquad (13.5)$$

Alternating (13.4) and (13.5) from an initial value θ(0), we obtain a stochastic sequence {(θ(t), Ymis(t)): t = 1, 2, . . .} that converges to a stationary distribution P(θ, Ymis|Yobs), the joint distribution of the missing data and parameters given the observed data, and the subsequences {θ(t): t = 1, 2, . . .} and {Ymis(t): t = 1, 2, . . .} with P(θ|Yobs) and P(Ymis|Yobs) as their respective stationary distributions. For a large value of t we can consider θ(t) as an approximate draw from P(θ|Yobs); alternatively, we can regard Ymis(t) as an approximate draw from P(Ymis|Yobs).
In summary, DA is an iterative algorithm that operates in two steps as follows:

• I-step: The imputation step, in which the missing data are imputed by drawing them
from their conditional distribution given the observed data and assumed initial values
for the parameters θ (t) .
• P-step: The posterior step, in which the values for the parameters are simulated by drawing them from their complete data distribution given the most recently imputed values Ymis(t+1) for the missing data.
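In the same toy univariate normal setting used for the EM sketch above, and under a standard noninformative (Jeffreys) prior, which is an assumption of ours (the authors used Schafer's stand-alone NORM software, see footnote 6), one DA cycle can be written as:

import numpy as np

rng = np.random.default_rng(0)

def da_step(y, miss, mu, sigma2):
    """One Data Augmentation cycle (I-step then P-step) for a univariate normal model."""
    y = y.copy()
    # I-step: draw the missing values from their conditional distribution N(mu, sigma2)
    y[miss] = rng.normal(mu, np.sqrt(sigma2), size=int(miss.sum()))
    # P-step: draw new parameters from their complete-data posterior under a Jeffreys prior
    n = len(y)
    ybar, s2 = y.mean(), y.var(ddof=1)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)    # scaled inverse chi-square draw
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    return y, mu, sigma2

Iterating da_step and keeping the draws after a burn-in period yields a sample of imputations for the missing values rather than a single point estimate.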

The convergence performance of the DA algorithm depends on the amount of missing
information (how much information about the parameters is contained in the missing
part of the data relative to the observed part). High rates of missing information cause
successive iterations to be highly correlated, and a large number of cycles will be needed
for the algorithm to converge. Low rates of missing information produce low correlation
and rapid convergence.6
The EM and DA algorithms provide optimal solutions under the assumption that the
data are normally distributed. However, weather temperature data deviate from normality.
Nevertheless, in many cases the normal model is useful even when the actual data are non-
normal (see Schafer (1997), p. 147). The EM algorithm is used to calculate the missing
values on the level of the series. Additionally, it is almost always better to run the
EM algorithm before using the DA to impute missing data because running the EM
first will provide good starting values for the DA and will help to predict its likely
convergence behaviour.

13.3.5 The Kalman filter models
The seminal works of Harvey (1989) and Hamilton (1994) have underlined the advan-
tages of using state space modelling for representing dynamic systems where unobserved

6 The DA algorithm was implemented using Schafer's stand alone NORM software, which can be downloaded at http://www.stat.psu.edu/∼jls/misoftwa.html.

variables (the so-called “state” variables) can be integrated within an “observable” model.
The advantage of handling problems of missing observations with the state space form
is that the missing values can be estimated by a smoothing algorithm like the Kalman
filter. The filtering is used to estimate the expected value of the state vector according
to the information available at time t, while the intention of smoothing is to include the
information made available after time t.
Harvey (1981) has shown that, if the observations are normally distributed, and the
present estimator of the state vector is the most accurate, the predictor and the updated
estimator will also be the most accurate. In the absence of the normality assumption a
similar result holds, but only within the class of estimators and predictors which are linear
in the observations.
Several chapters in this book have documented in detail state space models and the
Kalman filter so we will avoid giving a new presentation.7 Let it suffice to say that, based
on the principle of parsimony, our initial Kalman filter model was an AR(1) process with a
constant mean, implying that extreme temperatures would show some persistence, but they
would eventually return to their mean level for the period under review. Nevertheless,
comparing the imputation accuracy of alternative models that we tried, we concluded
that the best model for the Tavg series of the entire dataset is an ARMA(1,2) process,
while for the “Autumn” dataset it is an ARMA(2,1) and for the “November” dataset an
ARMA(1,1).8
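To make the mechanics concrete, the sketch below (our own toy code, assuming |phi| < 1 and q > 0) runs a Kalman filter and a fixed-interval smoother for the simple AR(1)-around-a-constant-mean case mentioned above, treating NaN entries as missing; the parameter values are placeholders that would in practice be estimated by maximum likelihood, and the richer ARMA specifications of footnote 8 were fitted in EViews rather than with this code.

import numpy as np

def ar1_kalman_smooth(y, mu, phi, q):
    """Impute NaN entries of y under y_t - mu = phi*(y_{t-1} - mu) + e_t, e_t ~ N(0, q)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x_pred = np.zeros(n); p_pred = np.zeros(n)      # one-step-ahead state predictions
    x_filt = np.zeros(n); p_filt = np.zeros(n)      # filtered states
    x, p = 0.0, q / (1.0 - phi ** 2)                # start from the stationary distribution
    for t in range(n):
        x_pred[t], p_pred[t] = x, p
        if np.isnan(y[t]):                          # missing observation: skip the update step
            x_filt[t], p_filt[t] = x, p
        else:                                       # observed: the state is known exactly
            x_filt[t], p_filt[t] = y[t] - mu, 0.0
        x = phi * x_filt[t]                         # predict the next state
        p = phi ** 2 * p_filt[t] + q
    x_sm = x_filt.copy()                            # fixed-interval (RTS) smoothing pass
    for t in range(n - 2, -1, -1):
        gain = p_filt[t] * phi / p_pred[t + 1]
        x_sm[t] = x_filt[t] + gain * (x_sm[t + 1] - x_pred[t + 1])
    return np.where(np.isnan(y), mu + x_sm, y)      # impute only the missing points

# e.g. ar1_kalman_smooth(tavg_with_gaps, mu=45.0, phi=0.8, q=20.0)  (placeholder values)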

13.3.6 The neural networks regression models
Here again, two chapters in this book have documented the use of NNR models for
prediction purposes.9 In the circumstances, let us just say that the starting point for the
NNR models was the linear correlations between the Tmax and Tmin of the index station
considered and the explanatory temperature series. While NNR models endeavour to
identify nonlinearities, linear correlation analysis can give an indication of which variables
should be included. Variable selection was achieved using a backward stepwise neural
regression procedure: starting with lagged historical values of the dependent variable
and observations for correlated weather stations, we progressively reduced the number of
inputs, keeping the network architecture constant. If omitting a variable did not deteriorate
the level of explained variance over the previous best model, the pool of explanatory
variables was updated by getting rid of this input (see also Dunis and Jalilov (2002)).
The chosen model was then kept for further tests and improvements.
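The backward stepwise loop can be sketched as follows, with scikit-learn's MLPRegressor standing in for the PREVIA networks actually used; X is assumed to be a DataFrame of candidate input series and y the target temperature series, and in practice the score would be computed on a held-out test subset rather than in-sample.

from sklearn.neural_network import MLPRegressor

def backward_stepwise(X, y, tol=0.0):
    """Drop inputs one at a time while keeping the network architecture constant."""
    def fit_score(cols):
        model = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
        model.fit(X[cols].values, y.values)
        return model.score(X[cols].values, y.values)   # explained variance (R squared)
    inputs = list(X.columns)
    best = fit_score(inputs)
    improved = True
    while improved and len(inputs) > 1:
        improved = False
        for col in list(inputs):
            trial = [c for c in inputs if c != col]
            score = fit_score(trial)
            if score >= best - tol:                    # omitting 'col' does not hurt the fit
                inputs, best, improved = trial, score, True
                break
    return inputs, best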

7 See Bentz (2003), Billio and Sartore (2003) and Cassola and Luís (2003).
8 The Kalman filter models were implemented with EViews 4.0. The model for the entire dataset is an ARMA(1,2) process giving a system of four equations:

@signal allavg = c(1) + sv1 + c(2)*sv2 + c(3)*sv3

as the observation equation, plus the three state equations:

@state sv1 = c(5)*sv1(-1) + [var = exp(c(4))]
@state sv2 = sv1(-1)
@state sv3 = sv2(-1)

9 See Dunis and Williams (2003) and Dunis and Huang (2003).

Finally, each of the datasets was partitioned into three subsets, using approximately 2/3 of the data for training the model, 1/6 for testing and the remaining 1/6 for validation. The intention of this partition is to control the error and reduce the risk of overfitting. For instance, the final models for the "November" Tmax and Tmin series have one hidden layer with five hidden nodes.10

13.3.7 Principal component analysis
PCA is a standard method for extracting the most significant uncorrelated sources of
variation in a multivariate system. The objective of PCA is to reduce dimensionality,
so that only the most significant sources of information are used. This approach is very
useful in highly correlated systems, like weather temperatures from neighbouring stations,
because there will be a small number of independent sources of variation and most of them
can be described by just a few principal components. PCA has numerous applications in
financial markets modelling: the main ones concern factor modelling within the Arbitrage
Pricing Theory framework (see Campbell et al. (1997)), robust regression analysis in the
presence of multicollinearity (again a problem often affecting factor modelling) and yield
curve modelling (see Alexander (2001)), although, in this latter case, other techniques
seem preferable.11 It may also be used to overcome the problem of missing data, as we
shall see next.

13.3.7.1 Principal components computation12
Assume that the data for which the PCA is to be carried out consist of M variables indexed j = 1, 2, . . . , M and N observations on each variable, i = 1, 2, . . . , N, generating an N × M matrix X. The input data must be stationary. Furthermore, these stationary data will need to be normalised before the analysis, otherwise the first principal component will be dominated by the input variable with the greatest volatility. Thus, we also assume that each of the M columns of the stationary data matrix X has mean µ = 0 and variance σ² = 1. This can be achieved by subtracting the sample mean and dividing by the sample standard deviation for each element x_ij of matrix X. Consequently we have created a matrix X of standardised mean deviations. We will transform this matrix to a new set of random variables, which are pairwise uncorrelated. Let z1 be the new variable with the maximum variance, then the first column vector α1 of M elements is defined as:

$$z_1 = X\,\alpha_1 \qquad (13.6)$$

where z1 is (N × 1), X is (N × M) and α1 is (M × 1). The new variable z1 (N × 1) is a linear combination of the elements in vector α1. The product $z_1^T z_1$ is the sum of the squares of the elements of z1.13

Substituting equation (13.6) for z1 we have:

$$z_1^T z_1 = (X\alpha_1)^T (X\alpha_1) = \alpha_1^T (X^T X)\,\alpha_1 \qquad (13.7)$$

10 The NNR models were implemented using the PREVIA software.
11 See Cassola and Luís (2003).
12 Appendix A gives a brief technical reminder on eigenvectors, eigenvalues and PCA.
13 T indicates matrix transposition.

The unbiased covariance matrix of the data which produced matrix X is:

$$\frac{1}{N-1}\, X^T X$$

where $X^T X$ is (N − 1) times the covariance matrix. Consequently, the maximum variance is found by selecting as vector α1 the one that maximises $\alpha_1^T (X^T X)\,\alpha_1$. In addition vector α1 is normalised by the condition that its length equals 1, that is $\alpha_1^T \alpha_1 = 1$. In summary, the conditions that determine z1 are:

• max $z_1^T z_1$
• $\alpha_1^T \alpha_1 = 1$ (vector α1 normalised)

This optimisation problem can be solved analytically with the use of the Lagrange multiplier (λ1) and vector differentiation. But it can also be solved with Excel Solver.
The second vector $z_2 = X\,\alpha_2$ is determined by the following conditions:

• max $z_2^T z_2$
• $\alpha_2^T \alpha_2 = 1$ (vector α2 normalised)
• $\alpha_2^T \alpha_1 = 0$ (z2 uncorrelated with z1)

The last constraint derives from:

$$\mathrm{Cov}(z_1, z_2) = 0 \;\Rightarrow\; \alpha_1^T (X^T X)\,\alpha_2 = 0 \;\Rightarrow\; (X^T X\,\alpha_2)^T \alpha_1 = 0 \;\Rightarrow\; \lambda_2\,\alpha_2^T \alpha_1 = 0 \;\Rightarrow\; \alpha_2^T \alpha_1 = 0 \quad (\lambda_2 \neq 0)$$

In highly correlated systems, such as weather temperatures, the first two eigenvalues explain more than 90% of the total variance. There is therefore no need to retain more principal components.

13.3.7.2 PCA and missing temperature data

Suppose that the weather station with missing observations is St 1, and assume that St 2, . . . , St 5 are the correlated weather stations. In order to use PCA to fill in the missing observations we execute the following steps.

• Step 1: Perform a PCA on St 1 and St 2, . . . , St 5 using all the available data on St 1 to obtain principal components and factor weights (w11, w12, w13, w14, w15). The selection of the principal components will depend on how highly correlated the system is. However, they should always be less than the number of variables (<5).
• Step 2: Perform one more PCA on St 2, . . . , St 5 using all the available data on these variables and the same number of principal components (P1, . . . , P4).
• Step 3: Using the factor weights from step 1 and the principal components from step 2 we rebuild a simulated data history on St 1 according to the equation:

$$St_1^* = w_{11} P_1 + \cdots + w_{14} P_4$$

• Step 4: The final step is the calibration of our model. The actual data on St 1 that are available should be compared with the ones obtained from step 3 to choose the appropriate mean and standard deviation to "reconstruct" the missing weather station data.14

14 Appendix B documents the Excel file "WD PCA.xls" of the accompanying CD-Rom, which shows the computation of 10 days of missing Tmax temperature data.

The accuracy of the imputation will depend on the strength of the correlation within the system. As weather temperature data are highly correlated, PCA is intuitively an appropriate methodology for filling missing observations.
For our weather observations, all the datasets are stationary, so PCA can be performed. The correlations between temperature data from neighbouring stations are extremely high, so the proportion of total variation that can be explained by only two eigenvalues is about 96%. For example, the linear model with just two principal components for the "November" Tmax is:

$$X_{\max} = 0.452 \times P_1 + 0.1435 \times P_2$$

The values obtained by the above equation are standardised and therefore need to be
multiplied by their standard deviation and added to their mean to transform them back
into temperatures. We obtain the values for the “November” Tmin data in the same way
and calculate the consequent Tavg temperatures accordingly.
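The four steps above translate into a few lines of linear algebra. The numpy sketch below is our own simplified mirror of the Excel implementation described in Appendix B: the variable names are ours, the calibration in step 4 simply reuses the observed mean and standard deviation of the target station, and the sign conventions of the two decompositions may need to be aligned in practice.

import numpy as np

def pca_fill(target, neighbours, miss, k=2):
    """Steps 1-4: rebuild 'target' (1-D array) over the boolean mask 'miss' from correlated stations."""
    obs = ~miss
    z = lambda a: (a - a.mean(axis=0)) / a.std(axis=0)          # standardise each column
    # Step 1: PCA on [target, neighbours] over the common (observed) sample -> factor weights
    x1 = z(np.column_stack([target[obs], neighbours[obs]]))
    _, _, vt1 = np.linalg.svd(x1, full_matrices=False)
    w = vt1[:k, 0]                                              # loadings of the target station
    # Step 2: PCA on the neighbours alone, over the full sample -> principal components
    x2 = z(neighbours)
    _, _, vt2 = np.linalg.svd(x2, full_matrices=False)
    pcs = x2 @ vt2[:k].T
    # Step 3: simulated standardised history of the target station
    sim = pcs @ w
    # Step 4: calibration back to temperature units (observed mean and standard deviation)
    filled = target.copy()
    filled[miss] = target[obs].mean() + target[obs].std() * sim[miss]
    return filled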


13.4 EMPIRICAL RESULTS
13.4.1 The imputation accuracy measures

We use five standard error statistics to gauge the imputation accuracy of the alternative
filling methods retained. These measures are documented in Table 13.1 (for more details,
see for instance Makridakis et al. (1983)).


Table 13.1 Statistical accuracy measures

Mean absolute error (MAE): $\text{MAE} = \dfrac{1}{T}\sum_{t=1}^{T} |y_t - \tilde{y}_t|$

Mean absolute percentage error (MAPE): $\text{MAPE} = \dfrac{100}{T}\sum_{t=1}^{T} \left|\dfrac{y_t - \tilde{y}_t}{y_t}\right|$

Root mean squared error (RMSE): $\text{RMSE} = \sqrt{\dfrac{1}{T}\sum_{t=1}^{T} (y_t - \tilde{y}_t)^2}$

Theil's inequality coefficient (Theil-U): $U = \dfrac{\sqrt{\frac{1}{T}\sum_{t=1}^{T}(y_t - \tilde{y}_t)^2}}{\sqrt{\frac{1}{T}\sum_{t=1}^{T} y_t^2} + \sqrt{\frac{1}{T}\sum_{t=1}^{T} \tilde{y}_t^2}}$

Mean error (ME): $\text{ME} = \dfrac{1}{T}\sum_{t=1}^{T} (y_t - \tilde{y}_t)$

where $y_t$ denotes the actual value and $\tilde{y}_t$ the imputed value.
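These statistics can be computed directly from the vectors of actual and imputed values, for instance with the small numpy sketch below (function and variable names are our own):

import numpy as np

def accuracy_measures(actual, imputed):
    """MAE, MAPE, RMSE, Theil's U and ME between actual and imputed series."""
    y, y_hat = np.asarray(actual, float), np.asarray(imputed, float)
    err = y - y_hat
    mae = np.abs(err).mean()
    mape = 100.0 * np.abs(err / y).mean()
    rmse = np.sqrt((err ** 2).mean())
    theil_u = rmse / (np.sqrt((y ** 2).mean()) + np.sqrt((y_hat ** 2).mean()))
    me = err.mean()
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "Theil-U": theil_u, "ME": me}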

13.4.2 The imputation results

In order to conserve space, only the imputed values of the average temperature dataset
(Tavg ) are presented and compared. It is worth noting however that, throughout the calcu-
lation procedures and for most of the techniques, the imputations of Tavg missing values
obtained as the average of the Tmax and Tmin imputations for the same missing values
are more accurate than those obtained by employing the methodologies directly with
the Tavg series. Daily temperatures are reported in degrees Fahrenheit. The comparison
of the imputation accuracy results for the “November” Tavg series of the Philadelphia
International index station (724080) is presented in Table 13.2.
These results show that, on all the criteria adopted, PCA substantially outperformed the
other methodologies in all cases of missing observations. This means that PCA may be
used to avoid mispricing of weather derivatives and to reduce the error in the settlement
process. The EM, the Kalman filter and the NNR methods outperform the naïve approach,
but are less accurate than the fallback method. In fact, the fallback methodology appears as
the second most accurate method. Finally, the worst methodology in terms of imputation
accuracy appears to be the DA, whereas the naïve approach is validated as a difficult, and

Table 13.2 Imputation accuracy measures for the "November" dataset

Performance Measure: Naïve, Fallback, PCA, Kalman, NNR, EM, DA

One Missing Month Tavg (Average Performance Measures)
Mean Absolute Error: 8.02 2.45 1.13 6.13 2.63 6.22 8.48
Root Mean Squared Error: 11.30 2.96 1.37 7.90 2.97 7.90 11.58
Theil's Inequality Coefficient: 0.12 0.03 0.01 0.08 0.03 0.08 0.12
Mean Absolute Percentage Error: 15.93% 4.99% 2.35% 12.04% 5.49% 12.24% 17.50%
Mean Error: -0.98 -1.68 -0.03 -2.73 -2.57 -2.45 -1.88

20 Missing Days Tavg (Average Performance Measures)
Mean Absolute Error: 8.13 2.63 1.00 6.13 2.85 6.25 7.45
Root Mean Squared Error: 11.81 3.22 1.27 8.49 3.18 8.49 11.01
Theil's Inequality Coefficient: 0.12 0.03 0.01 0.09 0.03 0.09 0.11
Mean Absolute Percentage Error: 14.73% 5.14% 1.99% 10.99% 5.71% 11.29% 13.74%
Mean Error: -4.53 -2.03 -0.25 -4.58 -2.75 -4.15 -4.00

10 Missing Days Tavg (Average Performance Measures)
Mean Absolute Error: 4.55 1.95 0.65 3.20 2.60 3.45 5.30
Root Mean Squared Error: 5.69 2.55 0.88 4.27 2.90 4.26 6.74
Theil's Inequality Coefficient: 0.06 0.03 0.01 0.05 0.03 0.05 0.07
Mean Absolute Percentage Error: 9.85% 4.08% 1.41% 6.68% 5.73% 7.28% 11.56%
Mean Error: 0.75 1.55 0.15 -0.70 -2.40 0.15 0.50

One Missing Week Tavg (Average Performance Measures)
Mean Absolute Error: 5.14 2.57 0.71 3.36 2.71 3.71 4.79
Root Mean Squared Error: 6.36 3.02 1.00 4.59 3.09 4.58 6.07
Theil's Inequality Coefficient: 0.07 0.03 0.01 0.05 0.03 0.05 0.07
Mean Absolute Percentage Error: 10.93% 5.31% 1.53% 6.69% 5.90% 7.54% 9.94%
Mean Error: 2.43 2.29 0.29 -2.07 -2.43 -0.86 -2.07

One Missing Day (Tavg)
Mean Absolute Error: 3.00 1.00 0.50 1.50 3.50 3.00 7.50
Root Mean Squared Error: 3.00 1.00 0.50 1.50 3.50 3.00 7.50
Theil's Inequality Coefficient: 0.03 0.01 0.01 0.02 0.04 0.03 0.08
Mean Absolute Percentage Error: 6.90% 2.27% 1.15% 3.45% 8.05% 6.90% 17.24%
Mean Error: 3.00 1.00 0.50 -1.50 -3.05 3.00 7.50
Table 13.3 Deviation of computed from actual HDDs

Missing Days: Actual HDDs, Fallback HDDs, PCA HDDs, Kalman F. HDDs, EM HDDs, DA HDDs, Naive HDDs, NNR HDDs
A Missing Day (11/01/93): 21.50 20.00 21.50 23.00 18.50 14.00 18.50 20.00
A Missing Week (11/01/93–11/07/93): 123.50 107.50 122.00 138.00 129.50 138.00 106.50 131.50
10 Missing Days (11/01/93–11/10/93): 186.50 171.00 185.50 193.50 185.00 181.50 179.00 188.50
20 Missing Days (11/01/93–11/20/93): 293.00 331.00 297.50 378.50 370.00 367.00 377.50 359.50
A Missing Month (11/01/93–11/30/93): 487.50 535.50 488.00 563.50 555.00 538.00 511.00 548.00
Deviation from actual HDDs for a Missing Day: 1.5 0 1.5 3 7.5 3 1.5
Deviation from actual HDDs for a Missing Week: 16 1.5 14.5 6 14.5 17 8
Deviation from actual HDDs for 10 Missing Days: 15.5 1 7 1.5 5 7.5 2
Deviation from actual HDDs for 20 Missing Days: 38 4.5 85.5 77 74 84.5 66.5
Deviation from actual HDDs for a Missing Month: 48 0.5 76 67.5 50.5 23.5 60.5



easy to “compute”, benchmark to beat.15 For the two other datasets, the “Entire” dataset
and the “Autumn” temperature observations, the results also show that PCA outperformed
all the other methodologies in all the cases of missing data.16
Furthermore, using the imputed values of the missing temperature data obtained from
these techniques, the HDDs for November 1993 have been calculated and compared with
actual HDDs. Table 13.3 shows the deviation between computed and actual HDDs. The
baseline temperature for the computation of the HDDs is 65°F.
It is again obvious that PCA outperforms all the other methodologies. The deviation of
the PCA results from actual HDDs for a missing day is 0, while for 10 missing days it is
1 degree (against 15.5 degrees for the fallback method); for one missing month, it is 0.5
degree, against 48 degrees for the fallback method! On average, the deviation of PCA-
based results from actual HDD data is 1.5 degrees, while the second best methodology,
the fallback method, deviates on average by 23.8 degrees.

13.5 CONCLUDING REMARKS
In this chapter, we set out to examine and analyse the use of alternative methods when
confronted with missing data, a common problem when not enough historical data or
clean historical data exist.
Because of the need for “clean” weather temperature data in the fast growing weather
derivatives markets as pricing methodologies rely heavily on them, we examined the
imputation accuracy of different interpolation techniques and filling methods for missing
historical temperature records. We would argue that the conclusions based on this analysis
would serve as a good guide to the general problem of dealing with missing values.
Overall, for the periods and the data series concerned, the results of PCA outperformed
all the other methodologies in all cases of missing observations and consequently in the
calculation of HDDs. The only drawback of PCA compared with the second most accurate
method, the fallback method, is that PCA requires more correlated weather temperature
“clean” data. Nevertheless, if the necessary data are available, PCA should be preferred
in replacing missing temperature observations. More generally, PCA provides a very
efficient and simple method for filling missing data in the presence of a correlated system
of variables. As has been shown, it is also easy to implement, as this can be done in Excel.

15 The poor performance of the EM and DA methods may be linked to the violation of the assumption that the data are normally distributed (see Section 13.3.4 above), an assumption which does not hold here.
16 Complete results are available from the authors.

APPENDIX A
A.1 Eigenvalue and eigenvector
Let A be a linear transformation symbolised by a matrix A. If there is a vector X ∈ ℝⁿ, X ≠ 0, such that:

$$A\,X = \lambda\,X$$

for some scalar λ, then λ is called the eigenvalue of A with corresponding (right) eigenvector X. Eigenvalues are also known as characteristic roots, proper values or latent roots (see Marcus and Minc (1988)).
Eigenvalues are given by the solutions of the characteristic equation of the given matrix:

$$(A - \lambda\,I)\,X = 0$$

where I is the identity matrix.

A.2 An introduction to PCA
Assume that the data for which the PCA is to be carried out consist of M variables indexed
j = 1, 2, . . . , M and N observations on each variable, i = 1, 2, . . . , N , generating an
N — M matrix X. As mentioned in Section 13.3.7.1, we also assume that the input data
are stationary and that each of the M columns of the stationary data matrix X has mean
µ = 0 and variance σ 2 = 1, which can be achieved by subtracting the sample mean and
dividing by the sample standard deviation for each element xij of matrix X. Consequently
we have created a matrix X of standardised mean deviations.
PCA is founded on an eigenvalue and eigenvector analysis of:

$$V = X^T X / N$$

the M × M symmetric matrix of correlations among the M variables. The principal components are linear combinations of these columns, where the weights are selected according to the following technique:

• The first principal component describes the majority of the total variation in matrix X, the second component describes the majority of the remaining variation, and so on.
• The principal components are uncorrelated with each other.

This can be attained by selecting the weights from the cluster of eigenvectors of the correlation matrix. If W is the M × M matrix of eigenvectors of V, we have:

$$V\,W = W\,\Lambda$$

where Λ is the M × M diagonal matrix of eigenvalues of V. If matrix W = (w_ij) for i, j = 1, 2, . . . , M, then the kth column of W, indicated w_k = (w_1k, w_2k, . . . , w_Mk)′, is the M × 1 eigenvector corresponding to the eigenvalue λ_k. The kth principal component of the system is given by:

$$P_k = w_{1k} X_1 + w_{2k} X_2 + \cdots + w_{Mk} X_M$$

where Xj indicates the j th column of matrix X, in other words the standardised historical
input data of the j th variable in the initial correlated system. In matrix notation, the kth
principal component of the system is given by:

$$P_k = X\,w_k$$

APPENDIX B
The file WD PCA.xls on the accompanying CD-Rom is an illustration of the PCA method.
In the "Data" sheet of the workbook, and for the sake of simplicity, we retain 122 temperature records in degrees Celsius from 01/09/93 to 31/12/93 for five weather stations. A 10-day period of missing data is artificially created between 01/11/93 and 10/11/93. The common dataset for step 1 of the PCA goes until 31/10/93.
In the "Step 1" sheet of the workbook, to obtain principal components and factor weights, we perform a PCA on St 1, St 2, St 3, St 4 and St 5 using all the data available on St 1 up to the missing data period (we have therefore M = 5 and N = 61). We first standardise the common dataset up to 31/10/93 (cells H3:L63) by subtracting the sample mean and dividing by the sample standard deviation for each temperature record.
To obtain the factor weights, we create two column vectors a1 and a2 with five cells, one per weather station (initially for a1, we set the top cell O5 = 1 and cells O6:O9 = 0). We proceed to compute the first eigenvector using Excel matrix algebra (cells S4:S64 and cell P14) and the Excel Solver: in the Excel Solver window (Figure B13.1), we maximise cell P14 by changing cells O5:O9 subject to cell P15 = 1.




Figure B13.1 Excel Solver window for the second principal component

We compute the second eigenvector exactly in the same way: only this time we
maximise cell P18 by changing cells Q5:Q9 subject to cell P19 = 1 and cell P20 = 0.
The total variance explained by the first two components is computed in cell Q33: at
98.4%, there is obviously no need to look for more principal components. We shall retain
the factor weights of St1 for further use (cells O5:Q5).
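For readers who prefer to replicate the Solver step outside Excel, the sketch below performs the same constrained maximisation in Python with SciPy. It is a stand-in for the Solver, not a reproduction of the workbook's cell layout, and the data are random placeholders for the 61-day standardised common dataset:

import numpy as np
from scipy.optimize import minimize

def solver_style_weights(X_std, n_components=2):
    """Maximise the variance of X_std @ w subject to w'w = 1, adding an
    orthogonality constraint for each further component, as in the Solver set-up."""
    N, M = X_std.shape
    V = X_std.T @ X_std / N                        # correlation matrix of the standardised data
    weights = []
    for k in range(n_components):
        constraints = [{"type": "eq", "fun": lambda w: w @ w - 1.0}]
        for prev in weights:                       # new component orthogonal to the earlier ones
            constraints.append({"type": "eq", "fun": lambda w, p=prev: w @ p})
        result = minimize(lambda w: -(w @ V @ w),  # maximise w'Vw by minimising its negative
                          x0=np.eye(M)[k], method="SLSQP", constraints=constraints)
        weights.append(result.x)
    return np.column_stack(weights)

rng = np.random.default_rng(1)
X = rng.normal(size=(61, 5))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
W = solver_style_weights(X_std)
print(W)   # each column plays the role of one set of factor weights (a1, a2)

Up to sign, the columns of W should coincide with the leading eigenvectors of the correlation matrix, which is what the Solver-based maximisation converges to.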
In the “Step 2” sheet of the workbook, we perform a second PCA, this time only on
St2, St3, St4 and St5 but using the entire databank for these variables and the same
two principal components (we have therefore M = 4 and N = 122). We shall retain the
principal components z1 and z2 for further use (cells Q4:Q125 and S4:S125).
In the “Steps 3–4” sheet of the workbook, using the factor weights from step 1 (cells
B7:B8) and the useful part of the principal components from step 2 (cells A13:A73 and
C13:C73), we rebuild a simulated data history of St1 according to the equation:

St1* = a1 z1 + a2 z2

where a1 and a2 are the factor weights for St1 from step 1 and z1 and z2 the principal
components from step 2.
In fact, this yields a standardised series (cells K2:K62), which must be multiplied by
the original series' standard deviation, with its mean then added back, to obtain the
“reconstructed” temperature data (cells M2:M62).
The choice of the mean and standard deviation is the final and crucial calibration step
of the model: here, we take the standard deviation from step 1 (cell E8); for the mean (cell
E7), considering both the time dependency of temperature data and the high correlation
between weather stations, we take the average of two quantities: the mean temperature for
all stations over the 15 days before and the 15 days after the missing data period, and the
mean temperature for St2, St3, St4 and St5 during the missing data period.
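A miniature version of steps 3–4, with the workbook cells replaced by hypothetical arrays (the weights, components and calibration values below are illustrative placeholders, not the values stored in WD PCA.xls), might look as follows:

import numpy as np

def rebuild_missing_segment(a, z1, z2, target_mean, target_std):
    """Combine the step-1 factor weights of St1 with the two full-sample principal
    components over the missing window, then rescale back to temperatures."""
    st1_standardised = a[0] * z1 + a[1] * z2        # St1* = a1 z1 + a2 z2
    return st1_standardised * target_std + target_mean

a = np.array([0.45, 0.12])                          # placeholder factor weights a1, a2 (cells B7:B8)
z1 = np.linspace(-1.0, 1.0, 10)                     # placeholder first component over the 10 missing days
z2 = np.sin(np.linspace(0.0, 3.0, 10))              # placeholder second component over the same window
reconstructed = rebuild_missing_segment(a, z1, z2, target_mean=7.5, target_std=2.1)
print(reconstructed)                                # reconstructed St1 temperatures in degrees Celsius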
Finally, the “Chart” sheet of the workbook plots our results: as can be seen, the actual
Tavg temperature data for St1 and the PCA-based estimates over the 10-day period
from 01/11/93 to 10/11/93 are very close, a view confirmed by the average difference of
0.07°C over that period (see cell O63 in the “Steps 3–4” sheet of the workbook).

REFERENCES
Alexander, C. (2001), Market Models: A Guide to Financial Data Analysis, John Wiley, Chichester.
Bentz, Y. (2003), “Quantitative Equity Investment Management with Time-Varying Factor Sensitivities”, Chapter 7 of this book.
Billio, M. and D. Sartore (2003), “Stochastic Volatility Models: A Survey with Applications to Option Pricing and Value at Risk”, Chapter 8 of this book.
Campbell, J. Y., A. W. Lo and A. C. MacKinlay (1997), The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ.
Cassola, N. and J. B. Luís (2003), “Modelling the Term Structure of Interest Rates: An Application of Gaussian Affine Models to the German Yield Curve”, Chapter 3 of this book.
Corbally, M. and P. Dang (2001), “Risk Products”, in E. Banks (ed.), Weather Risk Management: Market, Products and Applications, Palgrave, New York, pp. 105–134.
Dempster, A. P., N. M. Laird and D. B. Rubin (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, B39, 1, 1–38.
Dunis, C. and X. Huang (2003), “Forecasting and Trading Currency Volatility: An Application of Recurrent Neural Regression and Model Combination”, Chapter 4 of this book.
Dunis, C. and J. Jalilov (2002), “Neural Network Regression and Alternative Forecasting Techniques for Predicting Financial Variables”, Neural Network World, 2, 113–139.
Dunis, C. and M. Williams (2003), “Applications of Advanced Regression Analysis for Trading and Investment”, Chapter 1 of this book.
Greene, W. H. (2000), Econometric Analysis, 4th edition, Prentice Hall, Englewood Cliffs, NJ.
Hamilton, J. (1994), Time Series Analysis, Princeton University Press, Princeton, NJ.
Harvey, A. C. (1981), Time Series Models, Philip Allan, Oxford.
Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge.
Henderson, R. (2001), “Pricing Weather Risk”, in E. Banks (ed.), Weather Risk Management: Market, Products and Applications, Palgrave, New York, pp. 167–199.
Makridakis, S., S. C. Wheelwright and V. E. McGee (1983), Forecasting: Methods and Applications, John Wiley & Sons, New York.
Marcus, M. and H. Minc (1988), Introduction to Linear Algebra, Dover, New York.
PricewaterhouseCoopers (2002), “The Weather Risk Management Industry: Survey Findings for April 2001 to March 2002”, Weather Risk Management Association, available at: http://wrma.cyberserv.com/library/public/file346.ppt.
Raspé, A. (2001), “Legal and Regulatory Issues”, in E. Banks (ed.), Weather Risk Management: Market, Products and Applications, Palgrave, New York, pp. 224–245.
Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, Series Monographs on Statistics and Applied Probability, Vol. 72, Chapman and Hall, London.