

Theorem 12.2 indicates that the leading term of k_n ℓ_n(m_θ) is h^{1/2} T_n with π(x) = V(x). The differences between the two test statistics are: (a) the empirical likelihood test statistic automatically studentizes via its internal algorithm conducted in the background, so that there is no need to explicitly estimate V(x); (b) the empirical likelihood statistic is able to capture other features, such as skewness and kurtosis, exhibited in the data without using bootstrap resampling, which involves more technical details when the data are dependent. If we choose k_n = [1/(2h)] as prescribed, then the remainder term in (12.21) becomes O_p{h log²(n)}.
12 An Empirical Likelihood Goodness-of-Fit Test for Diffusions

We will now discuss the asymptotic distribution of the test statistic ℓ_n(m_θ). Theorem 12.3 was proven by Chen et al. (2001).

THEOREM 12.3 Suppose assumptions (i)-(vi) hold. Then

    k_n ℓ_n(m_θ) → ∫_0^1 N²(s) ds   in distribution,

where N is a Gaussian process on [0, 1] with mean

    E{N(s)} = h^{1/4} Δ_n(s)/√V(s)

and covariance

    Ω(s, t) = Cov{N(s), N(t)} = √{f(s)σ²(s)/(f(t)σ²(t))} · W_0(s, t)/√{W_0^{(2)}(s, s) W_0^{(2)}(t, t)},

with

    W_0(s, t) = ∫ h^{−1} K{(s − y)/h} K{(t − y)/h} dy.   (12.22)

As K is a compact kernel on [−1, 1], when both s and t are in S_I (the interior part of [0, 1]), we get from (12.22) with the substitution u = (s − y)/h

    W_0(s, t) = ∫ K(u) K{u − (s − t)/h} du = K^{(2)}{(s − t)/h},   (12.23)

where K^{(2)} is the convolution of K with itself. The compactness of K also means that W_0(s, t) = 0 if |s − t| > 2h, which implies Ω(s, t) = 0 if |s − t| > 2h. Hence N(s) and N(t) are independent if |s − t| > 2h. As

    f(s)σ²(s) = f(t)σ²(t) + O(h)

when |s − t| ≤ 2h, we get

    Ω(s, t) = W_0(s, t)/√{W_0^{(2)}(s, s) W_0^{(2)}(t, t)} + O(h).   (12.24)
12.6 Goodness-of-Fit Statistic

So the leading order of the covariance function is free of σ² and f, i.e. Ω(s, t) is completely known. Define

    N_0(s) = N(s) − h^{1/4} Δ_n(s)/√V(s).   (12.25)

Then N_0(s) is a normal process with zero mean and covariance Ω. The boundedness of K implies that W_0 is bounded, and hence ∫_0^1 Ω(t, t) dt < ∞. We will now study the expectation and variance of ∫_0^1 N²(s) ds. Let T = T_1 + T_2 + T_3 = ∫_0^1 N²(s) ds, where

    T_1 = ∫_0^1 N_0²(s) ds,
    T_2 = 2h^{1/4} ∫_0^1 V^{−1/2}(s) Δ_n(s) N_0(s) ds,   and
    T_3 = h^{1/2} ∫_0^1 V^{−1}(s) Δ_n²(s) ds.

From some basic results on stochastic integrals, Lemma 12.2 and (12.24), it follows that E(T_1) = ∫_0^1 Ω(s, s) ds = 1 and

    Var(T_1) = E(T_1²) − 1   (12.26)
             = ∫_0^1 ∫_0^1 E{N_0²(s) N_0²(t)} ds dt − 1   (12.27)
             = 2 ∫_0^1 ∫_0^1 Ω²(s, t) ds dt
             = 2 ∫_0^1 ∫_0^1 {W_0(s, t)}² {W_0^{(2)}(s, s) W_0^{(2)}(t, t)}^{−1} ds dt {1 + O(h²)}.

From (12.23) and the fact that the size of the region [0, 1] \ S_{I,h} is O(h), we get

    ∫_0^1 ∫_0^1 {W_0(s, t)}² {W_0^{(2)}(s, s) W_0^{(2)}(t, t)}^{−1} ds dt
        = {K^{(2)}(0)}^{−2} ∫_0^1 ∫_0^1 [K^{(2)}{(s − t)/h}]² ds dt {1 + O(h)}
        = h K^{(4)}(0) {K^{(2)}(0)}^{−2} + O(h²),

where K^{(4)} denotes the convolution of K^{(2)} with itself, so that K^{(4)}(0) = ∫ {K^{(2)}(u)}² du.

Thus

    Var(T_1) = 2h K^{(4)}(0) {K^{(2)}(0)}^{−2} + O(h²).
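The kernel constants K^{(2)}(0) and K^{(4)}(0) that appear in this variance can be evaluated numerically. Below is a minimal Python sketch (not part of the original XploRe code), assuming the quadratic kernel K(u) = (3/4)(1 − u²) on [−1, 1]:

```python
import numpy as np

# Quadratic kernel on [-1, 1] (an assumption; the chapter only requires a
# compact kernel). K^{(2)} is the convolution of K with itself, and
# K^{(4)}(0) = int {K^{(2)}(v)}^2 dv.
def K(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

u = np.linspace(-1.0, 1.0, 4001)
du = u[1] - u[0]

def K2(v):
    # K^{(2)}(v) = int K(u) K(u - v) du, computed as a Riemann sum
    return float(np.sum(K(u) * K(u - v)) * du)

K2_0 = K2(0.0)                       # = int K(u)^2 du, which is 3/5 here
v = np.linspace(-2.0, 2.0, 2001)     # K^{(2)} is supported on [-2, 2]
dv = v[1] - v[0]
K4_0 = float(np.sum(np.array([K2(x) for x in v])**2) * dv)

print(K2_0, K4_0)
```

For this kernel K^{(2)}(0) = ∫ K²(u) du = 3/5 exactly, which the Riemann sum reproduces closely.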
It is obvious that E(T_2) = 0 and

    Var(T_2) = 4h^{1/2} ∫_0^1 ∫_0^1 V^{−1/2}(s) Δ_n(s) Ω(s, t) V^{−1/2}(t) Δ_n(t) ds dt.

As Δ_n and V^{−1} are bounded on [0, 1], there exists a constant C_1 such that

    Var(T_2) ≤ C_1 h^{1/2} ∫_0^1 ∫_0^1 Ω(s, t) ds dt.

Furthermore, we know from the discussion above that

    ∫_0^1 ∫_0^1 Ω(s, t) ds dt = ∫_0^1 ∫_0^1 W_0(s, t)/√{W_0^{(2)}(s, s) W_0^{(2)}(t, t)} ds dt + O(h)
                              ≤ 4C h {K^{(2)}(0)}^{−1} + C_1 h

with some other constants C and C_1, and thus there exists a constant C_2 such that

    Var(T_2) ≤ C_2 h^{3/2}.

As T_3 is non-random, we have

    E(T) = 1 + h^{1/2} ∫_0^1 V^{−1}(s) Δ_n²(s) ds   and   (12.28)
    Var(T) = 2h K^{(4)}(0) {K^{(2)}(0)}^{−2} + O(h).   (12.29)

(12.28) and (12.29) together with Theorem 12.3 give the asymptotic expectation and variance of the test statistic k_n ℓ_n(m_θ).

12.7 Goodness-of-Fit test
We now turn to the derivation of the asymptotic distribution of k_n ℓ_n(m_θ). We do this by discretizing ∫_0^1 N²(s) ds as k_n^{−1} Σ_{j=1}^{k_n} N²(t_j), where {t_j}_{j=1}^{k_n} are the mid-points of the original bins used in formulating ℓ_n(m_θ). If we choose k_n = [(2h)^{−1}] such that |t_{j+1} − t_j| ≥ 2h for all j, then the {N(t_j)} are independent and each N(t_j) ∼ N(h^{1/4} Δ_n(t_j)/√V(t_j), 1). This means that under the alternative H_1

    Σ_{j=1}^{k_n} N²(t_j) ∼ χ²_{k_n}(γ_{k_n}),

a non-central χ² random variable with k_n degrees of freedom and non-centrality parameter γ_{k_n} = h^{1/4} {Σ_{j=1}^{k_n} Δ_n²(t_j)/V(t_j)}^{1/2}. Under H_0,

    Σ_{j=1}^{k_n} N²(t_j) ∼ χ²_{k_n}

is χ²-distributed with k_n degrees of freedom. This leads to a χ² test with significance level α which rejects H_0 if k_n² ℓ̃_n(m_θ̂) > χ²_{k_n,α}, where χ²_{k_n,α} is the (1 − α)-quantile of χ²_{k_n}. The asymptotic power of the χ² test is P{χ²_{k_n}(γ_{k_n}) > χ²_{k_n,α}}, which is sensitive to alternative hypotheses differing from H_0 in all directions.
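The χ² decision rule can be sketched numerically (in Python with scipy rather than XploRe; the bandwidth and the non-centrality value are hypothetical, and note that scipy parametrizes the non-central χ² by λ = γ²_{k_n}):

```python
import numpy as np
from scipy.stats import chi2, ncx2

h, alpha = 0.05, 0.05                       # illustrative bandwidth and level
k_n = int(np.floor(1.0 / (2.0 * h)))        # k_n = [(2h)^{-1}] = 10 bins

crit = chi2.ppf(1.0 - alpha, df=k_n)        # (1 - alpha)-quantile of chi^2_{k_n}
size = chi2.sf(crit, df=k_n)                # rejection probability under H0

# Asymptotic power P{chi^2_{k_n}(gamma_{k_n}) > crit}; gamma_kn is illustrative
gamma_kn = 2.0
power = ncx2.sf(crit, df=k_n, nc=gamma_kn**2)

print(k_n, crit, size, power)
```

Under H_0 the rejection probability equals the nominal level α, and the power exceeds α for any positive non-centrality.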
We may also establish the asymptotic normality of k_n^{−1} Σ_{j=1}^{k_n} N²(t_j) by applying the central limit theorem for a triangular array, which together with (12.28) and (12.29) means that

    k_n ℓ̃_n(m_θ̂) → N( 1 + h^{1/2} ∫_0^1 Δ_n²(s) V^{−1}(s) ds,  2h K^{(4)}(0) {K^{(2)}(0)}^{−2} )

in distribution.

A test for H_0 with an asymptotic significance level α is to reject H_0 if

    k_n ℓ̃_n(m_θ̂) > 1 + z_α {K^{(2)}(0)}^{−1} √{2h K^{(4)}(0)},   (12.30)

where P(Z > z_α) = α and Z ∼ N(0, 1). The asymptotic power of this test is

    1 − Φ( z_α − [K^{(2)}(0) ∫_0^1 Δ_n²(s) V^{−1}(s) ds] / √{2 K^{(4)}(0)} ).   (12.31)
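A small numerical sketch of the threshold (12.30) and the power (12.31); the kernel constant K^{(4)}(0) and the distance integral D below are placeholder values, not quantities computed from data:

```python
import numpy as np
from scipy.stats import norm

alpha, h = 0.05, 0.05
z_a = norm.ppf(1.0 - alpha)

# Placeholder kernel constants: K^{(2)}(0) = 0.6 holds for the quadratic
# kernel; the value used for K^{(4)}(0) is purely illustrative.
K2_0, K4_0 = 0.6, 0.43

# Rejection threshold from (12.30)
threshold = 1.0 + z_a * np.sqrt(2.0 * h * K4_0) / K2_0

# Asymptotic power (12.31) for a hypothetical distance D = int Delta_n^2 V^{-1} ds
def power(D):
    return 1.0 - norm.cdf(z_a - K2_0 * D / np.sqrt(2.0 * K4_0))

print(threshold, power(0.0), power(2.0))
```

At D = 0 (the null) the power reduces to the nominal level α, and it is increasing in D.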

We see from the above that the binning based on the bandwidth value h plays a key role in the derivation of the asymptotic distributions. However, the binning discretizes the null hypothesis and unavoidably leads to some loss of power, as shown in the simulation reported in the next section. From the point of view of retaining power, we would like to have the size of the bins smaller than that prescribed by the smoothing bandwidth, in order to increase the resolution of the discretized null hypothesis relative to the original H_0. However, this will create dependence between the empirical likelihood evaluated at neighbouring bins and make the above asymptotic distributions invalid. One possibility is to evaluate the distribution of ∫_0^1 N_0²(s) ds by using the approach of Wood and Chan (1994), simulating the normal process N_0(s) under H_0. However, this is not our focus here and hence is not considered in this chapter.

12.8 Application
Figure 12.1 shows the daily closing value of the S&P 500 share index from the 31st December 1976 to the 31st December 1997, which covers 5479 trading days. In the upper panel, the index series shows a trend of exponential form, which is estimated using the method given in Härdle, Kleinow, Korostelev, Logeay and Platen (2001). The lower panel is the residual series after removing the exponential trend. In mathematical finance one often assumes a specific dynamic form of this residual series, see Platen (2000). More precisely, Härdle et al. (2001) assume the following model for an index process S(t):

    S(t) = S(0) X(t) exp{∫_0^t η(s) ds}   (12.32)

with a diffusion component X(t) solving the stochastic differential equation

    dX(t) = a{1 − X(t)} dt + σ X^{1/2}(t) dW(t),   (12.33)

where W(t) is a Brownian motion and a and σ are parameters. Discretizing this series with a sampling interval Δ leads to the observations (X_i, Y_i) with Y_i = X_{(i+1)Δ} − X_{iΔ} and X_i = X_{iΔ}, which will be α-mixing and fulfill all the other conditions assumed in Section 12.3.
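A sketch of how such observations can be generated (Python; an Euler approximation of (12.33) with illustrative parameter values, not the estimates used in the application below):

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma, delta, n = 0.01, 0.05, 1.0, 5000   # illustrative values

# Euler scheme for dX = a{1 - X}dt + sigma X^{1/2} dW
X = np.empty(n + 1)
X[0] = 1.0
for i in range(n):
    dW = rng.normal(0.0, np.sqrt(delta))
    X[i + 1] = X[i] + a * (1.0 - X[i]) * delta \
               + sigma * np.sqrt(max(X[i], 0.0)) * dW

Y = X[1:] - X[:-1]      # Y_i = X_{(i+1)Delta} - X_{i Delta}
Xobs = X[:-1]           # X_i = X_{i Delta}
print(Xobs.mean(), Y.mean())
```

The process mean-reverts around 1, so the sample mean of the X_i stays close to 1 for long trajectories.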
We now apply the empirical likelihood test procedure to the S&P 500 data presented in Figure 12.1 to test the parametric mean function m(x) = a(1 − x) given in the Cox-Ingersoll-Ross diffusion model (12.33). The process X is restored from the observed residuals by the approach introduced in Härdle et al. (2001). The parametric estimate for a is a = 0.00968, obtained by methods based on the marginal distribution and the autocorrelation structure of X. For details about the procedure see Härdle et al. (2001).

Figure 12.1. The S&P 500 data. The upper plot shows the S&P 500 together with the exponential trend. The lower plot shows the residual process X.

Cross validation is used to find the bandwidth h. However, the score function is monotonically decreasing for h < 0.15 and then becomes a flat line for h ∈ [0.15, 0.8]. This may be caused by the different intensity levels of the design points. Further investigation shows that an h-value larger (smaller) than 0.06 (0.02) produces an oversmoothed (undersmoothed) curve estimate. Therefore, the test is carried out for a set of h values ranging from 0.02 to 0.06. The P-values of the test as a function of h are plotted in Figure 12.2.


0.03 0.04 0.05 0.06
Bandwidth h

Figure 12.2. The p-values of the S&P 500 Data

The P-values indicate that there is insufficient evidence to reject the diffusion model.

12.9 Simulation Study and Illustration
We investigate our testing procedure in two simulation studies. In our first simulation we consider the time series model

    Y_i = 2Y_{i−1}/(1 + Y_{i−1}²) + c_n sin(Y_{i−1}) + σ(Y_{i−1}) η_i,

where the {η_i} are independent and identically distributed uniform random variables on [−1, 1], η_i is independent of X_i = Y_{i−1} for each i, and σ(x) = exp(−x²/4). Note that the mean and the variance functions are both bounded, which ensures the series is asymptotically stationary. To realize the stationarity, we pre-run the series for 100 steps with an initial value Y_{−100} = 0. The empirical likelihood test statistic is calculated via the elmtest quantlet.
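A Python sketch of this data-generating process (outside XploRe; the sample size and the value of c_n are illustrative, and the bounded mean function 2y/(1 + y²) is assumed):

```python
import numpy as np

def simulate(n, c_n, seed=0, burn_in=100):
    # Pre-run the chain for `burn_in` steps starting from Y_{-burn_in} = 0
    rng = np.random.default_rng(seed)
    y = 0.0
    out = np.empty(n)
    for i in range(-burn_in, n):
        eta = rng.uniform(-1.0, 1.0)        # eta_i ~ U[-1, 1]
        y = 2.0 * y / (1.0 + y**2) + c_n * np.sin(y) \
            + np.exp(-y**2 / 4.0) * eta     # sigma(y) = exp(-y^2/4)
        if i >= 0:
            out[i] = y
    return out

Y = simulate(1000, 0.03)
Xlag, Yresp = Y[:-1], Y[1:]     # regressors X_i = Y_{i-1} and responses Y_i
print(Y[:5])
```

Because the mean function, c_n sin(·) and σ(·)η are all bounded, the simulated path is uniformly bounded as well.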

{el,p,kn,h2} = elmtest(x,y,model{,kernel{,h{,theta}}})
calculates the empirical likelihood test statistic

The first and the second parameter are the vectors of observations of X and Y. The third parameter model is the name of a quantlet that implements the parametric model for the null hypothesis. The optional parameter kernel is the name of the kernel K that is used to calculate the test statistic, and h is the bandwidth used to calculate Ū_1 and Ū_2 in (12.18). theta is directly forwarded to the parametric model.

For the simulation study, the sample sizes considered for each trajectory are n = 500 and 1000, and c_n, the degree of difference between H_0 and H_1, takes the values 0, 0.03 and 0.06. As the simulation shows that the two empirical likelihood tests have very similar power performance, we will report the results for the test based on the χ² distribution only. To gauge the effect of the smoothing bandwidth h on the power, ten levels of h are used for each simulated sample to formulate the test statistic.

Figure 12.3. Power of the empirical likelihood test for n = 500 (left panel) and n = 1000 (right panel), plotted against the bandwidth h for c_n = 0.00, 0.03 and 0.06. The dotted lines indicate the 5% level.

Figure 12.3 presents the power of the empirical likelihood test based on 5000 simulations with a nominal 5% level of significance. We notice that when c_n = 0 the simulated significance level of the test is very close to the nominal level for a large range of h values, which is especially the case for the larger sample size n = 1000. When c_n increases, for each fixed h the power increases as the distance between the null and the alternative hypotheses becomes larger. For each fixed c_n, there is a general trend of decreasing power when h increases. This is due to the discretization of H_0 by binning, as discussed at the end of the previous section. We also notice that the power curves for c_n = 0.06 are a little erratic, although they maintain the same trend as in the case of c_n = 0.03. This may be due to the fact that when the difference between H_0 and H_1 is large, the difference between the nonparametric and the parametric fits becomes larger and the test procedure becomes more sensitive to the bandwidths.
In our second simulation study we consider an Ornstein-Uhlenbeck process Z fluctuating about 0 that satisfies the stochastic differential equation

    dZ(t) = a Z(t) dt + σ dW(t),

where W is a standard Brownian motion. The speed of adjustment parameter a has to be negative to ensure stationarity. To apply the empirical likelihood test we construct the time series X and Y as in Section 12.2, i.e.

    X_i = Z(t_i),                            X = (X_1, ..., X_n),
    ε_i = W(t_{i+1}) − W(t_i),               ε = (ε_1, ..., ε_n),
    Y_i = X_{i+1} − X_i = a X_i Δ + σ ε_i,   Y = (Y_1, ..., Y_n).   (12.34)
It is well known that the transition probability of an Ornstein-Uhlenbeck process is normal with conditional mean

    E[Z_{t+Δ} | Z_t = x] = E[X_{i+1} | X_i = x] = x e^{aΔ}

and conditional variance

    Var(Z_{t+Δ} | Z_t = x) = Var(X_{i+1} | X_i = x) = σ² (e^{2aΔ} − 1)/(2a).
To simulate the process we use the simou quantlet.

x = simou(n,a,s,delta)
simulates a discretely observed path of an Ornstein-Uhlenbeck
process via its transition probability law.

The number of observations is given by n+1, a is the speed of adjustment parameter a, s is the diffusion coefficient σ and delta is the time difference Δ between two observations.
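A simou-style sketch in Python, drawing successively from the Gaussian transition law above (the parameter values are illustrative, and the conditional variance formula is the one as reconstructed here):

```python
import numpy as np

def simou(n, a, s, delta, seed=0):
    # Exact simulation of dZ = a Z dt + s dW (a < 0) on a grid of spacing delta,
    # using the normal transition law with mean x*exp(a*delta) and
    # variance s^2 (exp(2 a delta) - 1) / (2a)
    rng = np.random.default_rng(seed)
    mean_factor = np.exp(a * delta)
    cond_sd = np.sqrt(s**2 * (np.exp(2.0 * a * delta) - 1.0) / (2.0 * a))
    x = np.empty(n + 1)
    x[0] = 0.0
    for i in range(n):
        x[i + 1] = mean_factor * x[i] + cond_sd * rng.normal()
    return x

x = simou(20000, -0.5, 0.2, 0.1)
print(x.mean(), x.var())
```

For these values the stationary variance is σ²/(−2a) = 0.04, which the long sample path reproduces approximately.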
The proposed simulation procedure and the Goodness-of-Fit test are illustrated
in XFGelsim2.xpl.

12.10 Appendix
LEMMA 12.2 Let X, Y be standard normal random variables with covariance Cov(X, Y) = ρ, i.e.

    (X, Y)' ∼ N( (0, 0)', [[1, ρ], [ρ, 1]] ).   (12.37)

Then we have

    Cov(X², Y²) = 2ρ².

PROOF: Define Z ∼ N(0, 1) independent of X and set X̃ = ρX + √(1 − ρ²) Z. Then we have

    (X, X̃)' ∼ N( (0, 0)', [[1, ρ], [ρ, 1]] ),

so that (X, X̃) has the same joint distribution as (X, Y), and

    Cov(X², Y²) = Cov(X², X̃²) = 2ρ².
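The identity in Lemma 12.2 is easy to confirm with a quick Monte Carlo sketch (Python; ρ = 0.6 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(42)
rho, n = 0.6, 1_000_000
X = rng.standard_normal(n)
Z = rng.standard_normal(n)                 # independent of X
Y = rho * X + np.sqrt(1.0 - rho**2) * Z    # Cov(X, Y) = rho by construction
emp_cov = np.cov(X**2, Y**2)[0, 1]
print(emp_cov, 2.0 * rho**2)               # both close to 0.72
```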

Baggerly, K. A. (1998). Empirical likelihood as a goodness-of-fit measure, Biometrika 85: 535–547.

Bibby, B. M. and Sørensen, M. (1995). Martingale estimation functions for discretely observed diffusion processes, Bernoulli 1(1/2): 17–40.

Billingsley, P. (1999). Convergence of Probability Measures, Wiley, New York.
Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes, Vol. 110 of
Lecture Notes in Statistics, Springer-Verlag, Heidelberg.
Chen, S. X., Härdle, W. and Kleinow, T. (2001). An empirical likelihood goodness-of-fit test for time series, Discussion Paper 1, Sonderforschungsbereich 373, Humboldt-Universität zu Berlin.

Genon-Catalot, V., Jeantheau, T. and Larédo, C. (2000). Stochastic volatility models as hidden Markov models and statistical applications, Bernoulli.

Härdle, W. (1990). Applied Nonparametric Regression, number 19 in Econometric Society Monographs, Cambridge University Press.
Härdle, W., Kleinow, T., Korostelev, A., Logeay, C. and Platen, E. (2001). Semiparametric diffusion estimation and application to a stock market index, Discussion Paper 24, Sonderforschungsbereich 373, Humboldt-Universität zu Berlin.

Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits, Ann. Statist. 21: 1926–1947.

Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2000). Non- and semiparametric modelling, XploRe e-book, www.xplore-stat.de.

Hart, J. D. (1997). Nonparametric Smoothing and Lack-of-Fit Tests, Springer, New York.

Kloeden, P. E. and Platen, E. (1999). Numerical Solution of Stochastic Differential Equations, Vol. 23 of Applications of Mathematics, Springer-Verlag, Berlin Heidelberg.

Kreiß, J.-P., Neumann, M. and Yao, Q. (1998). Bootstrap tests for simple structures in nonparametric time series regression, Discussion paper, Sonderforschungsbereich 373.

Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional, Biometrika 75: 237–249.

Owen, A. (1990). Empirical likelihood ratio confidence regions, Ann. Statist. 18: 90–120.

Owen, A. (1991). Empirical likelihood for linear models, Ann. Statist. 19: 1725–

Platen, E. (2000). Risk premia and financial modelling without measure transformation, University of Technology Sydney, School of Finance and Economics and Department of Mathematical Sciences.

Wand, M. and Jones, M. (1995). Kernel Smoothing, number 60 in Monographs in Statistics and Applied Probability, Chapman & Hall.

Wood, A. T. A. and Chan, G. (1994). Simulation of stationary Gaussian processes in [0, 1]^d, J. Comp. Graph. Stat. 3: 409–432.
13 A simple state space model of
house prices
Rainer Schulz and Axel Werwatz

13.1 Introduction
For most people, purchasing a house is a major decision. Once purchased,
the house will by far be the most important asset in the buyer™s portfolio.
The development of its price will have a major impact on the buyer™s wealth
over the life cycle. It will, for instance, a¬ect her ability to obtain credit
from commercial banks and therefore in¬‚uence her consumption and savings
decisions and opportunities. The behavior of house prices is therefore of central
interest for (potential) house buyers, sellers, developers of new houses, banks,
policy makers or, in short, the general public.
An important property of houses is that they are different from each other. Hence, while houses in the same market (i.e., the same city, district or neighborhood) will share some common movements in their price, there will at all times be idiosyncratic differences due to differences in maintenance, design or furnishing. Thus, the average or median price will depend not only on the general tendency of the market, but also on the composition of the sample. To calculate a price index for real estate, one has to control explicitly for idiosyncratic differences. The hedonic approach is a popular method for estimating the impact of the characteristics of heterogeneous goods on their prices.
The statistical model used in this chapter tries to infer the common component in the movement of prices of 1502 single-family homes sold in a district of Berlin, Germany, between January 1980 and December 1999. It combines hedonic regression with Kalman filtering. The Kalman filter is the standard statistical tool for filtering out an unobservable, common component from idiosyncratic, noisy observations. We will interpret the common price component as an index of house prices in the respective district of Berlin. We assume that the index follows an autoregressive process. Given this assumption, the model can be written in state space form.
The remainder of this chapter is organized as follows. In the next section we propose a statistical model of house prices and discuss its interpretation and estimation. Section 13.4 introduces the data, while Section 13.5 describes the quantlets used to estimate the statistical model. In that section we also present the estimation results for our data. The final section gives a summary.

13.2 A Statistical Model of House Prices

13.2.1 The Price Function

The standard approach for constructing a model of the prices of heterogeneous assets is hedonic regression (Bailey, Muth and Nourse, 1963; Hill, Knight and Sirmans, 1997; Shiller, 1993). A hedonic model starts with the assumption that, on average, the observed price is given by some function f(I_t, X_{n,t}, β). Here, I_t is a common price component that "drives" the prices of all houses, the vector X_{n,t} comprises the characteristics of house n, and the vector β contains all coefficients of the functional form.

Most studies assume a log-log functional form and that I_t is just the constant of the regression for every period (Clapp and Giaccotto, 1998; Cho, 1996). In that case

    p_{n,t} = I_t + x'_{n,t} β + ε_{n,t}.   (13.1)

Here, p_{n,t} denotes the log of the transaction price. The vector x_{n,t} contains the transformed characteristics of house n that is sold in period t. The idiosyncratic influences ε_{n,t} are white noise with variance σ²_ε.
Following Schwann (1998), we put some structure on the behavior of the common price component over time by assuming that it follows an autoregressive moving average (ARMA) process. For our data it turns out that the following AR(2) process

    I_t = φ_1 I_{t−1} + φ_2 I_{t−2} + ν_t   (13.2)

with I_0 = 0 suffices. This autoregressive specification reflects that the market for owner-occupied houses reacts sluggishly to changing conditions and that any price index will thus exhibit some autocorrelation. This time-series-based way of modelling the behavior of I_t is more parsimonious than conventional hedonic regressions (which need to include a separate dummy variable for each time period) and makes forecasting straightforward.

13.2.2 State Space Form

We can rewrite our model (13.1) and (13.2) in State Space Form (SSF) (Gourieroux and Monfort, 1997). In general, the SSF is given as:

    α_t = c_t + T_t α_{t−1} + ε^s_t   (13.3a)
    y_t = d_t + Z_t α_t + ε^m_t   (13.3b)
    ε^s_t ∼ (0, R_t),   ε^m_t ∼ (0, H_t).   (13.3c)

The notation partially follows Harvey (1989; 1993). The first equation is the state equation and the second is the measurement equation. The characteristic structure of state space models relates a series of unobserved values α_t to a set of observations y_t. The unobserved values α_t represent the behavior of the system over time (Durbin and Koopman, 2001).

The unobservable state vector α_t has dimension K × 1, T_t is a square matrix of dimension K × K, and the vector of observable variables y_t has dimension N_t × 1. Here, N_t denotes the number of observations y_{t,n} in period t ≤ T. If the number of observations varies through periods, we denote

    N = max_{t=1,...,T} N_t.

The matrix Z_t contains constant parameters and other exogenous observable variables. Finally, the vectors c_t and d_t contain some constants. The system matrices c_t, T_t, R_t, d_t, Z_t, and H_t may contain unknown parameters that have to be estimated from the data.
In our model, that is (13.1) and (13.2), the common price component I_t and the quality coefficients β are unobservable. However, whereas these coefficients are constant through time, the price component evolves according to (13.2). The parameters φ_1, φ_2, and σ²_ν of this process are unknown.

The observed log prices are the entries in y_t of the measurement equation and the characteristics are entries in Z_t. In our data base we observe three characteristics per object. Furthermore, we include the constant β_0. We can put (13.1) and (13.2) into SSF by setting
    α_t = (I_t, φ_2 I_{t−1}, β_0, β_1, β_2, β_3)',   ε^s_t = (ν_t, 0, 0, 0, 0, 0)',

    T_t = | φ_1  1  0  0  0  0 |
          | φ_2  0  0  0  0  0 |
          |  0   0  1  0  0  0 |
          |  0   0  0  1  0  0 |
          |  0   0  0  0  1  0 |
          |  0   0  0  0  0  1 |,   (13.4a)

    y_t = (p_{1,t}, ..., p_{N_t,t})',   ε^m_t = (ε_{1,t}, ..., ε_{N_t,t})',

    Z_t = | 1  0  x'_{1,t}   |
          | :  :     :       |
          | 1  0  x'_{N_t,t} |.   (13.4b)
For our model, both c_t and d_t are zero vectors. The transition matrices T_t are time-invariant. The variance matrices R_t of the state equation are identical for all t and equal to a 6 × 6 matrix whose first element is σ²_ν and whose other elements are all zero. H_t is an N_t × N_t diagonal matrix with σ²_ε on the diagonal. The variance σ²_ε is also an unknown parameter.
The first two elements of the state equation just resemble the process of the common price component given in (13.2). However, we should mention that there are other ways to put an AR(2) process into a SSF (see Harvey, 1993, p. 84). The remaining elements of the state equation are the implicit prices β of the hedonic price equation (13.1). Multiplying the state vector α_t with row n of the matrix Z_t gives I_t + x'_{n,t}β. This is just the functional relation (13.1) for the log price without noise. The noise terms of (13.1) are collected in the SSF in the vector ε^m_t. We assume that ε^m_t and ε^s_t are uncorrelated. This is required for identification (Schwann, 1998, p. 274).
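As a cross-check of this representation, the system matrices of (13.4a,b) can be written down in a few lines of numpy (a sketch outside XploRe; the parameter values, the noise sequence and the characteristics row are all hypothetical), verifying that the state equation reproduces the AR(2) recursion (13.2):

```python
import numpy as np

phi1, phi2 = 0.5, 0.3                # hypothetical AR(2) coefficients
T = np.eye(6)
T[0, 0], T[0, 1] = phi1, 1.0         # I_t = phi1 I_{t-1} + (phi2 I_{t-2}) + nu_t
T[1, 0], T[1, 1] = phi2, 0.0         # second element carries phi2 * I_{t-1}

R = np.zeros((6, 6))
R[0, 0] = 0.01                       # sigma^2_nu: only the index equation has noise

# One measurement row (1, 0, x'_{n,t}) with x = (1, log lot, log floor, age)
x = np.array([1.0, 6.10, 4.77, 53.0])
Z_row = np.concatenate(([1.0, 0.0], x))

# Iterate the state equation and compare with the direct AR(2) recursion
nu = np.array([0.3, -0.1, 0.2, 0.05, -0.4])
alpha = np.zeros(6)
I_state = []
for v in nu:
    alpha = T @ alpha + np.array([v, 0, 0, 0, 0, 0])
    I_state.append(alpha[0])

I_direct, prev1, prev2 = [], 0.0, 0.0
for v in nu:
    cur = phi1 * prev1 + phi2 * prev2 + v
    I_direct.append(cur)
    prev2, prev1 = prev1, cur

print(np.allclose(I_state, I_direct))
```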

13.3 Estimation with Kalman Filter Techniques

13.3.1 Kalman Filtering given all parameters
Given the above SSF and all unknown parameters ψ = (φ_1, φ_2, σ²_ν, σ²_ε), we can use Kalman filter techniques to estimate the unknown coefficients β and the process of I_t. The Kalman filter technique is an algorithm for estimating the unobservable state vectors by calculating their expectation conditional on the information up to s ≤ T. In what follows, we use the following general notation:

    a_{t|s} = E[α_t | F_s]   (13.5a)

denotes the filtered state vector and

    P_{t|s} = E[(α_t − a_{t|s})(α_t − a_{t|s})' | F_s]   (13.5b)

denotes the covariance matrix of the estimation error, where F_s is shorthand for the information available at time s.
Generally, the estimators delivered by Kalman filtering techniques have minimum mean-squared error among all linear estimators (Shumway and Stoffer, 2000, Chapter 4.2). If the initial state vector and the noise terms ε^m and ε^s are multivariate Gaussian, then the Kalman filter delivers the optimal estimator among all estimators, linear and nonlinear (Hamilton, 1994, Chapter 13).

The Kalman filter techniques can handle missing observations in the measurement equation (13.3b). For periods with fewer than N observations, one has to adjust the measurement equations. One can do this by just deleting all elements of the measurement matrices d_t, Z_t, H_t for which the corresponding entry in y_t is a missing value. The quantlets in XploRe use this procedure. Another way to take missing values into account is proposed by Shumway and Stoffer (1982; 2000): replace all missing values with zeros and adjust the other measurement matrices accordingly. We show in Appendix 13.6.1 that both methods deliver the same results. For periods with no observations the Kalman filter techniques recursively calculate an estimate given the most recent information (Durbin and Koopman, 2001).

13.3.2 Filtering and state smoothing

The Kalman filter is an algorithm for sequentially updating our knowledge of the system given a new observation y_t. It calculates one-step predictions conditional on s = t. Using our general expressions, we have

    a_t = E[α_t | F_t]
    P_t = E[(α_t − a_t)(α_t − a_t)' | F_t].

Here we use the standard simplified notation a_t and P_t for a_{t|t} and P_{t|t}. As a by-product of the filter, the recursions also calculate

    a_{t|t−1} = E[α_t | F_{t−1}]
    P_{t|t−1} = E[(α_t − a_{t|t−1})(α_t − a_{t|t−1})' | F_{t−1}].

We give the filter recursions in detail in Subsection 13.5.3.

The Kalman smoother is an algorithm to predict the state vector α_t given the whole information up to T. Thus we have, with our general notation, s = T and

    a_{t|T} = E[α_t | F_T]

with the corresponding covariance matrix

    P_{t|T} = E[(α_t − a_{t|T})(α_t − a_{t|T})' | F_T].

We see that the filter makes one-step predictions given the information up to t ∈ {1, ..., T}, whereas the smoother is backward looking. We give the smoother recursions in detail in Subsection 13.5.5.
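The filter recursions take a simple generic form for the SSF (13.3a-c) with c_t = d_t = 0; the following Python function is a minimal sketch, not the gkalfilter implementation:

```python
import numpy as np

def kalman_step(a_prev, P_prev, y, T, R, Z, H):
    # Prediction: a_{t|t-1} and P_{t|t-1}
    a_pred = T @ a_prev
    P_pred = T @ P_prev @ T.T + R
    # Innovation v_t and its covariance F_t
    v = y - Z @ a_pred
    F = Z @ P_pred @ Z.T + H
    # Update: a_t and P_t via the Kalman gain
    K = P_pred @ Z.T @ np.linalg.inv(F)
    a_filt = a_pred + K @ v
    P_filt = P_pred - K @ Z @ P_pred
    return a_filt, P_filt, v, F
```

For a scalar example with T = R = Z = H = 1, one step on y = 3 starting from a = 0 and P = 1 gives a_t = 2 and P_t = 2/3.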

13.3.3 Maximum likelihood estimation of the parameters

Given the system matrices c_t, T_t, R_t, d_t, Z_t, and H_t, Kalman filtering techniques are the right tool to estimate the elements of the state vector. However, in our model some of these system matrices contain unknown parameters ψ. These parameters have to be estimated by maximum likelihood.

Given a multivariate Gaussian error distribution, the value of the log likelihood function l(ψ) for a general SSF is, up to an additive constant, equal to:

    l(ψ) = −(1/2) Σ_{t=1}^T ln |F_t| − (1/2) Σ_{t=1}^T v'_t F_t^{−1} v_t.   (13.9)

Here,

    v_t = y_t − d_t − Z_t a_{t|t−1}   (13.10)

are the innovations of the filtering procedure and a_{t|t−1} is the conditional expectation of α_t given information up to t − 1. As we have already mentioned, these expressions are a by-product of the filter recursions. The matrix F_t is the covariance matrix of the innovations at time t and also a by-product of the Kalman filter. The above log likelihood is known as the prediction error decomposition form (Harvey, 1989). Periods with no observations do not contribute to the log likelihood function.
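Once the filter has produced the innovations v_t and their covariances F_t, the sum (13.9) is straightforward to accumulate; a Python sketch:

```python
import numpy as np

def loglik(innovations, F_list):
    # Prediction error decomposition (13.9), up to an additive constant
    l = 0.0
    for v, F in zip(innovations, F_list):
        sign, logdet = np.linalg.slogdet(F)      # numerically stable ln|F_t|
        l -= 0.5 * logdet + 0.5 * float(v @ np.linalg.solve(F, v))
    return l
```

With a single scalar innovation v_1 = 1 and F_1 = 1 this gives −1/2; a zero innovation contributes nothing when F_t = I.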

Starting with some initial value, one can use numerical maximization methods to obtain an estimate of the parameter vector ψ. Under certain regularity conditions, the maximum likelihood estimator ψ̂ is consistent and asymptotically normal. One can use the information matrix to calculate standard errors of ψ̂ (Hamilton, 1994).

13.3.4 Diagnostic checking

After fitting a SSF, one should check the appropriateness of the results by looking at the standardized residuals

    v*_t = F_t^{−1/2} v_t.   (13.11)

If all parameters of the SSF were known, v*_t would follow a multivariate standard normal distribution (Harvey, 1989, see also (13.9)). We know that F_t is a symmetric matrix and that it should be positive definite (recall that it is just the covariance matrix of the innovations v_t). So

    F_t^{−1/2} = C_t Λ_t^{−1/2} C'_t,   (13.12)

where the diagonal matrix Λ_t contains all eigenvalues of F_t and C_t is the matrix of corresponding normalized eigenvectors (Greene, 2000, p. 43). The standardized residuals should be normally distributed with constant variance and should show no serial correlation. It is a signal for a misspecified model when the residuals do not possess these properties. To check the properties, one can use standard test procedures. For example, a Q-Q plot indicates whether the quantiles of the residuals deviate from the corresponding theoretical quantiles of a normal distribution. This plot can be used to detect non-normality. The Jarque-Bera test for normality can also be used for testing non-normality of the residuals (Bera and Jarque, 1982). This test is implemented in XploRe.
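The standardization (13.11) with the square root (13.12) can be sketched as follows (Python; np.linalg.eigh returns the eigenvalues Λ_t and orthonormal eigenvectors C_t of a symmetric matrix):

```python
import numpy as np

def standardize(v, F):
    # F^{-1/2} = C Lambda^{-1/2} C' for symmetric positive definite F
    lam, C = np.linalg.eigh(F)
    F_inv_sqrt = C @ np.diag(lam**-0.5) @ C.T
    return F_inv_sqrt @ v
```

For F = diag(4, 9) and v = (2, 3)' this returns (1, 1)'.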
In the empirical part, we combine Kalman filter techniques and maximum likelihood to estimate the unknown parameters and coefficients of the SSF for the house prices in a district of Berlin.

13.4 The Data
The data set is provided by the Gutachterausschuß für Grundstückswerte in Berlin, an expert commission for Berlin's real estate market. The commission collects information on all real estate transactions in Berlin in a data base called Automatisierte Kaufpreissammlung.

Here, we use data for 1502 sales of detached single-family houses in a district of Berlin for the years 1980 to 1999, stored in MD*BASE. Besides the price, we observe the size of the lot, the floor space, and the age of the house. The data set XFGhouseprice contains the log price observations for all 80 quarters. There are at most N = 43 observations in any quarter. The following lines of XploRe code

Y = read("XFGhouseprice.dat")

can be used to take a look at the entries of XFGhouseprice. Every column gives the observations for one quarter. Thus, in columns 41 to 44 we find the observations for all quarters of 1990. If fewer than 43 transactions are observed in a quarter, the remaining entries are filled with the missing value code NaN. Only in the first quarter of the year 1983 do we observe 43 transactions.

The corresponding data set XFGhousequality contains the observed characteristics of all houses sold. They are ordered in the following way: each column contains all observations for a given quarter. Remember that for every house we observe the log size of the lot, the log size of the floor space and the age. The first three rows of a column refer to the first house in t, the next three to the second house, and so on.
Let us look at the characteristics of the first two observations in 1990:1. Just type the following lines in the XploRe input window

X = read("XFGhousequality.dat")

After compiling, you get the output

[1,] 6.1048 4.7707 53 6.5596 5.1475 13

The size of the lot for the second house is about 706 square meters (just take the antilog). The size of the floor space is 172 square meters and the age is 13 years.

The following table shows summary statistics of our Berlin house price data.

" Summary statistics for the Berlin house price data "
" Sample for 80 quarters with 1502 observations "
" "
" Observations per period "
" ----------------------------------------------------"
" Minimum = 4 Average = 18.77 Maximum = 43 "
" "
" Transaction prices (in thousand DM) "
" ----------------------------------------------------"
" Minimum = 100.00 Average = 508.46 "
" Maximum = 1750.01 Std. Dev. = 197.92 "
" "
" Size of the lot (in square meters) "
" ----------------------------------------------------"
" Minimum = 168.00 Average = 626.18 "
" Maximum = 2940.00 Std. Dev. = 241.64 "
" "
" Size of the floor space (in square meters) "
" ----------------------------------------------------"
" Minimum = 46.00 Average = 144.76 "
" Maximum = 635.00 Std. Dev. = 48.72 "
" "
" Age of the building (in years) "
" ----------------------------------------------------"
" Minimum = 0 Average = 28.59 "
" Maximum = 193 Std. Dev. = 21.58 "

Not surprisingly for detached houses, there are large differences in the size of
the lot. Some houses were new in the period of the sale while one was 193
years old. This is a good example of the potential bias of the average price
per quarter as a price index. If we do not control explicitly for depreciation, we
might obtain a low price level simply because the houses sold in a quarter were
relatively old.

Nevertheless, the average price per quarter can give an indication of the price
level. Figure 13.1 shows the average price per quarter along with confidence
intervals at the 90% level. Instead of the average price, we could also calculate
an average adjusted price, where the most important characteristic is used for
the adjustment. Such an adjustment is attained by dividing the price of every
house by, for example, the respective size of the lot. However, even in that
case we would control only for one of the observed characteristics. In our model
we will control for all of the observed characteristics.

Figure 13.1. Average price per quarter; units are Deutsche Mark (1
DM ≈ 0.511 EURO). Confidence intervals are calculated at the 90% level.

13.5 Estimating and filtering in XploRe

13.5.1 Overview

The procedure for Kalman filtering in XploRe is as follows: first, one has
to set up the system matrices using gkalarray. The quantlet adjusts the
measurement matrices for missing observations.
After the set up of the system matrices, we calculate the Kalman filter with
gkalfilter. This quantlet also calculates the value of the log likelihood
function given in equation (13.9). That value will be used to estimate the
unknown parameters of the system matrices with numerical maximization
(Hamilton, 1994, Chapter 5). The first and second derivatives of the log
likelihood function will also be calculated numerically. To estimate the unknown
state vectors, given the estimated parameters, we use the Kalman smoother
gkalsmoother. For diagnostic checking, we use the standardized residuals
(13.11). The quantlet gkalresiduals calculates these residuals.

13.5.2 Setting the system matrices

gkalarrayOut = gkalarray(Y,M,IM,XM)
sets the system matrices for a time varying SSF

The Kalman filter quantlets need as arguments arrays consisting of the system
matrices. The quantlet gkalarray sets these arrays in a user-friendly way. The
routine is especially convenient if one works with time varying system matrices.
In our SSF (13.4), only the system matrix Zt is time varying. As one can see
immediately from the general SSF (13.3), possibly every system matrix can be
time varying.
The quantlet uses a three step procedure to set up the system matrices.

1. To define a system matrix, all constant entries must be set to their
respective values and all time varying entries must be set to an arbitrary
number (for example to 0).
2. One must define an index matrix for every system matrix. An entry is
set to 0 when its corresponding element in the system matrix is constant
and to some positive integer when it is not constant.

3. In addition, for every time varying system matrix, one also has to specify
a data matrix that contains the time varying entries.

gkalarray uses the following notation: Y denotes the matrix of all observations
[y1 , . . . , yT ], M denotes the system matrix, IM denotes the corresponding index
matrix and XM the data matrix.
If all entries of a system matrix are constant over time, then the parameters
have already been put directly into the system matrix. In this case, one should
set the index and the data matrix to 0.
For every time varying system matrix, only the constant parameters, if there
are any, have already been specified with the system matrix. The time-varying
coefficients have to be specified in the index and the data matrix.
In our example, only the matrices Zt are time varying. The constant template is

Z = [ 1 0 1 0 0 0
      . . . . . .
      1 0 1 0 0 0 ]

the index matrix is

IZ = [ 0 0 0    1      2      3
       0 0 0    4      5      6
       . . .    .      .      .
       0 0 0 (3N-2) (3N-1)   3N ]

and the data matrix XZ is given by XFGhousequality.

The system matrix Zt has the dimension (N × 6). The non-zero entries in the
index matrix IZ indicate the rows of XFGhousequality which contain the
time varying elements.
The output of the quantlet is an array that stacks the system matrices one
after the other. For example, the first two rows of the system matrix Z41 are

[1,] 1 0 1 6.1048 4.7707 53
[2,] 1 0 1 6.5596 5.1475 13

It is easy to check that the entries in the last three columns are just the
characteristics of the first two houses that were sold in 1990:1 (see p. 290).
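The three step scheme can be sketched with numpy; the arrays below are hypothetical illustrations of the indexing logic, not the gkalarray quantlet itself:

```python
import numpy as np

# Combine the constant template Z, the index matrix IZ and the data entries
# for one quarter into the system matrix Z_t (invented two-house quarter).
N = 2  # two houses in this illustrative quarter

Z_template = np.tile([1.0, 0.0, 1.0, 0.0, 0.0, 0.0], (N, 1))   # step 1
IZ = np.zeros((N, 6), dtype=int)                               # step 2
IZ[:, 3:] = np.arange(1, 3 * N + 1).reshape(N, 3)              # rows 1..3N

# Step 3: characteristics (log lot size, log floor space, age), as in 1990:1
XZ = np.array([6.1048, 4.7707, 53.0, 6.5596, 5.1475, 13.0])

Zt = Z_template.copy()
mask = IZ > 0
Zt[mask] = XZ[IZ[mask] - 1]   # positive indices pick rows of the data matrix
print(Zt)
```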

13.5.3 Kalman filter and maximized log likelihood

{gkalfilOut,loglike} = gkalfilter(Y,mu,Sig,ca,Ta,Ra,da,Za,Ha,l)
Kalman filters a time-varying SSF

We assume that the initial state vector at t = 0 has mean µ and covariance
matrix Σ. Recall that Rt and Ht denote the covariance matrices of the state
noise and of the measurement noise, respectively. The general filter recursions
are as follows:
Start at t = 1: use the initial guess for µ and Σ to calculate

a1|0 = c1 + T1 µ
P1|0 = T1 Σ T1' + R1
F1 = Z1 P1|0 Z1' + H1
a1 = a1|0 + P1|0 Z1' F1^{-1} (y1 − Z1 a1|0 − d1)
P1 = P1|0 − P1|0 Z1' F1^{-1} Z1 P1|0

Step at t ≤ T: using at−1 and Pt−1 from the previous step, calculate

at|t−1 = ct + Tt at−1
Pt|t−1 = Tt Pt−1 Tt' + Rt
Ft = Zt Pt|t−1 Zt' + Ht
at = at|t−1 + Pt|t−1 Zt' Ft^{-1} (yt − Zt at|t−1 − dt)
Pt = Pt|t−1 − Pt|t−1 Zt' Ft^{-1} Zt Pt|t−1

The implementation for our model is as follows: the arguments of gkalfilter
are the data matrix Y, the starting values mu (µ), Sig (Σ) and the array for
every system matrix (see section 13.5.2). The output is a T + 1 dimensional
array of [at Pt] matrices. If one chooses l = 1, the value of the log likelihood
function (13.9) is calculated.
Once again, the T + 1 matrices are stacked "behind each other", with the t = 0
matrix at the front and the t = T matrix at the end of the array. The first
entry is [µ Σ].
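The recursion above translates almost line by line into code. A minimal self-contained sketch (hypothetical function and toy model, not the gkalfilter quantlet; the likelihood term assumes the usual Gaussian prediction-error decomposition behind (13.9)):

```python
import numpy as np

def kalman_filter(y, mu, Sig, c, T, R, d, Z, H):
    """Return the filtered [a_t, P_t] pairs and the log likelihood."""
    a, P = mu, Sig
    loglik, out = 0.0, []
    for yt in y:
        a_pred = c + T @ a                      # a_{t|t-1}
        P_pred = T @ P @ T.T + R                # P_{t|t-1}
        v = yt - Z @ a_pred - d                 # innovation
        F = Z @ P_pred @ Z.T + H                # innovation covariance
        Finv = np.linalg.inv(F)
        a = a_pred + P_pred @ Z.T @ Finv @ v    # update step
        P = P_pred - P_pred @ Z.T @ Finv @ Z @ P_pred
        # Gaussian prediction-error decomposition of the log likelihood
        loglik -= 0.5 * (len(yt) * np.log(2 * np.pi)
                         + np.log(np.linalg.det(F)) + v @ Finv @ v)
        out.append((a, P))
    return out, loglik

# One-state toy model: y_t = alpha_t + noise, alpha_t = 0.9 alpha_{t-1} + noise
y = [np.array([1.0]), np.array([0.8]), np.array([1.1])]
out, ll = kalman_filter(y, mu=np.zeros(1), Sig=np.eye(1), c=np.zeros(1),
                        T=0.9 * np.eye(1), R=0.1 * np.eye(1),
                        d=np.zeros(1), Z=np.eye(1), H=0.2 * np.eye(1))
```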
How can we provide initial values for the filtering procedure? If the state
matrices are non time-varying and the transition matrix T satisfies some
stability condition, we should set the initial values to the unconditional mean and
variance of the state vector. Σ is given implicitly by

vec(Σ) = (I − T ⊗ T)^{-1} vec(R) .

Here, vec denotes the vec-operator that places the columns of a matrix below
each other and ⊗ denotes the Kronecker product. Our model is time-invariant.
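Under the stated stability condition, this formula can be evaluated directly; a sketch with numpy (the transition and noise matrices below are invented, stable examples, not our model):

```python
import numpy as np

# Unconditional state covariance from vec(Sigma) = (I - T kron T)^{-1} vec(R).
T = np.array([[0.5, 0.2],
              [1.0, 0.0]])     # companion form of a stable AR(2)
R = np.diag([0.1, 0.0])        # state noise covariance

n = T.shape[0]
vecR = R.flatten(order="F")    # stack the columns below each other
vecSig = np.linalg.solve(np.eye(n * n) - np.kron(T, T), vecR)
Sig = vecSig.reshape((n, n), order="F")

# Sanity check: Sigma solves the Lyapunov equation Sigma = T Sigma T' + R
print(np.allclose(Sig, T @ Sig @ T.T + R))  # -> True
```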
But does our transition matrix fulfill the stability condition? The necessary and
sufficient condition for stability is that the characteristic roots of the transition
matrix T have modulus less than one (Harvey, 1989, p. 114). It is easy
to check that the characteristic roots λj of our transition matrix (13.4a) are
given as

λ1,2 = (φ1 ± √(φ1² + 4φ2)) / 2 .

For example, if φ1 and φ2 are both positive, then φ1 + φ2 < 1 guarantees real
characteristic roots that are smaller than one (Baumol, 1959, p. 221). However,
when the AR(2) process of the common price component It has a unit root,
the stability conditions are not fulfilled. If we inspect Figure 13.1, a unit root
seems quite plausible. Thus we cannot use this method to derive the initial
values.
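This root formula is easy to evaluate numerically; a sketch (the φ values are the preliminary AR(2) estimates reported in Table 13.2 below):

```python
import numpy as np

def ar2_roots(phi1, phi2):
    """Roots of lambda^2 - phi1*lambda - phi2, i.e. (phi1 +- sqrt(phi1^2 + 4 phi2))/2."""
    return np.roots([1.0, -phi1, -phi2])

# With the preliminary estimates the dominant root is close to one,
# so the stability condition (all |lambda| < 1) almost fails.
lams = ar2_roots(0.4643, 0.4823)
print(np.max(np.abs(lams)))  # about 0.964
```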
If we have some preliminary estimates of µ, along with preliminary measures of
uncertainty, that is, an estimate of Σ, we can use these preliminary estimates
as initial values. A standard way to derive such preliminary estimates is to
use OLS. If we have no information at all, we must use diffuse priors for
the initial conditions. A method adopted by Koopman, Shephard and Doornik
(1999) is setting µ = 0 and Σ = κI where κ is a large number. The large
variances on the diagonal of Σ reflect our uncertainty about the true µ.
We will use the second approach for providing some preliminary estimates as
initial values. Given the hedonic equation (13.1), we use OLS to estimate It,
β, and σµ by regressing log prices on lot size, floor space, age and quarterly
time dummies. The estimated coefficients of lot size, floor space and age are

                  coefficient   t-statistic   p-value
log lot size           0.2675         15.10    0.0000
log floor space        0.4671         23.94    0.0000
age                   -0.0061        -20.84    0.0000

Regression diagnostics
R2            0.9997      Number of observations      1502
R2 (adj.)     0.9997      F-statistic             64021.67
σµ            0.4688      Prob(F-statistic)         0.0000

Table 13.1. Results for hedonic regression

reported in Table 13.1. They are highly significant and reasonable in sign and
magnitude. Whereas lot size and floor space increase the price on average, age
has the opposite effect. According to (13.1), the common price component It
is a time-varying constant term and is therefore estimated by the coefficients
of the quarterly time dummies, denoted by {Ît}, t = 1, . . . , 80. As suggested
by (13.2), these estimates are regressed on their lagged values to obtain estimates
of the unknown parameters φ1, φ2, and σν. Table 13.2 presents the results for an
AR(2) for the Ît series. The residuals of this regression behave like white noise.

           coefficient   t-statistic   p-value
constant       0.5056        1.3350    0.1859
It−1           0.4643        4.4548    0.0000
It−2           0.4823        4.6813    0.0000

Regression diagnostics
R2            0.8780      Number of observations      78
R2 (adj.)     0.8747      F-statistic             269.81
σν            0.0063      Prob(F-statistic)       0.0000

Table 13.2. Time series regression for the quarterly dummies
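The two-step procedure just described (hedonic regression with time dummies, then an AR(2) on the estimated dummy series) can be sketched on simulated data; all names and numbers below are invented and this is not the Berlin sample:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 40, 12                                    # quarters, houses per quarter
beta = np.array([0.27, 0.47, -0.006])            # "true" hedonic coefficients
price_lvl = 5.0 + np.cumsum(rng.normal(0.0, 0.05, T))  # slowly moving level

chars = rng.normal([6.3, 4.9, 30.0], [0.4, 0.3, 15.0], size=(T * n, 3))
quarter = np.repeat(np.arange(T), n)
y = price_lvl[quarter] + chars @ beta + rng.normal(0.0, 0.1, T * n)

# Step 1: log price on characteristics plus a full set of quarterly dummies
X = np.hstack([chars, np.eye(T)[quarter]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hat, I_hat = coef[:3], coef[3:]

# Step 2: regress the estimated dummy series on its first two lags
Z = np.column_stack([np.ones(T - 2), I_hat[1:-1], I_hat[:-2]])
phi, *_ = np.linalg.lstsq(Z, I_hat[2:], rcond=None)
print(np.round(beta_hat, 3))   # recovers the hedonic coefficients closely
```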

We should remark that

φ̂1 + φ̂2 ≈ 1

and thus the process of the common price component seems to have a unit root.
Given our initial values, we maximize the log likelihood (13.9) numerically with
respect to the elements of ψ* = (φ1, φ2, log(σν²), log(σµ²)). Note that ψ* differs
from ψ by using the logarithms of the variances σν² and σµ². This transformation is
known to improve the numerical stability of the maximization algorithm, which
employs nmBFGS of XploRe's nummath library. Standard errors are computed
from inverting the Hessian matrix provided by nmhessian. The output of the
maximum likelihood estimation procedure is summarized in Table 13.3, where
we report the estimates of σν² and σµ² obtained by retransforming the estimates
of log(σν²) and log(σµ²).
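The benefit of the log-variance reparameterization can be illustrated on a toy problem. In the sketch below, plain gradient steps stand in for nmBFGS and the data are simulated:

```python
import numpy as np

# ML estimation of a Gaussian variance through log(sigma^2): any real value
# of the working parameter retransforms to a positive variance, so the
# optimizer can never propose an inadmissible point.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=5000)   # true sigma^2 = 4

m2 = np.mean(x**2)                    # sufficient statistic
log_var = 0.0                         # start at sigma^2 = 1
for _ in range(200):
    # derivative of the average log likelihood w.r.t. log(sigma^2)
    grad = 0.5 * (m2 / np.exp(log_var) - 1.0)
    log_var += grad                   # unit-step gradient ascent

print(np.exp(log_var))                # converges to m2, close to 4
```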

              estimate   std error   t-value   p-value
ψ̂1 = φ̂1        0.783       0.501      1.56      0.12
ψ̂2 = φ̂2        0.223       0.504      0.44      0.66
ψ̂3 = σ̂ν²       0.0016      0.012      1.36      0.17
ψ̂4 = σ̂µ²       0.048       0.002      26.7      0.00

average log likelihood   0.9965

Table 13.3. Maximum likelihood estimates of the elements of ψ

Note that the maximum likelihood estimates of the AR coefficients φ1 and φ2
approximately sum to 1, again pointing towards a unit root process for the
common price component.

13.5.4 Diagnostic checking with standardized residuals

{V,Vs} = gkalresiduals(Y,Ta,Ra,da,Za,Ha,gkalfilOut)
calculates innovations and standardized residuals

The quantlet gkalresiduals checks internally for the positive definiteness of
Ft. An error message will be displayed when Ft is not positive definite. In such
a case, the standardized residuals are not calculated.

The output of the quantlet consists of two N × T matrices, V and Vs: V contains
the innovations (13.10) and Vs the standardized residuals (13.11).
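Standardizing an innovation amounts to scaling it by a square root of its covariance Ft, which is also why Ft must be positive definite; a sketch with invented values (not quantlet output):

```python
import numpy as np

# v and F below are invented illustrative values for one period t.
v = np.array([0.6, -0.2])
F = np.array([[0.9, 0.3],
              [0.3, 0.5]])

L = np.linalg.cholesky(F)       # F = L L'; fails if F is not positive definite
v_std = np.linalg.solve(L, v)   # standardized residual L^{-1} v_t

# Under the model, v_std has identity covariance; its squared norm equals
# the quadratic form v' F^{-1} v:
print(np.allclose(v_std @ v_std, v @ np.linalg.inv(F) @ v))  # -> True
```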
The Q-Q plot of the standardized residuals in Figure 13.2 shows deviations
from normality at both tails of the distribution.

Figure 13.2. Q-Q plot of the standardized residuals; deviations of the
dotted line from the straight line are evidence for a nonnormal error
distribution.

This is evidence that the true error distribution might be a unimodal
distribution with heavier tails than the normal, such as the t-distribution. In
this case, the projections calculated by the Kalman filter no longer provide the
conditional expectations of the state vector but rather its best linear prediction.
Moreover, the estimates of ψ calculated from the likelihood (13.9) can be
interpreted as pseudo-likelihood estimates.

13.5.5 Calculating the Kalman smoother

gkalsmoothOut = gkalsmoother(Y,Ta,Ra,gkalfilOut)
provides Kalman smoothing of a time-varying SSF

The Kalman filter is a convenient tool for calculating the conditional expectations
and covariances of our SSF (13.4). We have used the innovations of this
filtering technique and their covariance matrices for calculating the log likelihood.
However, for estimating the unknown state vectors, we should use in every step
the whole sample information up to period T. For this task, we use the Kalman
smoother.
The quantlet gkalsmoother needs as argument the output of gkalfilter. The
output of the smoother is an array with [at|T Pt|T] matrices. This array of
dimension T + 1 starts with the t = 0 matrix and ends with the matrix for
t = T. For the smoother recursions, one needs at, Pt and Pt|t−1 for t = 1, . . . , T.
Then the calculation procedure is as follows:
Start at t = T:

aT|T = aT
PT|T = PT

Step at t < T:

Pt* = Pt Tt+1' Pt+1|t^{-1}
at|T = at + Pt* (at+1|T − Tt+1 at)
Pt|T = Pt + Pt* (Pt+1|T − Pt+1|t) Pt*'
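A single backward step of this recursion can be verified with scalar numbers; all values below are invented for illustration, not gkalsmoother output:

```python
# Scalar sketch of one backward smoothing step from t+1 (= T) to t.
T_mat = 0.9                  # transition coefficient
a_t, P_t = 1.00, 0.16        # filtered mean and variance at t
a_pred, P_pred = 0.90, 0.23  # a_{t+1|t} = T a_t and P_{t+1|t}
a_T, P_T = 1.05, 0.11        # smoothed quantities at t+1

P_star = P_t * T_mat / P_pred
a_smooth = a_t + P_star * (a_T - a_pred)          # a_{t|T}
P_smooth = P_t + P_star * (P_T - P_pred) * P_star # P_{t|T}
print(round(a_smooth, 4), round(P_smooth, 4))
```

Note that the smoothed variance is smaller than the filtered one: conditioning on the whole sample reduces the uncertainty about the state.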

The next program calculates the smoothed state vectors for our SSF, given
the estimated parameters ψ̂. The smoothed series of the common price
component is given in Figure 13.3. The confidence intervals are calculated
using the variance of the first element of the state vector.
Comparison with the average prices given in Figure 13.1 reveals that the
common price component is less volatile than the simple average. Furthermore,
a table for the estimated hedonic coefficients, that is β̂, is generated (Table 13.4).
Recall that these coefficients are just the last three entries in the state vector αt.
According to our state space model, the variances for these state variables are