π(x) = V (x). The differences between the two test statistics are: (a) the empirical likelihood test statistic automatically studentizes via its internal algorithm conducted in the background, so that there is no need to estimate V (x) explicitly; (b) the empirical likelihood statistic is able to capture other features, such as skewness and kurtosis, exhibited in the data without using bootstrap resampling, which involves more technical details when the data are dependent. If we choose $k_n = [1/(2h)]$ as prescribed, then the remainder term in (12.21) becomes $O_p\{h \log^2(n)\}$.

12 An Empirical Likelihood Goodness-of-Fit Test for Diffusions

We will now discuss the asymptotic distribution of the test statistic $\tilde\ell_n(\hat m_\theta)$. Theorem 12.3 was proven by Chen et al. (2001).

THEOREM 12.3 Suppose assumptions (i)–(vi) hold. Then
\[ k_n^{-1}\,\tilde\ell_n(\hat m_\theta) \;\xrightarrow{\;\mathcal{L}\;}\; \int_0^1 N^2(s)\,ds \]

where $N$ is a Gaussian process on $[0,1]$ with mean
\[ \mathrm{E}\{N(s)\} = h^{1/4}\,\Delta_n(s)/\sqrt{V(s)} \]
and covariance
\[ \Omega(s,t) = \mathrm{Cov}\{N(s),N(t)\} = \sqrt{\frac{f(s)\sigma^2(s)}{f(t)\sigma^2(t)}}\; \frac{W_0^{(2)}(s,t)}{\sqrt{W_0^{(2)}(s,s)\,W_0^{(2)}(t,t)}} \]
where
\[ W_0^{(2)}(s,t) = \int_0^1 h^{-1} K\{(s-y)/h\}\,K\{(t-y)/h\}\,dy. \tag{12.22} \]

As $K$ is a compact kernel on $[-1,1]$, when both $s$ and $t$ are in $S_I$ (the interior part of $[0,1]$), we get from (12.22) with the substitution $u = (s-y)/h$
\[ W_0^{(2)}(s,t) = \int_{(s-1)/h}^{s/h} K(u)\,K\{u-(s-t)/h\}\,du = \int_{-\infty}^{\infty} K(u)\,K\{u-(s-t)/h\}\,du = K^{(2)}\Big(\frac{s-t}{h}\Big) \tag{12.23} \]
where $K^{(2)}$ is the convolution of $K$ with itself. The compactness of $K$ also means that $W_0^{(2)}(s,t) = 0$ if $|s-t| > 2h$, which implies $\Omega(s,t) = 0$ if $|s-t| > 2h$. Hence $N(s)$ and $N(t)$ are independent if $|s-t| > 2h$. As
\[ f(s)\sigma^2(s) = f(t)\sigma^2(t) + O(h) \]
when $|s-t| \le 2h$, we get
\[ \Omega(s,t) = \frac{W_0^{(2)}(s,t)}{\sqrt{W_0^{(2)}(s,s)\,W_0^{(2)}(t,t)}} + O(h). \tag{12.24} \]
So the leading order of the covariance function is free of $\sigma^2$ and $f$, i.e. $\Omega(s,t)$ is completely known.
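The convolution identity (12.23) and the vanishing of $W_0^{(2)}(s,t)$ for $|s-t| > 2h$ are easy to check numerically. The sketch below does so by simple midpoint-rule quadrature; the Epanechnikov kernel is an illustrative choice of compact kernel on $[-1,1]$, not one fixed by the chapter.

```python
import numpy as np

def K(u):
    """Epanechnikov kernel on [-1, 1] (an illustrative choice of compact kernel)."""
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

def W0(s, t, h, m=20000):
    """W_0^(2)(s,t) = int_0^1 h^{-1} K{(s-y)/h} K{(t-y)/h} dy, midpoint rule."""
    y = (np.arange(m) + 0.5) / m
    return float(np.mean(K((s - y) / h) * K((t - y) / h) / h))

def K2(v, m=20000):
    """Convolution K^(2)(v) = int K(u) K(u - v) du, midpoint rule on [-2, 2]."""
    u = -2.0 + 4.0 * (np.arange(m) + 0.5) / m
    return float(np.mean(K(u) * K(u - v)) * 4.0)

h = 0.05
print(W0(0.40, 0.43, h), K2((0.40 - 0.43) / h))  # agree for interior s, t
print(W0(0.40, 0.60, h))                         # 0, since |s - t| > 2h
```

The two printed values in the first line agree up to quadrature error, illustrating (12.23) for interior points; the second line is exactly zero because the kernel supports do not overlap.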

Let
\[ N_0(s) = N(s) - \frac{h^{1/4}\Delta_n(s)}{\sqrt{V(s)}}. \tag{12.25} \]
Then $N_0(s)$ is a normal process with zero mean and covariance $\Omega$. The boundedness of $K$ implies that $W_0^{(2)}$ is bounded, and hence $\int_0^1 \Omega(t,t)\,dt < \infty$. We will now study the expectation and variance of $\int_0^1 N^2(s)\,ds$. Let $T \stackrel{\mathrm{def}}{=} T_1 + T_2 + T_3 = \int_0^1 N^2(s)\,ds$ where
\[ T_1 = \int_0^1 N_0^2(s)\,ds, \qquad T_2 = 2h^{1/4}\int_0^1 V^{-1/2}(s)\Delta_n(s)N_0(s)\,ds \quad\text{and}\quad T_3 = h^{1/2}\int_0^1 V^{-1}(s)\Delta_n^2(s)\,ds. \]

From some basic results on stochastic integrals, Lemma 12.2 and (12.24), it follows that
\[ \mathrm{E}(T_1) = \int_0^1 \Omega(s,s)\,ds = 1 \]
and
\begin{align}
\mathrm{Var}(T_1) &= \mathrm{E}[T_1^2] - 1 \tag{12.26}\\
&= \int_0^1\!\!\int_0^1 \mathrm{E}\{N_0^2(s)N_0^2(t)\}\,ds\,dt - 1 \tag{12.27}\\
&= 2\int_0^1\!\!\int_0^1 \Omega^2(s,t)\,ds\,dt\\
&= 2\int_0^1\!\!\int_0^1 \{W_0^{(2)}(s,t)\}^2 \{W_0^{(2)}(s,s)W_0^{(2)}(t,t)\}^{-1}\,ds\,dt\;\{1+O(h^2)\}.
\end{align}

From (12.23) and the fact that the size of the region $[0,1]\setminus S_{I,h}$ is $O(h)$, we have
\begin{align}
\int_0^1\!\!\int_0^1 \{W_0^{(2)}(s,t)\}^2 &\{W_0^{(2)}(s,s)W_0^{(2)}(t,t)\}^{-1}\,ds\,dt\\
&= \{K^{(2)}(0)\}^{-2} \int_0^1\!\!\int_0^1 [K^{(2)}\{(s-t)/h\}]^2\,ds\,dt\;\{1+o(1)\}\\
&= h\,K^{(4)}(0)\{K^{(2)}(0)\}^{-2} + O(h^2).
\end{align}


Therefore,
\[ \mathrm{Var}(T_1) = 2h\,K^{(4)}(0)\{K^{(2)}(0)\}^{-2} + O(h^2). \]
It is obvious that $\mathrm{E}(T_2) = 0$ and
\[ \mathrm{Var}(T_2) = 4h^{1/2}\int\!\!\int V^{-1/2}(s)\Delta_n(s)\,\Omega(s,t)\,V^{-1/2}(t)\Delta_n(t)\,ds\,dt. \]
As $\Delta_n$ and $V^{-1}$ are bounded on $[0,1]$, there exists a constant $C_1$ such that
\[ \mathrm{Var}(T_2) \le C_1 h^{1/2} \int\!\!\int \Omega(s,t)\,ds\,dt. \]

Furthermore, we know from the discussion above that
\begin{align}
\int\!\!\int \Omega(s,t)\,ds\,dt &= \int\!\!\int \frac{W_0^{(2)}(s,t)}{\sqrt{W_0^{(2)}(s,s)\,W_0^{(2)}(t,t)}}\,ds\,dt + O(h)\\
&= \int_0^1\!\int_{t-2h}^{t+2h} \frac{W_0^{(2)}(s,t)}{K^{(2)}(0)}\,ds\,dt + O(h)\\
&\le \frac{4\,C_1' h}{K^{(2)}(0)} + C_1' h
\end{align}
with further constants $C_1'$, and thus there exists a constant $C_2$ such that
\[ \mathrm{Var}(T_2) \le C_2 h^{3/2}. \]

As $T_3$ is non-random, we have
\[ \mathrm{E}(T) = 1 + h^{1/2}\int_0^1 V^{-1}(s)\Delta_n^2(s)\,ds \quad\text{and} \tag{12.28} \]
\[ \mathrm{Var}(T) = 2h\,K^{(4)}(0)\{K^{(2)}(0)\}^{-2} + o(h). \tag{12.29} \]
(12.28) and (12.29) together with Theorem 12.3 give the asymptotic expectation and variance of the test statistic $k_n^{-1}\tilde\ell_n(\hat m_\theta)$.

12.7 Goodness-of-Fit test

We now turn to the derivation of the asymptotic distribution of $k_n^{-1}\tilde\ell_n(\hat m_\theta)$. We do this by discretizing $\int_0^1 N^2(s)\,ds$ as $k_n^{-1}\sum_{j=1}^{k_n} N^2(t_j)$, where $\{t_j\}_{j=1}^{k_n}$ are the mid-points of the original bins used in formulating $\tilde\ell_n(\hat m_\theta)$. If we choose $k_n = [(2h)^{-1}]$ such that $|t_{j+1} - t_j| \ge 2h$ for all $j$, then the $\{N(t_j)\}$ are independent and each $N(t_j) \sim \mathrm{N}\big(h^{1/4}\Delta_n(t_j)/\sqrt{V(t_j)},\,1\big)$. This means that under the alternative $H_1$
\[ \sum_{j=1}^{k_n} N^2(t_j) \sim \chi^2_{k_n}(\gamma_{k_n}), \]
a non-central $\chi^2$ random variable with $k_n$ degrees of freedom and non-centrality parameter $\gamma_{k_n} = h^{1/4}\big\{\sum_{j=1}^{k_n} \Delta_n^2(t_j)/V(t_j)\big\}^{1/2}$. Under $H_0$,
\[ \sum_{j=1}^{k_n} N^2(t_j) \sim \chi^2_{k_n} \]
is $\chi^2$-distributed with $k_n$ degrees of freedom. This leads to a $\chi^2$ test with significance level $\alpha$ which rejects $H_0$ if $\tilde\ell_n(\hat m_\theta) > \chi^2_{k_n,\alpha}$, where $\chi^2_{k_n,\alpha}$ is the $(1-\alpha)$-quantile of $\chi^2_{k_n}$. The asymptotic power of the $\chi^2$ test is $\mathrm{P}\{\chi^2_{k_n}(\gamma_{k_n}) > \chi^2_{k_n,\alpha}\}$, which is sensitive to alternative hypotheses differing from $H_0$ in all directions.
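The decision rule above can be sketched in a few lines. The numbers below are hypothetical ($h = 0.05$ gives $k_n = 10$ bins), and the $\chi^2$ quantile is obtained via the Wilson–Hilferty approximation rather than a statistical library, so this is an illustration, not the chapter's implementation.

```python
from statistics import NormalDist

def chi2_quantile(k, alpha):
    """(1-alpha)-quantile of the chi-square distribution with k degrees of
    freedom via the Wilson-Hilferty approximation (adequate for moderate k)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * c ** 0.5) ** 3

def el_chi2_test(ell, kn, alpha=0.05):
    """Reject H0 when the EL statistic exceeds the chi-square critical value."""
    crit = chi2_quantile(kn, alpha)
    return ell > crit, crit

# hypothetical numbers: h = 0.05 gives kn = int(1 / (2 * 0.05)) = 10 bins
print(el_chi2_test(ell=21.3, kn=10))
```

For $k_n = 10$ and $\alpha = 0.05$ the approximation gives a critical value close to the exact $\chi^2_{10,0.05} \approx 18.31$.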

We may also establish the asymptotic normality of $k_n^{-1}\sum_{j=1}^{k_n} N^2(t_j)$ by applying the central limit theorem for a triangular array, which together with (12.28) and (12.29) means that
\[ k_n^{-1}\tilde\ell_n(\hat m_\theta) \;\xrightarrow{\;\mathcal{L}\;}\; \mathrm{N}\Big(1 + h^{1/2}\!\int \Delta_n^2(s)V^{-1}(s)\,ds,\;\; 2h\,K^{(4)}(0)\{K^{(2)}(0)\}^{-2}\Big). \]
A test for $H_0$ with an asymptotic significance level $\alpha$ is to reject $H_0$ if
\[ k_n^{-1}\tilde\ell_n(\hat m_\theta) > 1 + z_\alpha \{K^{(2)}(0)\}^{-1}\sqrt{2h\,K^{(4)}(0)} \tag{12.30} \]
where $\mathrm{P}(Z > z_\alpha) = \alpha$ and $Z \sim \mathrm{N}(0,1)$. The asymptotic power of this test is
\[ 1 - \Phi\bigg(z_\alpha - \frac{K^{(2)}(0)\int \Delta_n^2(s)V^{-1}(s)\,ds}{\sqrt{2K^{(4)}(0)}}\bigg). \tag{12.31} \]
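The critical value in (12.30) only requires the kernel constants $K^{(2)}(0) = \int K^2(u)\,du$ and $K^{(4)}(0) = \int \{K^{(2)}(v)\}^2\,dv$. The sketch below computes both by quadrature for the Epanechnikov kernel (again an illustrative choice; the chapter keeps $K$ generic):

```python
import numpy as np
from statistics import NormalDist

def K(u):
    """Epanechnikov kernel (an illustrative choice; the chapter keeps K generic)."""
    return np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

def K2(v, m=20000):
    """Convolution K^(2)(v) = int K(u) K(u - v) du, midpoint rule on [-2, 2]."""
    u = -2.0 + 4.0 * (np.arange(m) + 0.5) / m
    return float(np.mean(K(u) * K(u - v)) * 4.0)

# K^(4)(0) = (K^(2) * K^(2))(0) = int {K^(2)(v)}^2 dv, supported on [-2, 2]
grid = -2.0 + 4.0 * (np.arange(2000) + 0.5) / 2000
K4_0 = float(np.mean(np.array([K2(x) for x in grid]) ** 2) * 4.0)

def critical_value(h, alpha=0.05):
    """Right-hand side of (12.30): reject H0 when the scaled EL statistic
    k_n^{-1} l_n exceeds this value."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 + z * (2.0 * h * K4_0) ** 0.5 / K2(0.0)

print(K2(0.0), K4_0, critical_value(h=0.05))
```

For the Epanechnikov kernel this yields $K^{(2)}(0) = 0.6$ and $K^{(4)}(0) \approx 0.434$, so at $h = 0.05$ and $\alpha = 0.05$ one rejects when the scaled statistic exceeds roughly $1.57$.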

We see from the above that the binning based on the bandwidth value $h$ plays a key role in the derivation of the asymptotic distributions. However, the binning discretizes the null hypothesis and unavoidably leads to some loss of power, as shown in the simulation reported in the next section. From the point of view of retaining power, we would like the size of the bins to be smaller than that prescribed by the smoothing bandwidth, in order to increase the resolution of the discretized null hypothesis relative to the original $H_0$. However, this would create dependence between the empirical likelihood evaluated at neighbouring bins and make the above asymptotic distributions invalid. One possibility is to evaluate the distribution of $\int_0^1 N_0^2(s)\,ds$ by using the approach of Wood and Chan (1994), simulating the normal process $N_0(s)$ under $H_0$. However, this is not our focus here and hence is not considered in this chapter.

12.8 Application

Figure 12.1 shows the daily closing value of the S&P 500 share index from the 31st December 1976 to the 31st December 1997, which covers 5479 trading days. In the upper panel, the index series shows a trend of exponential form, which is estimated using the method given in Härdle, Kleinow, Korostelev, Logeay and Platen (2001). The lower panel is the residual series after removing the exponential trend. In mathematical finance one often assumes a specific dynamic form of this residual series, Platen (2000). More precisely, Härdle et al. (2001) assume the following model for an index process $S(t)$
\[ S(t) = S(0)\,X(t)\,\exp\Big(\int_0^t \eta(s)\,ds\Big) \tag{12.32} \]
with a diffusion component $X(t)$ solving the stochastic differential equation
\[ dX(t) = a\{1 - X(t)\}\,dt + \sigma X^{1/2}(t)\,dW(t) \tag{12.33} \]
where $W(t)$ is a Brownian motion and $a$ and $\sigma$ are parameters. Discretizing this series with a sampling interval $\Delta$ leads to the observations $(X_i, Y_i)$ with $Y_i = X_{(i+1)\Delta} - X_{i\Delta}$ and $X_i = X_{i\Delta}$, which will be $\alpha$-mixing and fulfill all the other conditions assumed in Section 12.3.

We now apply the empirical likelihood test procedure to the S&P 500 data presented in Figure 12.1 to test the parametric mean function $m(x) = a(1-x)$ given in the Cox-Ingersoll-Ross diffusion model (12.33). The process $X$ is restored from the observed residuals by the approach introduced in Härdle et al. (2001). The parametric estimate for $a$ is $\hat a = 0.00968$, obtained using methods based on the marginal distribution and the autocorrelation structure of $X$. For details about the procedure see Härdle et al. (2001). The cross validation

[Figure 12.1. The S&P 500 Data. The upper plot shows the S&P 500 together with the exponential trend. The lower plot shows the residual process X.]

is used to find the bandwidth $h$. However, the score function is monotonically decreasing for $h < 0.15$ and then becomes a flat line for $h \in [0.15, 0.8]$. This may be caused by the different intensity levels of the design points. Further investigation shows that an $h$-value larger (smaller) than 0.06 (0.02) produces an oversmoothed (undersmoothed) curve estimate. Therefore, the test is carried out for a set of $h$ values ranging from 0.02 to 0.06. The P-values of the test as a function of $h$ are plotted in Figure 12.2.

[Figure 12.2. The P-values of the S&P 500 Data, plotted against the bandwidth h.]

The P-values indicate that there is insufficient evidence to reject the diffusion model.

12.9 Simulation Study and Illustration

We investigate our testing procedure in two simulation studies. In our first simulation we consider the time series model
\[ Y_i = 2Y_{i-1}/(1 + Y_{i-1}^2) + c_n \sin(Y_{i-1}) + \sigma(Y_{i-1})\,\eta_i \]
where $\{\eta_i\}$ are independent and identically distributed uniform random variables on $[-1,1]$, $\eta_i$ is independent of $X_i = Y_{i-1}$ for each $i$, and $\sigma(x) = \exp(-x^2/4)$. Note that the mean and the variance functions are both bounded, which ensures that the series is asymptotically stationary. To realize the stationarity, we pre-run the series 100 steps with an initial value $Y_{-100} = 0$. The empirical likelihood test statistic is calculated via the elmtest quantlet.
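The simulation design above is straightforward to reproduce. The following sketch generates one trajectory with the burn-in described in the text and returns the regression pairs $(X_i, Y_i) = (Y_{i-1}, Y_i)$; the seed and sample size are illustrative choices.

```python
import numpy as np

def simulate_series(n, cn, burn=100, seed=0):
    """Simulate Y_i = 2Y_{i-1}/(1+Y_{i-1}^2) + cn sin(Y_{i-1}) + sigma(Y_{i-1}) eta_i
    with eta_i ~ U[-1, 1] and sigma(x) = exp(-x^2/4); the first `burn` values
    (started at Y_{-burn} = 0) are discarded, as in the chapter."""
    rng = np.random.default_rng(seed)
    y = 0.0
    path = np.empty(burn + n)
    for i in range(burn + n):
        eta = rng.uniform(-1.0, 1.0)
        y = 2.0 * y / (1.0 + y**2) + cn * np.sin(y) + np.exp(-y**2 / 4.0) * eta
        path[i] = y
    yser = path[burn:]
    return yser[:-1], yser[1:]     # pairs (X_i, Y_i) = (Y_{i-1}, Y_i)

x, y = simulate_series(n=500, cn=0.0)
print(x.shape, y.shape, float(np.abs(y).max()))
```

Because $|2y/(1+y^2)| \le 1$ and $|\sigma(\cdot)\eta_i| \le 1$, the simulated series stays within $[-2 - c_n,\, 2 + c_n]$, reflecting the boundedness argument in the text.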


{el,p,kn,h2} = elmtest(x,y,model{,kernel{,h{,theta}}})
calculates the empirical likelihood test statistic

The first and the second parameter are the vectors of observations of $X$ and $Y$. The third parameter model is the name of a quantlet that implements the parametric model for the null hypothesis. The optional parameter kernel is the name of the kernel $K$ that is used to calculate the test statistic, and h is the bandwidth used to calculate $\bar U_1$ and $\bar U_2$ in (12.18). theta is directly forwarded to the parametric model.

XFGelsim1.xpl

For the simulation study, the sample sizes considered for each trajectory are $n = 500$ and $1000$, and $c_n$, the degree of difference between $H_0$ and $H_1$, takes the values 0, 0.03 and 0.06. As the simulation shows that the two empirical likelihood tests have very similar power performance, we will report the results for the test based on the $\chi^2$ distribution only. To gauge the effect of the smoothing bandwidth $h$ on the power, ten levels of $h$ are used for each simulated sample to formulate the test statistic.

[Figure 12.3. Power of the empirical likelihood test, plotted against the bandwidth h for n = 500 and n = 1000 and for c_n = 0.00, 0.03 and 0.06. The dotted lines indicate the 5% level.]


Figure 12.3 presents the power of the empirical likelihood test based on 5000 simulations with a nominal 5% level of significance. We notice that when $c_n = 0$ the simulated significance level of the test is very close to the nominal level for a large range of $h$ values, which is especially the case for the larger sample size $n = 1000$. When $c_n$ increases, for each fixed $h$ the power increases as the distance between the null and the alternative hypotheses becomes larger. For each fixed $c_n$, there is a general trend of decreasing power as $h$ increases. This is due to the discretization of $H_0$ by binning, as discussed at the end of the previous section. We also notice that the power curves for $c_n = 0.06$ are a little erratic, although they maintain the same trend as in the case of $c_n = 0.03$. This may be due to the fact that when the difference between $H_0$ and $H_1$ is large, the difference between the nonparametric and the parametric fits becomes larger and the test procedure becomes more sensitive to the bandwidths.

In our second simulation study we consider an Ornstein-Uhlenbeck process $Z$ fluctuating about 0 that satisfies the stochastic differential equation
\[ dZ(t) = aZ(t)\,dt + \sigma\,dW(t) \]
where $W$ is a standard Brownian motion. The speed of adjustment parameter $a$ has to be negative to ensure stationarity. To apply the empirical likelihood test we construct the time series $X$ and $Y$ as in Section 12.2, i.e.
\begin{align}
X_i &= Z(t_i), & X &= (X_1, \dots, X_n)\\
\varepsilon_i &= W(t_{i+1}) - W(t_i), & \varepsilon &= (\varepsilon_1, \dots, \varepsilon_n)\\
Y_i &= X_{i+1} - X_i = aX_i\Delta + \sigma\varepsilon_i, & Y &= (Y_1, \dots, Y_n) \tag{12.34}
\end{align}

It is well known that the transition probability of an Ornstein-Uhlenbeck process is normal with conditional mean
\[ \mathrm{E}[Z_{t+\Delta} \,|\, Z_t = x] = \mathrm{E}[X_{i+1} \,|\, X_i = x] = x\,e^{a\Delta} \]
and conditional variance
\[ \mathrm{Var}(Z_{t+\Delta} \,|\, Z_t = x) = \mathrm{Var}(X_{i+1} \,|\, X_i = x) = \frac{\sigma^2}{2a}\big(e^{2a\Delta} - 1\big), \]
which is positive since $a < 0$.

To simulate the process we use the simou quantlet.

x = simou(n,a,s,delta)
simulates a discretely observed path of an Ornstein-Uhlenbeck process via its transition probability law.

The number of observations is given by n+1, a is the speed of adjustment parameter $a$, s is the diffusion coefficient $\sigma$ and delta is the time difference $\Delta$ between two observations.
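The exact simulation via the transition law is a one-line recursion. The sketch below shows what such a routine might look like; it mirrors the simou interface described above but is an assumption, not the XploRe source.

```python
import numpy as np

def simou(n, a, s, delta, z0=0.0, seed=0):
    """Sketch of an exact OU simulator (assumed, not the XploRe quantlet):
    n+1 observations of dZ = a Z dt + s dW at spacing delta, drawn from the
    Gaussian transition law; a < 0 ensures stationarity."""
    rng = np.random.default_rng(seed)
    mean_factor = np.exp(a * delta)            # E[Z_{t+d} | Z_t = x] = x e^{a d}
    cond_sd = np.sqrt(s**2 * (np.exp(2 * a * delta) - 1.0) / (2.0 * a))
    z = np.empty(n + 1)
    z[0] = z0
    for i in range(n):
        z[i + 1] = mean_factor * z[i] + cond_sd * rng.standard_normal()
    return z

z = simou(n=20000, a=-0.5, s=0.2, delta=0.1)
# the stationary variance of this OU process is s^2 / (-2a) = 0.04
print(len(z), float(z.var()))
```

Drawing from the exact transition law avoids the discretization bias of an Euler scheme; the sample variance of a long path approaches the stationary value $\sigma^2/(-2a)$.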

The proposed simulation procedure and the Goodness-of-Fit test are illustrated in XFGelsim2.xpl.

XFGelsim2.xpl

12.10 Appendix

LEMMA 12.2 Let $X, Y$ be standard normal random variables with covariance $\mathrm{Cov}(X,Y) = \rho$, i.e.
\[ \begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathrm{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right). \tag{12.37} \]
Then we have:
\[ \mathrm{Cov}(X^2, Y^2) = 2\rho^2. \]

PROOF:
Define $Z \sim \mathrm{N}(0,1)$ independent of $X$ and $X' \stackrel{\mathrm{def}}{=} \rho X + \sqrt{1-\rho^2}\,Z$. Then we get:
\[ \begin{pmatrix} X \\ X' \end{pmatrix} \sim \mathrm{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right), \]
so $(X, X')$ has the same joint distribution as $(X, Y)$. Since $X'^2 = \rho^2 X^2 + 2\rho\sqrt{1-\rho^2}\,XZ + (1-\rho^2)Z^2$ and the last two terms are uncorrelated with $X^2$,
\[ \mathrm{Cov}(X^2, Y^2) = \mathrm{Cov}(X^2, X'^2) = \rho^2\,\mathrm{Var}(X^2) = 2\rho^2. \]
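The identity of Lemma 12.2 is easy to verify by Monte Carlo, using exactly the construction from the proof; the sample size and seed below are arbitrary choices.

```python
import numpy as np

def cov_squares(rho, n=400000, seed=1):
    """Monte Carlo check of Lemma 12.2, Cov(X^2, Y^2) = 2 rho^2, constructing
    Y = rho X + sqrt(1 - rho^2) Z exactly as in the proof."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    y = rho * x + np.sqrt(1.0 - rho**2) * z
    return float(np.cov(x**2, y**2)[0, 1])

for rho in (0.0, 0.5, 0.9):
    print(rho, cov_squares(rho), 2 * rho**2)
```

The empirical covariances match $2\rho^2$ up to Monte Carlo error of order $n^{-1/2}$.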

Bibliography

Baggerly, K. A. (1998). Empirical likelihood as a goodness-of-fit measure, Biometrika 85: 535–547.

Bibby, B. M. and Sørensen, M. (1995). Martingale estimation functions for discretely observed diffusion processes, Bernoulli 1(1/2): 17–40.

Billingsley, P. (1999). Convergence of Probability Measures, Wiley, New York.

Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes, Vol. 110 of Lecture Notes in Statistics, Springer-Verlag, Heidelberg.

Chen, S. X., Härdle, W. and Kleinow, T. (2001). An empirical likelihood goodness-of-fit test for time series, Discussion paper 1, Sonderforschungsbereich 373, Humboldt-Universität zu Berlin.

Genon-Catalot, V., Jeantheau, T. and Larédo, C. (2000). Stochastic volatility models as hidden Markov models and statistical applications, Bernoulli 6(6).

Härdle, W. (1990). Applied Nonparametric Regression, number 19 in Econometric Society Monographs, Cambridge University Press.

Härdle, W., Kleinow, T., Korostelev, A., Logeay, C. and Platen, E. (2001). Semiparametric diffusion estimation and application to a stock market index, Discussion Paper 24, Sonderforschungsbereich 373, Humboldt-Universität zu Berlin.

Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits, Ann. Statist. 21: 1926–1947.

Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2000). Non- and semiparametric modelling, XploRe e-book, www.xplore-stat.de.

Hart, J. D. (1997). Nonparametric Smoothing and Lack-of-Fit Tests, Springer, New York.

Kloeden, P. E. and Platen, E. (1999). Numerical Solution of Stochastic Differential Equations, Vol. 23 of Applications of Mathematics, Springer-Verlag, Berlin Heidelberg.

Kreiß, J.-P., Neumann, M. and Yao, Q. (1998). Bootstrap tests for simple structures in nonparametric time series regression. Discussion paper, Sonderforschungsbereich 373.

Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional, Biometrika 75: 237–249.

Owen, A. (1990). Empirical likelihood ratio confidence regions, Ann. Statist. 18: 90–120.

Owen, A. (1991). Empirical likelihood for linear models, Ann. Statist. 19: 1725–1747.

Platen, E. (2000). Risk premia and financial modelling without measure transformation. University of Technology Sydney, School of Finance and Economics and Department of Mathematical Sciences.

Wand, M. and Jones, M. (1995). Kernel Smoothing, number 60 in Monographs on Statistics and Applied Probability, Chapman & Hall.

Wood, A. T. A. and Chan, G. (1994). Simulation of stationary Gaussian processes in [0,1]^d, J. Comp. Graph. Stat. 3: 409–432.

13 A simple state space model of house prices

Rainer Schulz and Axel Werwatz

13.1 Introduction

For most people, purchasing a house is a major decision. Once purchased, the house will by far be the most important asset in the buyer's portfolio. The development of its price will have a major impact on the buyer's wealth over the life cycle. It will, for instance, affect her ability to obtain credit from commercial banks and therefore influence her consumption and savings decisions and opportunities. The behavior of house prices is therefore of central interest for (potential) house buyers, sellers, developers of new houses, banks, policy makers or, in short, the general public.

An important property of houses is that they are different from each other. Hence, while houses in the same market (i.e., the same city, district or neighborhood) will share some common movements in their price, there will at all times be idiosyncratic differences due to differences in maintenance, design or furnishing. Thus, the average or median price will depend not only on the general tendency of the market, but also on the composition of the sample. To calculate a price index for real estate, one has to control explicitly for idiosyncratic differences. The hedonic approach is a popular method for estimating the impact of the characteristics of heterogeneous goods on their prices.

The statistical model used in this chapter tries to infer the common component in the movement of prices of 1502 single-family homes sold in a district of Berlin, Germany, between January 1980 and December 1999. It combines hedonic regression with Kalman filtering. The Kalman filter is the standard statistical tool for filtering out an unobservable, common component from idiosyncratic,


noisy observations. We will interpret the common price component as an index of house prices in the respective district of Berlin. We assume that the index follows an autoregressive process. Given this assumption, the model can be written in state space form.

The remainder of this chapter is organized as follows. In the next section we propose a statistical model of house prices and discuss its interpretation and estimation. Section 13.4 introduces the data, while Section 13.5 describes the quantlets used to estimate the statistical model. In this section we also present the estimation results for our data. The final section gives a summary.

13.2 A Statistical Model of House Prices

13.2.1 The Price Function

The standard approach for constructing a model of the prices of heterogeneous assets is hedonic regression (Bailey, Muth and Nourse, 1963; Hill, Knight and Sirmans, 1997; Shiller, 1993). A hedonic model starts with the assumption that on average the observed price is given by some function $f(I_t, X_{n,t}, \beta)$. Here, $I_t$ is a common price component that "drives" the prices of all houses, the vector $X_{n,t}$ comprises the characteristics of house $n$ and the vector $\beta$ contains all coefficients of the functional form.

Most studies assume a log-log functional form and that $I_t$ is just the constant of the regression for every period (Clapp and Giaccotto, 1998; Cho, 1996). In that case
\[ p_{n,t} = I_t + x_{n,t}^\top\beta + \varepsilon_{n,t}. \tag{13.1} \]
Here, $p_{n,t}$ denotes the log of the transaction price. The vector $x_{n,t}$ contains the transformed characteristics of house $n$ that is sold in period $t$. The idiosyncratic influences $\varepsilon_{n,t}$ are white noise with variance $\sigma^2_\varepsilon$.

Following Schwann (1998), we put some structure on the behavior of the common price component over time by assuming that the common price component follows an autoregressive moving average (ARMA) process. For our data it turns out that the following AR(2) process
\[ I_t = \phi_1 I_{t-1} + \phi_2 I_{t-2} + \nu_t \tag{13.2} \]
with $I_0 = 0$ suffices. This autoregressive specification reflects that the market for owner-occupied houses reacts sluggishly to changing conditions and that any


price index will thus exhibit some autocorrelation. This time-series-based way of modelling the behavior of $I_t$ is more parsimonious than the conventional hedonic regressions (which need to include a separate dummy variable for each time period) and makes forecasting straightforward.

13.2.2 State Space Form

We can rewrite our model (13.1) and (13.2) in State Space Form (SSF) (Gourieroux and Monfort, 1997). In general, the SSF is given as:
\begin{align}
\alpha_t &= c_t + T_t \alpha_{t-1} + \varepsilon^s_t \tag{13.3a}\\
y_t &= d_t + Z_t \alpha_t + \varepsilon^m_t \tag{13.3b}\\
\varepsilon^s_t &\sim (0, R_t), \qquad \varepsilon^m_t \sim (0, H_t). \tag{13.3c}
\end{align}
The notation partially follows Harvey (1989; 1993). The first equation is the state equation and the second is the measurement equation. The characteristic structure of state space models relates a series of unobserved values $\alpha_t$ to a set of observations $y_t$. The unobserved values $\alpha_t$ represent the behavior of the system over time (Durbin and Koopman, 2001).

The unobservable state vector $\alpha_t$ has dimension $K \times 1$, $T_t$ is a square matrix of dimension $K \times K$, and the vector of observable variables $y_t$ has dimension $N_t \times 1$. Here, $N_t$ denotes the number of observations $y_{t,n}$ in period $t \le T$. If the number of observations varies through periods, we denote
\[ N \stackrel{\mathrm{def}}{=} \max_{t=1,\dots,T} N_t. \]
The matrix $Z_t$ contains constant parameters and other exogenous observable variables. Finally, the vectors $c_t$ and $d_t$ contain constants. The system matrices $c_t$, $T_t$, $R_t$, $d_t$, $Z_t$, and $H_t$ may contain unknown parameters that have to be estimated from the data.

In our model, that is (13.1) and (13.2), the common price component $I_t$ and the quality coefficients $\beta$ are unobservable. However, whereas these coefficients are constant through time, the price component evolves according to (13.2). The parameters $\phi_1$, $\phi_2$, and $\sigma^2_\nu$ of this process are unknown.

The observed log prices are the entries in $y_t$ of the measurement equation and the characteristics are entries in $Z_t$. In our data base we observe three


characteristics per object. Furthermore, we include the constant $\beta_0$. We can put (13.1) and (13.2) into SSF by setting
\[
\alpha_t = \begin{pmatrix} I_t \\ \phi_2 I_{t-1} \\ \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix},\quad
T_t = \begin{pmatrix} \phi_1 & 1 & 0 & 0 & 0 & 0\\ \phi_2 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix},\quad
\varepsilon^s_t = \begin{pmatrix} \nu_t \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{13.4a}
\]
\[
y_t = \begin{pmatrix} p_{1,t} \\ \vdots \\ p_{N_t,t} \end{pmatrix},\quad
Z_t = \begin{pmatrix} 1 & 0 & x_{1,t}^\top \\ \vdots & \vdots & \vdots \\ 1 & 0 & x_{N_t,t}^\top \end{pmatrix},\quad
\varepsilon^m_t = \begin{pmatrix} \varepsilon_{1,t} \\ \vdots \\ \varepsilon_{N_t,t} \end{pmatrix}, \tag{13.4b}
\]
where each $x_{n,t}$ is understood to include a leading 1 for the constant $\beta_0$, so that every row of $Z_t$ has six entries. For our model, both $c_t$ and $d_t$ are zero vectors. The transition matrices $T_t$ are not time-varying. The variance matrices $R_t$ of the state equation are identical for all $t$ and equal to a $6 \times 6$ matrix whose first element is $\sigma^2_\nu$ and all other elements are zero. $H_t$ is an $N_t \times N_t$ diagonal matrix with $\sigma^2_\varepsilon$ on the diagonal. The variance $\sigma^2_\varepsilon$ is also an unknown parameter.

The first two elements of the state equation just resemble the process of the common price component given in (13.2). However, we should mention that there are other ways to put an AR(2) process into an SSF (see Harvey, 1993, p. 84). The remaining elements of the state equation are the implicit prices $\beta$ of the hedonic price equation (13.1). Multiplying the state vector $\alpha_t$ with row $n$ of the matrix $Z_t$ gives $I_t + x_{n,t}^\top\beta$. This is just the functional relation (13.1) for the log price without noise. The noise terms of (13.1) are collected in the SSF in the vector $\varepsilon^m_t$. We assume that $\varepsilon^m_t$ and $\varepsilon^s_t$ are uncorrelated. This is required for identification (Schwann, 1998, p. 274).
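The system matrices of (13.4) can be assembled mechanically. The sketch below does so for one period; the parameter values and the toy cross-section of two houses are hypothetical, and the leading 1 for the constant $\beta_0$ is appended inside the routine.

```python
import numpy as np

def build_ssf(phi1, phi2, sigma_nu2, sigma_eps2, X_t):
    """System matrices of (13.4) for one period t. X_t is the N_t x 3 matrix of
    characteristics (log lot size, log floor space, age); a leading 1 for the
    constant beta_0 is appended inside, so Z_t has six columns."""
    T = np.eye(6)
    T[0, 0], T[0, 1] = phi1, 1.0          # first row:  (phi1, 1, 0, 0, 0, 0)
    T[1, 0], T[1, 1] = phi2, 0.0          # second row: (phi2, 0, 0, 0, 0, 0)
    R = np.zeros((6, 6))
    R[0, 0] = sigma_nu2                   # only the index equation is hit by nu_t
    n_t = X_t.shape[0]
    Z = np.hstack([np.ones((n_t, 1)), np.zeros((n_t, 1)),
                   np.ones((n_t, 1)), X_t])
    H = sigma_eps2 * np.eye(n_t)          # idiosyncratic noise of (13.1)
    return T, R, Z, H

# toy cross-section of two houses (characteristics as in Section 13.4)
X_t = np.array([[6.10, 4.77, 53.0], [6.56, 5.15, 13.0]])
T, R, Z, H = build_ssf(0.9, 0.05, 0.1, 0.2, X_t)
alpha = np.array([1.0, 0.0, 2.0, 0.3, 0.1, -0.01])  # (I_t, phi2 I_{t-1}, beta)
print(Z @ alpha)    # I_t + x' beta, the noiseless log prices
```

Multiplying a row of $Z_t$ with the state vector reproduces $I_t + x_{n,t}^\top\beta$, the noiseless log price of (13.1).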

13.3 Estimation with Kalman Filter Techniques

13.3.1 Kalman Filtering given all parameters

Given the above SSF and all unknown parameters $\psi \stackrel{\mathrm{def}}{=} (\phi_1, \phi_2, \sigma^2_\nu, \sigma^2_\varepsilon)$, we can use Kalman filter techniques to estimate the unknown coefficients $\beta$ and the process of $I_t$. The Kalman filter technique is an algorithm for estimating the unobservable state vectors by calculating their expectation conditional on


information up to $s \le T$. In what follows, we use the general notation
\[ a_{t|s} \stackrel{\mathrm{def}}{=} \mathrm{E}[\alpha_t \,|\, \mathcal{F}_s] \tag{13.5a} \]
for the filtered state vector and
\[ P_{t|s} \stackrel{\mathrm{def}}{=} \mathrm{E}[(\alpha_t - a_{t|s})(\alpha_t - a_{t|s})^\top \,|\, \mathcal{F}_s] \tag{13.5b} \]
for the covariance matrix of the estimation error, where $\mathcal{F}_s$ is shorthand for the information available at time $s$.

Generally, the estimators delivered by Kalman filtering techniques have minimum mean-squared error among all linear estimators (Shumway and Stoffer, 2000, Chapter 4.2). If the initial state vector and the noise terms $\varepsilon^m_t$ and $\varepsilon^s_t$ are multivariate Gaussian, then the Kalman filter delivers the optimal estimator among all estimators, linear and nonlinear (Hamilton, 1994, Chapter 13).

The Kalman filter techniques can handle missing observations in the measurement equation (13.3b). For periods with fewer than $N$ observations, one has to adjust the measurement equations. One can do this by deleting all elements of the measurement matrices $d_t$, $Z_t$, $H_t$ for which the corresponding entry in $y_t$ is a missing value. The quantlets in XploRe use this procedure. Another way to take missing values into account is proposed by Shumway and Stoffer (1982; 2000): replace all missing values with zeros and adjust the other measurement matrices accordingly. We show in Appendix 13.6.1 that both methods deliver the same results. For periods with no observations the Kalman filter techniques recursively calculate an estimate given the most recent information (Durbin and Koopman, 2001).

13.3.2 Filtering and state smoothing

The Kalman filter is an algorithm for sequentially updating our knowledge of the system given a new observation $y_t$. It calculates one-step predictions conditional on $s = t$. Using our general expressions, we have
\[ a_t = \mathrm{E}[\alpha_t \,|\, \mathcal{F}_t] \quad\text{and}\quad P_t = \mathrm{E}[(\alpha_t - a_t)(\alpha_t - a_t)^\top \,|\, \mathcal{F}_t]. \]
Here we use the standard simplified notation $a_t$ and $P_t$ for $a_{t|t}$ and $P_{t|t}$. As a by-product of the filter, the recursions also calculate
\[ a_{t|t-1} = \mathrm{E}[\alpha_t \,|\, \mathcal{F}_{t-1}] \quad\text{and}\quad P_{t|t-1} = \mathrm{E}[(\alpha_t - a_{t|t-1})(\alpha_t - a_{t|t-1})^\top \,|\, \mathcal{F}_{t-1}]. \]
We give the filter recursions in detail in Subsection 13.5.3.

The Kalman smoother is an algorithm to predict the state vector $\alpha_t$ given the whole information up to $T$. Thus, with our general notation, $s = T$ and
\[ a_{t|T} = \mathrm{E}[\alpha_t \,|\, \mathcal{F}_T] \]
with corresponding covariance matrix
\[ P_{t|T} = \mathrm{E}[(\alpha_t - a_{t|T})(\alpha_t - a_{t|T})^\top \,|\, \mathcal{F}_T]. \]
We see that the filter makes one-step predictions given the information up to $t \in \{1, \dots, T\}$, whereas the smoother is backward looking. We give the smoother recursions in detail in Subsection 13.5.5.
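The filter recursions referred to here take a standard textbook form. The sketch below is exactly that, a textbook implementation (not the XploRe code), for the SSF (13.3) with $c_t = d_t = 0$; it also returns the innovations and their covariances, which are needed for the likelihood in the next subsection.

```python
import numpy as np

def kalman_filter(y_list, T, R, Z_list, H_list, a0, P0):
    """One pass of the Kalman filter for the SSF (13.3) with c_t = d_t = 0:
    a textbook sketch, not the XploRe implementation. Returns, per period, the
    filtered state a_t, its error covariance P_t, and the innovation pair
    (v_t, F_t) that feeds the likelihood (13.9)."""
    a, P = a0, P0
    out = []
    for y, Z, H in zip(y_list, Z_list, H_list):
        a_pred = T @ a                         # a_{t|t-1}
        P_pred = T @ P @ T.T + R               # P_{t|t-1}
        v = y - Z @ a_pred                     # innovation, cf. (13.10)
        F = Z @ P_pred @ Z.T + H               # innovation covariance F_t
        K = P_pred @ Z.T @ np.linalg.inv(F)    # Kalman gain
        a = a_pred + K @ v                     # a_t = E[alpha_t | F_t]
        P = P_pred - K @ Z @ P_pred            # P_t
        out.append((a, P, v, F))
    return out

# toy check: a constant scalar level observed twice per period with unit noise
T = np.eye(1); R = np.zeros((1, 1))
Z = np.ones((2, 1)); H = np.eye(2)
ys = [np.array([1.0, 3.0]), np.array([2.0, 2.0])]
res = kalman_filter(ys, T, R, [Z, Z], [H, H], np.zeros(1), 1e6 * np.eye(1))
print(res[-1][0])   # filtered level, close to the sample mean 2.0
```

With a diffuse prior, the filtered level after the two periods is essentially the mean of the four observations, as expected for a constant state.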

13.3.3 Maximum likelihood estimation of the parameters

Given the system matrices $c_t$, $T_t$, $R_t$, $d_t$, $Z_t$, and $H_t$, Kalman filtering techniques are the right tool to estimate the elements of the state vector. However, in our model some of these system matrices contain unknown parameters $\psi$. These parameters have to be estimated by maximum likelihood.

Given a multivariate Gaussian error distribution, the value of the log likelihood function $l(\psi)$ for a general SSF is, up to an additive constant, equal to:
\[ -\frac{1}{2}\sum_{t=1}^{T} \ln|F_t| \;-\; \frac{1}{2}\sum_{t=1}^{T} v_t^\top F_t^{-1} v_t. \tag{13.9} \]
Here,
\[ v_t \stackrel{\mathrm{def}}{=} y_t - d_t - Z_t a_{t|t-1} \tag{13.10} \]
are the innovations of the filtering procedure and $a_{t|t-1}$ is the conditional expectation of $\alpha_t$ given information up to $t-1$. As we have already mentioned, these expressions are a by-product of the filter recursions. The matrix $F_t$ is the covariance matrix of the innovations at time $t$ and also a by-product of the Kalman filter. The above log likelihood is known as the prediction error decomposition form (Harvey, 1989). Periods with no observations do not contribute to the log likelihood function.
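Once the filter has produced the innovations $v_t$ and covariances $F_t$, evaluating (13.9) is a short loop. The sketch below does so; using a log-determinant and a linear solve, rather than an explicit inverse, is a standard numerical precaution.

```python
import numpy as np

def pe_loglik(vs, Fs):
    """Log likelihood (13.9), up to the additive constant, computed from the
    filter's innovations v_t and their covariance matrices F_t."""
    ll = 0.0
    for v, F in zip(vs, Fs):
        sign, logdet = np.linalg.slogdet(F)
        ll += -0.5 * logdet - 0.5 * float(v @ np.linalg.solve(F, v))
    return ll

# sanity check: for F_t = I the contribution of period t is -0.5 * v'v
v = np.array([1.0, 2.0])
print(pe_loglik([v], [np.eye(2)]))   # -2.5
```

In practice this function would be handed to a numerical optimizer over $\psi$, with the filter rerun at each trial parameter value.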


Starting with some initial value, one can use numerical maximization methods to obtain an estimate of the parameter vector $\psi$. Under certain regularity conditions, the maximum likelihood estimator $\tilde\psi$ is consistent and asymptotically normal. One can use the information matrix to calculate standard errors of $\tilde\psi$ (Hamilton, 1994).

13.3.4 Diagnostic checking

After fitting an SSF, one should check the appropriateness of the results by looking at the standardized residuals
\[ v_t^{st} = F_t^{-1/2} v_t. \tag{13.11} \]
If all parameters of the SSF were known, $v_t^{st}$ would follow a multivariate standardized normal distribution (Harvey, 1989; see also (13.9)). We know that $F_t$ is a symmetric matrix and that it should be positive definite (recall that it is just the covariance matrix of the innovations $v_t$). So
\[ F_t^{-1/2} = C_t \Lambda_t^{-1/2} C_t^\top, \tag{13.12} \]
where the diagonal matrix $\Lambda_t$ contains all eigenvalues of $F_t$ and $C_t$ is the matrix of corresponding normalized eigenvectors (Greene, 2000, p. 43). The standardized residuals should be distributed normally with constant variance, and should show no serial correlation. It is a signal of a misspecified model when the residuals do not possess these properties. To check the properties, one can use standard test procedures. For example, a Q-Q plot indicates whether the quantiles of the residuals deviate from the corresponding theoretical quantiles of a normal distribution. This plot can be used to detect non-normality. The Jarque-Bera test for normality can also be used for testing non-normality of the residuals (Bera and Jarque, 1982). This test is implemented in XploRe as jarber.
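Computing $F_t^{-1/2}$ via the eigendecomposition (13.12) is a few lines; the sketch below uses a hypothetical $2 \times 2$ innovation covariance for illustration.

```python
import numpy as np

def standardize(v, F):
    """Standardized residuals (13.11): F^{-1/2} v, with F^{-1/2} built from the
    eigendecomposition (13.12) of the symmetric positive definite matrix F."""
    eigval, C = np.linalg.eigh(F)            # F = C diag(eigval) C'
    F_inv_sqrt = C @ np.diag(eigval ** -0.5) @ C.T
    return F_inv_sqrt @ v

F = np.array([[2.0, 0.5], [0.5, 1.0]])       # hypothetical innovation covariance
v = np.array([1.0, -1.0])
v_st = standardize(v, F)
# the quadratic form is preserved: v_st' v_st = v' F^{-1} v
print(v_st, float(v_st @ v_st))
```

Under the model, $\mathrm{Cov}(v_t^{st}) = F_t^{-1/2} F_t F_t^{-1/2} = I$, which is exactly what makes the standardized residuals suitable for Q-Q plots and the Jarque-Bera test.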

In the empirical part, we combine Kalman filter techniques and maximum likelihood to estimate the unknown parameters and coefficients of the SSF for the house prices in a district of Berlin.

13.4 The Data

The data set is provided by the Gutachterausschuß für Grundstückswerte in Berlin, an expert commission for Berlin's real estate market. The commission


collects information on all real estate transactions in Berlin in a data base called

Automatisierte Kaufpreissammlung.

Here, we use data for 1502 sales of detached single-family houses in a district

of Berlin for the years 1980 to 1999, stored in MD*BASE. Besides the price,

we observe the size of the lot, the ¬‚oor space, and the age of the house. The

data set XFGhouseprice contains the log price observations for all 80 quarters.

There are at most N = 43 observations in any quarter. The following lines of

XploRe code

Y = read("XFGhouseprice.dat")

Y[1:20,41:44]

can be used to take a look at the entries of XFGhouseprice. Every column

gives the observations for one quarter. Thus, in columns 41 to 44 we ¬nd the

observations for all quarters of 1990. If less than 43 transactions are observed

in a quarter the remaining entries are ¬lled with the missing value code NaN.

Only in the ¬rst quarter of the year 1983 we observe 43 transactions.

The corresponding data set XFGhousequality contains the observed charac-

teristics of all houses sold. They are ordered in the following way: each column

contains all observations for a given quarter. Remember that for every house

we observe log size of the lot, log size of the ¬‚oor space and age. The ¬rst

three rows of a column refer to the ¬rst house in t, the next three to the second

house and so on.

Let us look at the characteristics of the ¬rst two observations in 1990:1. Just

type the following lines in the XploRe input window

X = read("XFGhousequality.dat")

X[1:6,41]'

After compiling, you get the output

[1,] 6.1048 4.7707 53 6.5596 5.1475 13

The size of the lot for the second house is about 706 square meters (just take

the antilog). The size of the floor space is 172 square meters and the age is 13

years.
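The antilog calculation is easy to verify; a short illustrative Python check (not XploRe code), using the two log sizes from the output above:

```python
import math

# Log lot size and log floor space of the second house sold in 1990:1
log_lot, log_floor = 6.5596, 5.1475

print(round(math.exp(log_lot)))    # 706 (square meters)
print(round(math.exp(log_floor)))  # 172 (square meters)
```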


The following table shows summary statistics of our Berlin house price data.

"========================================================="

" Summary statistics for the Berlin house price data "

"========================================================="

" Sample for 80 quarters with 1502 observations "

" "

" Observations per period "

" ----------------------------------------------------"

" Minimum = 4 Average = 18.77 Maximum = 43 "

" "

" Transaction prices (in thousand DM) "

" ----------------------------------------------------"

" Minimum = 100.00 Average = 508.46 "

" Maximum = 1750.01 Std. Dev. = 197.92 "

" "

" Size of the lot (in square meters) "

" ----------------------------------------------------"

" Minimum = 168.00 Average = 626.18 "

" Maximum = 2940.00 Std. Dev. = 241.64 "

" "

" Size of the floor space (in square meters) "

" ----------------------------------------------------"

" Minimum = 46.00 Average = 144.76 "

" Maximum = 635.00 Std. Dev. = 48.72 "

" "

" Age of the building (in years) "

" ----------------------------------------------------"

" Minimum = 0 Average = 28.59 "

" Maximum = 193 Std. Dev. = 21.58 "

"========================================================="

XFGsssm1.xpl

Not surprisingly for detached houses, there are large differences in the size of the lot. Some houses were new in the period of the sale, while one was 193 years old. This is a good example of the potential bias of the average price per quarter as a price index: if we do not control explicitly for depreciation, we might obtain a low price level simply because the houses sold in a quarter were old.


Nevertheless, the average price per quarter can give an indication of the price level. Figure 13.1 shows the average price per quarter along with confidence intervals at the 90% level. Instead of the average price, we could also calculate an average adjusted price, where the most important characteristic is used for the adjustment. Such an adjustment is obtained by dividing the price of every house by, for example, the respective size of the lot. However, even in that case we would control only for one of the observed characteristics. In our model we will control for all of the observed characteristics.
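The quarterly average with its 90% confidence band can be sketched in Python; this is a normal-approximation sketch, not the XFGsssm2.xpl code, and the price values are hypothetical:

```python
import math

def mean_ci(prices, z=1.645):
    """Average price with a 90% normal-approximation confidence band,
    as plotted per quarter in Figure 13.1 (illustrative sketch)."""
    n = len(prices)
    m = sum(prices) / n
    s = math.sqrt(sum((p - m) ** 2 for p in prices) / (n - 1))
    half = z * s / math.sqrt(n)
    return m - half, m, m + half

# Hypothetical transaction prices (thousand DM) for one quarter.
lo, m, hi = mean_ci([420.0, 510.0, 480.0, 610.0])
print(m)  # 505.0
```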

Figure 13.1. Average price per quarter, units are Deutsche Mark (1 DM ≈ 0.511 EURO). Confidence intervals are calculated for the 90% level.

XFGsssm2.xpl


13.5 Estimating and filtering in XploRe

13.5.1 Overview

The procedure for Kalman filtering in XploRe is as follows: first, one has to set up the system matrices using gkalarray. The quantlet adjusts the measurement matrices for missing observations.

After the set up of the system matrices, we calculate the Kalman filter with

gkalfilter. This quantlet also calculates the value of the log likelihood

function given in equation (13.9). That value will be used to estimate the

unknown parameters of the system matrices with numerical maximization

(Hamilton, 1994, Chapter 5). The first and second derivatives of the log likelihood function will also be calculated numerically. To estimate the unknown

state vectors, given the estimated parameters, we use the Kalman smoother

gkalsmoother. For diagnostic checking, we use the standardized residuals

(13.11). The quantlet gkalresiduals calculates these residuals.

13.5.2 Setting the system matrices

gkalarrayOut = gkalarray(Y,M,IM,XM)

sets the system matrices for a time varying SSF

The Kalman filter quantlets need as arguments arrays consisting of the system

matrices. The quantlet gkalarray sets these arrays in a user-friendly way. The

routine is especially convenient if one works with time varying system matrices.

In our SSF (13.4), only the system matrix Zt is time varying. As one can see

immediately from the general SSF (13.3), in principle every system matrix can be

time varying.

The quantlet uses a three-step procedure to set up the system matrices.

1. To define a system matrix, all constant entries must be set to their respective values and all time varying entries must be set to an arbitrary number (for example to 0).

2. One must define an index matrix for every system matrix. An entry is

set to 0 when its corresponding element in the system matrix is constant

and to some positive integer when it is not constant.


3. In addition, for every time varying system matrix, one also has to specify

a data matrix that contains the time varying entries.

gkalarray uses the following notation: Y denotes the matrix of all observations

[y1 , . . . , yT ], M denotes the system matrix, IM denotes the corresponding index

matrix and XM the data matrix.

If all entries of a system matrix are constant over time, then the parameters

have already been put directly into the system matrix. In this case, one should

set the index and the data matrix to 0.

For every time varying system matrix, only the constant parameters (if there are any) have already been specified with the system matrix. The time-varying coefficients have to be specified in the index and the data matrix.
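The template/index/data convention can be sketched in Python. The function below is an illustrative reimplementation of the convention just described, not the gkalarray quantlet itself; the one-row example uses the 1990:1 characteristics quoted earlier:

```python
def build_system_matrix(M, IM, XM, t):
    """Assemble the system matrix for period t from a constant template M,
    an index matrix IM (0 = constant entry, k > 0 = row k of the data
    matrix XM) and the data matrix XM, whose columns hold one period each.
    Illustrative sketch of the gkalarray convention."""
    return [[XM[IM[i][j] - 1][t] if IM[i][j] > 0 else M[i][j]
             for j in range(len(M[0]))]
            for i in range(len(M))]

# One-row example: entries 4-6 of the Z_t row come from the data matrix.
M  = [[1, 0, 1, 0, 0, 0]]
IM = [[0, 0, 0, 1, 2, 3]]
XM = [[6.1048], [4.7707], [53]]   # one period, three characteristics

print(build_system_matrix(M, IM, XM, 0))
# [[1, 0, 1, 6.1048, 4.7707, 53]]
```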

In our example, only the matrices Zt are time varying. We have

          [ 1  0  1    0      0      0  ]
  Z  def= [ :  :  :    :      :      :  ]
          [ 1  0  1    0      0      0  ]

          [ 0  0  0    1      2      3  ]
          [ 0  0  0    4      5      6  ]
  IZ def= [ :  :  :    :      :      :  ]
          [ 0  0  0  3N−2   3N−1    3N  ]

  XZ def= XFGhousequality

The system matrix Zt has dimension (N × 6). The non-zero entries in the index matrix IZ prescribe the rows of XFGhousequality that contain the time varying elements.

The output of the quantlet is an array that stacks the system matrices one

after the other. For example, the first two rows of the system matrix Z41 are

[1,] 1 0 1 6.1048 4.7707 53

[2,] 1 0 1 6.5596 5.1475 13

XFGsssm3.xpl

It is easy to check that the entries in the last three columns are just the characteristics of the first two houses that were sold in 1990:1 (see p. 290).


13.5.3 Kalman filter and maximized log likelihood

{gkalfilOut,loglike} = gkalfilter(Y,mu,Sig,ca,Ta,Ra,

da,Za,Ha,l)

Kalman filters a time-varying SSF

We assume that the initial state vector at t = 0 has mean µ and covariance matrix Σ. Recall that Rt and Ht denote the covariance matrices of the state noise and of the measurement noise, respectively. The general filter recursions are as follows:

Start at t = 1: use the initial guess for µ and Σ to calculate

    a1|0 = c1 + T1 µ
    P1|0 = T1 Σ T1' + R1
    F1   = Z1 P1|0 Z1' + H1

and

    a1 = a1|0 + P1|0 Z1' F1^{-1} (y1 − Z1 a1|0 − d1)
    P1 = P1|0 − P1|0 Z1' F1^{-1} Z1 P1|0

Step at t ≤ T: using at−1 and Pt−1 from the previous step, calculate

    at|t−1 = ct + Tt at−1
    Pt|t−1 = Tt Pt−1 Tt' + Rt
    Ft     = Zt Pt|t−1 Zt' + Ht

and

    at = at|t−1 + Pt|t−1 Zt' Ft^{-1} (yt − Zt at|t−1 − dt)
    Pt = Pt|t−1 − Pt|t−1 Zt' Ft^{-1} Zt Pt|t−1
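These recursions can be sketched in Python for the scalar case, where every system quantity is a number rather than a matrix (note that T here denotes the transition coefficient, not the sample size). This is an illustrative sketch, not the gkalfilter quantlet:

```python
def kalman_filter_1d(y, mu, Sig, c, T, R, d, Z, H):
    """Scalar version of the filter recursions: prediction step, then
    update with the gain P_pred * Z / F. Illustrative only."""
    a, P = mu, Sig
    out = []
    for yt in y:
        a_pred = c + T * a                        # a_{t|t-1}
        P_pred = T * P * T + R                    # P_{t|t-1}
        F = Z * P_pred * Z + H                    # innovation variance F_t
        v = yt - Z * a_pred - d                   # innovation
        a = a_pred + P_pred * Z / F * v           # filtered mean a_t
        P = P_pred - P_pred * Z / F * Z * P_pred  # filtered variance P_t
        out.append((a, P))
    return out

# Local-level example: random-walk state (T=1, R=0) observed with unit noise.
res = kalman_filter_1d([1.0], mu=0.0, Sig=1.0, c=0.0, T=1.0,
                       R=0.0, d=0.0, Z=1.0, H=1.0)
print(res)  # [(0.5, 0.5)]
```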

The implementation for our model is as follows: the arguments of gkalfilter

are the data matrix Y, the starting values mu (µ), Sig (Σ) and the array for

every system matrix (see section 13.5.2). The output is a T + 1 dimensional


array of [at Pt ] matrices. If one chooses l = 1, the value of the log likelihood function (13.9) is calculated.

Once again, the T + 1 matrices are stacked "behind each other", with the t = 0 matrix at the front and the t = T matrix at the end of the array. The first entry is [µ Σ].

How can we provide initial values for the filtering procedure? If the system matrices are time-invariant and the transition matrix T satisfies a stability condition, we should set the initial values to the unconditional mean and variance of the state vector. Σ is given implicitly by

    vec(Σ) = (I − T ⊗ T)^{-1} vec(R) .

Here, vec denotes the vec-operator that places the columns of a matrix below

each other and ⊗ denotes the Kronecker product. Our model is time-invariant.
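In the scalar case the vec formula collapses to Σ = R/(1 − T²), which is easy to verify against the fixed point of the variance recursion Σ = TΣT' + R. A small illustrative Python check (not XploRe code):

```python
def unconditional_var(T, R):
    """Scalar case of vec(Sigma) = (I - T (x) T)^{-1} vec(R)."""
    return R / (1.0 - T * T)

# Check against iterating Sigma <- T*Sigma*T + R to convergence.
T, R = 0.5, 0.75
sig = 0.0
for _ in range(200):
    sig = T * sig * T + R

print(unconditional_var(T, R), round(sig, 6))  # 1.0 1.0
```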

But does our transition matrix fulfill the stability condition? The necessary and sufficient condition for stability is that the characteristic roots of the transition matrix T should have modulus less than one (Harvey, 1989, p. 114). It is easy to check that the characteristic roots λj of our transition matrix (13.4a) are given as

    λ1,2 = (φ1 ± √(φ1² + 4φ2)) / 2 .

For example, if φ1 and φ2 are both positive, then φ1 + φ2 < 1 guarantees real

characteristic roots that are smaller than one (Baumol, 1959, p. 221). However,

when the AR(2) process of the common price component It has a unit root,

the stability conditions are not ful¬lled. If we inspect Figure 13.1, a unit root

seems quite plausible. Thus we can not use this method to derive the initial

values.
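These roots and the stability check can be computed directly; an illustrative Python sketch with hypothetical parameter values:

```python
import math

def ar2_roots(phi1, phi2):
    """Characteristic roots of an AR(2): (phi1 +/- sqrt(phi1^2 + 4*phi2))/2.
    Assumes real roots (phi1^2 + 4*phi2 >= 0)."""
    disc = math.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2

# Both coefficients positive with phi1 + phi2 < 1: stable.
r1, r2 = ar2_roots(0.5, 0.3)
print(max(abs(r1), abs(r2)) < 1)  # True

# phi1 + phi2 = 1: a unit root appears.
print(ar2_roots(0.5, 0.5)[0])  # 1.0
```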

If we have some preliminary estimates of µ, along with preliminary measures of uncertainty (that is, an estimate of Σ), we can use these preliminary estimates as initial values. A standard way to derive such preliminary estimates is to use OLS. If we have no information at all, we must take diffuse priors about the initial conditions. A method adopted by Koopman, Shephard and Doornik (1999) is setting µ = 0 and Σ = κI where κ is a large number. The large variances on the diagonal of Σ reflect our uncertainty about the true µ.
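The effect of such a diffuse prior is easy to see in a scalar filter update: with a very large initial variance κ, the first filtered estimate essentially reproduces the first observation. An illustrative sketch with hypothetical numbers (not XploRe code):

```python
# Scalar filter update at t = 1 with a diffuse prior (mu = 0, Sig = kappa).
kappa, H = 1e8, 1.0   # huge prior variance, unit measurement noise
y1 = 13.2

P_pred = kappa                # T = 1, R = 0
F = P_pred + H                # innovation variance
a1 = 0.0 + P_pred / F * y1    # first filtered mean

print(abs(a1 - y1) < 1e-6)  # True: the prior is effectively forgotten
```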

We will use the second approach for providing some preliminary estimates as initial values. Given the hedonic equation (13.1), we use OLS to estimate It, β, and σµ² by regressing log prices on lot size, floor space, age and quarterly time dummies. The estimated coefficients of lot size, floor space and age are


                    coefficient   t-statistic   p-value
  log lot size         0.2675        15.10       0.0000
  log floor space      0.4671        23.94       0.0000
  age                 -0.0061       -20.84       0.0000

  Regression diagnostics
  R²         0.9997     Number of observations       1502
  R̄²         0.9997     F-statistic              64021.67
  σ̂µ²        0.4688     Prob(F-statistic)          0.0000

  Table 13.1. Results for hedonic regression

reported in Table 13.1. They are highly significant and reasonable in sign and magnitude. Whereas lot size and floor space increase the price on average, age has the opposite effect. According to (13.1), the common price component It is a time-varying constant term and is therefore estimated by the coefficients of the quarterly time dummies, denoted by {Ît}, t = 1, . . . , 80. As suggested by (13.2), these estimates are regressed on their lagged values to obtain estimates of the unknown parameters φ1, φ2, and σν². Table 13.2 presents the results of an AR(2) for the Ît series. The residuals of this regression behave like white noise.

                coefficient   t-statistic   p-value
  constant         0.5056        1.3350      0.1859
  Ît−1             0.4643        4.4548      0.0000
  Ît−2             0.4823        4.6813      0.0000

  Regression diagnostics
  R²         0.8780     Number of observations       78
  R̄²         0.8747     F-statistic              269.81
  σ̂ν²        0.0063     Prob(F-statistic)        0.0000

  Table 13.2. Time series regression for the quarterly dummies


We should remark that

    φ̂1 + φ̂2 ≈ 1

and thus the process of the common price component seems to have a unit root.

Given our initial values, we maximize the log likelihood (13.9) numerically with respect to the elements of ψ* def= (φ1, φ2, log(σν²), log(σµ²)). Note that ψ* differs from ψ by using the logarithms of the variances σν² and σµ². This transformation is known to improve the numerical stability of the maximization algorithm, which employs nmBFGS of XploRe's nummath library. Standard errors are computed by inverting the Hessian matrix provided by nmhessian. The output of the maximum likelihood estimation procedure is summarized in Table 13.3, where we report the estimates of σν² and σµ² obtained by retransforming the estimates of log(σν²) and log(σµ²).

                 estimate   std error   t-value   p-value
  ψ̂1 = φ̂1         0.783       0.501      1.56      0.12
  ψ̂2 = φ̂2         0.223       0.504      0.44      0.66
  ψ̂3 = σ̂ν²        0.0016      0.012      1.36      0.17
  ψ̂4 = σ̂µ²        0.048       0.002     26.7       0

  average log likelihood   0.9965

  Table 13.3. Maximum likelihood estimates of the elements of ψ

XFGsssm4.xpl

Note that the maximum likelihood estimates of the AR coefficients φ1 and φ2

approximately sum to 1, again pointing towards a unit root process for the

common price component.

13.5.4 Diagnostic checking with standardized residuals

{V,Vs} = gkalresiduals(Y,Ta,Ra,da,Za,Ha,gkalfilOut)

calculates innovations and standardized residuals

The quantlet gkalresiduals checks internally for the positive definiteness of Ft. An error message will be displayed when Ft is not positive definite. In such

a case, the standardized residuals are not calculated.


The output of the quantlet consists of two N × T matrices, V and Vs. V contains the innovations (13.10) and Vs contains the standardized residuals (13.11).

The Q-Q plot of the standardized residuals in Figure 13.2 shows deviations

from normality at both tails of the distribution.

Figure 13.2. Q-Q plot of the standardized residuals. Deviations of the dotted line from the straight line are evidence for a nonnormal error distribution.

XFGsssm5.xpl
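The points of such a Q-Q plot pair the sorted standardized residuals with standard normal quantiles. A minimal Python sketch using the standard library (not an XploRe quantlet; the residual values are hypothetical):

```python
from statistics import NormalDist

def qq_points(resid):
    """Pair sorted residuals with standard normal quantiles at the
    plotting positions (i - 0.5)/n."""
    n = len(resid)
    nd = NormalDist()
    xs = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, sorted(resid)))

pts = qq_points([-2.1, 0.3, -0.4, 1.8, 0.0])
# Points far off the 45-degree line at the ends indicate heavy tails.
print(pts[0][1])  # -2.1 (smallest residual)
```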

This is evidence that the true error distribution might be a unimodal distribution with heavier tails than the normal, such as the t-distribution. In this case the projections calculated by the Kalman filter no longer provide the conditional expectations of the state vector but rather its best linear prediction. Moreover, the estimates of ψ calculated from the likelihood (13.9) can be interpreted as pseudo-likelihood estimates.


13.5.5 Calculating the Kalman smoother

gkalsmoothOut = gkalsmoother(Y,Ta,Ra,gkalfilOut)

provides Kalman smoothing of a time-varying SSF

The Kalman filter is a convenient tool for calculating the conditional expectations and covariances of our SSF (13.4). We have used the innovations of this filtering technique and their covariance matrix for calculating the log likelihood. However, for estimating the unknown state vectors, we should use at every step the whole sample information up to period T. For this task, we use the Kalman smoother.

The quantlet gkalsmoother needs as argument the output of gkalfilter. The

output of the smoother is an array with [at|T Pt|T ] matrices. This array of

dimension T + 1 starts with the t = 0 matrix and ends with the matrix for

t = T. For the smoother recursions, one needs at, Pt and Pt|t−1 for t = 1, . . . , T.

Then the calculation procedure is as follows:

Start at t = T:

    aT|T = aT
    PT|T = PT

Step at t < T:

    Pt*  = Pt Tt+1' Pt+1|t^{-1}
    at|T = at + Pt* (at+1|T − Tt+1 at)
    Pt|T = Pt + Pt* (Pt+1|T − Pt+1|t) Pt*'
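These backward recursions can be sketched in Python for the scalar case. The filtered means/variances and one-step predictions below were computed by hand for a two-period random-walk example (T = 1, R = 0, H = 1, prior N(0,1), observations y1 = y2 = 1); this is an illustrative sketch, not the gkalsmoother quantlet:

```python
def kalman_smoother_1d(filt, pred, T):
    """Scalar Kalman smoother. filt = [(a_t, P_t)], pred = [(a_{t|t-1},
    P_{t|t-1})], T = transition coefficient. Returns [(a_{t|T}, P_{t|T})]."""
    n = len(filt)
    out = [filt[-1]]                     # start: a_{T|T} = a_T, P_{T|T} = P_T
    for t in range(n - 2, -1, -1):
        a, P = filt[t]
        a_next, P_next = out[0]          # a_{t+1|T}, P_{t+1|T}
        a_pred, P_pred = pred[t + 1]     # a_{t+1|t}, P_{t+1|t}
        Pstar = P * T / P_pred
        a_s = a + Pstar * (a_next - T * a)
        P_s = P + Pstar * (P_next - P_pred) * Pstar
        out.insert(0, (a_s, P_s))
    return out

filt = [(0.5, 0.5), (2/3, 1/3)]          # hand-computed filter output
pred = [(0.0, 1.0), (0.5, 0.5)]          # hand-computed one-step predictions
sm = kalman_smoother_1d(filt, pred, T=1.0)
print(sm[0])  # approximately (2/3, 1/3): smoothed level equal in both periods
```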

The next program calculates the smoothed state vectors for our SSF, given the estimated parameters ψ̃. The smoothed series of the common price component is given in Figure 13.3. The confidence intervals are calculated using the variance of the first element of the state vector.

Comparison with the average prices given in Figure 13.1 reveals that the common price component is less volatile than the simple average. Furthermore, a table of the estimated hedonic coefficients, that is β, is generated (Table 13.4).

Recall that these coefficients are just the last three entries in the state vector αt. According to our state space model, the variances of these state variables are
