consider a combination of two one-sided schemes, Lucas (1976) or Lucas and

Crosier (1982), and a scheme based on Crosier (1986). Note, that in some

recent papers the same concept of combination of two one-sided schemes is

used for EWMA charts.

Recall, that Shewhart charts are a special case of EWMA schemes (» = 1).

Therefore, we distinguish 5 SPC schemes “ ewma1, ewma2, cusum1, cusum2

(two one-sided schemes), and cusumC (Crosier™s scheme). For the two-sided

EWMA charts the following quantlets are provided in the XploRe quantlib

spc.

SPC quantlets for two-sided EWMA scheme

“ produces chart ¬gure

spcewma2

spcewma2arl “ returns ARL

“ returns critical value c2

spcewma2c

spcewma2ad “ returns AD (steady-state ARL)

spcewma2pmf “ returns probability mass and distribution function

of the run-length for single time points

spcewma2pmfm “ the same up to a given time point

240 11 Statistical Process Control

By replacing ewma2 by one of the remaining four scheme titles the related

characteristics can be computed.

The quantlets spcewma1,...,spccusumC generate the chart ¬gure. Here, we ap-

ply the 5 charts to arti¬cial data. 100 pseudo random values from a normal

distribution are generated. The ¬rst 80 values have expectation 0, the next 20

values have expectation 1, i. e. model (11.1) with µ0 = 0, µ1 = 1, and m = 81.

We start with the two-sided EWMA scheme and set » = 0.1, i. e. the chart is

Two-sided EWMA chart

lambda = 0.10, in-control ARL = 300 94

1

0.5

Z_t

0

-0.5

0 50 100

t

Figure 11.1. Two-sided EWMA chart XFGewma2fig.xpl

very sensitive to small changes. The critical value c2 (see (11.3)) is computed

to provide an in-control ARL of 300 (see Section 11.2). Thus, the scheme leads

in average after 300 observations to a false alarm.

EWMA

In Figure 11.1 the graph of Zt is plotted against time t = 1, 2, . . . , 100.

Further, the design parameter », the in-control ARL, and the time of alarm (if

there is one) are printed. One can see, that the above EWMA scheme detects

the change point m = 81 at time point 94, i. e. the delay is equal to 14. The

related average values, i. e. ARL and Average Delay (AD), for µ1 = 1 are 9.33

and 9.13, respectively. Thus, the scheme needs here about 5 observations more

than average.

11.1 Control Charts 241

In the same way the remaining four SPC schemes can be plotted. Remark,

that in case of ewma1 one further parameter has to be set. In order to obtain

EWMA

a suitable ¬gure and an appropriate scheme the EWMA statistic Zt (see

(11.4)) is re¬‚ected at a pre-speci¬ed border zreflect ¤ 0 (= µ0 ), i. e.

EWMA EWMA

} , t = 1, 2, . . .

Zt = max{zreflect, Zt

for an upper EWMA scheme. Otherwise, the statistic is unbounded, which

leads to schemes with poor worst case performance. Further, the methods

One-sided EWMA chart

1

lambda = 0.10, in-control ARL = 300, zreflect = -4.0 93

0.5

Z_t

0

-0.5

0 50 100

t

Figure 11.2. One-sided EWMA chart XFGewma1fig.xpl

used in Section 11.2 for computing the chart characteristics use bounded con-

tinuation regions of the chart. If zreflect is small enough, then the ARL and

the AD (which are not worst case criterions) of the re¬‚ected scheme are the

same as of the unbounded scheme. Applying the quantlet XFGewma1fig.xpl

with zreflect= ’4 leads to Figure 11.2. Thereby, zreflect has the same

normalization factor »/(2 ’ ») like the critical value c2 (see 2.). The corre-

sponding normalized border is printed as dotted line (see Figure 11.2). The

chart signals one observation earlier than the two-sided version in Figure 11.1.

The related ARL and AD values for µ1 = 1 are now 7.88 and 7.87, respectively.

In Figure 11.3 the three di¬erent CUSUM charts with k = 0.5 are presented.

They signal at the time points 87, 88, and 88 for cusum1, cusum2, and cusumC,

242 11 Statistical Process Control

respectively. For the considered dataset the CUSUM charts are faster be-

One-sided CUSUM chart

k = 0.50, in-control ARL = 300 87

10

Z_t

5

0

0 50 100

t

XFGcusum1fig.xpl

Two-sided CUSUM chart

k = 0.50, in-control ARL = 300 88

10

5

Z_t

0

-5

0 50 100

t

XFGcusum2fig.xpl

Crosier™s two-sided CUSUM chart

k = 0.50, in-control ARL = 300 88

10

5

Z_t

0

0 50 100

t

XFGcusumCfig.xpl

Figure 11.3. CUSUM charts: one-sided, two-sided, Crosier™s two-sided

11.2 Chart characteristics 243

cause of their better worst case performance. The observations right before

the change point at m = 81 are smaller than average. Therefore, the EWMA

charts need more time to react to the increased average. The related average

values of the run length, i. e. ARL and AD, are 8.17 and 7.52, 9.52 and 8.82,

9.03 and 8.79 for cusum1, cusum2, and cusumC, respectively.

11.2 Chart characteristics

Consider the change point model (11.1). For ¬xed m denote Pm (·) and Em (·)

the corresponding probability measure and expectation, respectively. Hereby,

m = ∞ stands for the case of no change, i. e. the so called in-control case. Then

the Average Run Length (ARL) (expectation of the run length L) is de¬ned

as

E∞ (L) , µ = µ0

Lµ = . (11.7)

E1 (L) , µ = µ0

Thus, the ARL denotes the average number of observations until signal for a

sequence with constant expectation. µ = µ0 or m = ∞ stands for no change,

µ = µ0 and m = 1 mark, that just at the ¬rst time point (or earlier) a change

takes place from µ0 to µ. Therefore, the ARL evaluates only the special sce-

nario of m = 1 of the SPC scheme. Other measures, which take into account

that usually 1 < m < ∞, were introduced by Lorden (1971) and Pollak and

Siegmund (1975), Pollak and Siegmund (1975). Here, we use a performance

measure which was ¬rstly proposed by Roberts (1959). The so called (condi-

tional) Average Delay (AD, also known as steady-state ARL) is de¬ned as

(m)

Dµ lim Dµ ,

= (11.8)

m’∞

(m)

Dµ = Em L ’ m + 1|L ≥ m ,

where µ is the value of µ1 in (11.1), i. e. the expectation after the change.

While Lµ measures the delay for the case m = 1, Dµ determines the delay for

a SPC scheme which ran a long time without signal. Usually, the convergence

(m)

in (11.8) is very fast. For quite small m the di¬erence between Dµ and Dµ is

very small already. Lµ and Dµ are average values for the random variable L.

Unfortunately, L is characterized by a large standard deviation. Therefore, one

might be interested in the whole distribution of L. Again, we restrict on the

special cases m = 1 and m = ∞. We consider the probability mass function

Pµ (L = n) (PMF) and the cumulative distribution function Pµ (L ¤ n) (CDF).

Based on the CDF, one is able to compute quantiles of the run length L.

244 11 Statistical Process Control

For normally distributed random variables it is not possible to derive exact

solutions for the above characteristics. There are a couple of approximation

techniques. Besides very rough approximations based on the Wald approxi-

mation known from sequential analysis, Wiener process approximations and

similar methods, three main methods can be distinguished:

1. Markov chain approach due to Brook and Evans (1972): Replacement of

the continuous statistic Zt by a discrete one

2. Quadrature of integral equations which are derived for the ARL, Vance

(1986) and Crowder (1986) and for some eigenfunctions which lead to the

AD

3. Waldmann (1986) approach: Iterative computation of P (L = n) by us-

ing quadrature and exploiting of monotone bounds for the considered

characteristics

Here we use the ¬rst approach, which has the advantage, that all considered

characteristics can be presented in a straightforward way. Next, the Markov

chain approach is brie¬‚y described. Roughly speaking, the continuous statistic

Zt is approximated by a discrete Markov chain Mt . The transition Zt’1 =

x ’ Zt = y is approximated by the transition Mt’1 = i w ’ Mt = j w with

x ∈ [i w ’ w/2, i w + w/2] and y ∈ [j w ’ w/2, j w + w/2]. That is, given an

integer r the continuation region of the scheme [’c, c], [zreflect, c], or [0, c]

is separated into 2 r + 1 or r + 1 intervals of the kind [i w ’ w/2, i w + w/2]

(one exception is [0, w/2] as the ¬rst subinterval of [0, c]). Then, the transition

kernel f of Zt is approximated by the discrete kernel of Mt , i. e.

f (x, y) ≈ P (i w ’ [j w ’ w/2, j w + w/2])/w

for all x ∈ [i w ’ w/2, i w + w/2] and y ∈ [j w ’ w/2, j w + w/2]. Eventually,

we obtain a Markov chain {Mt } with 2 r + 1 or r + 1 transient states and one

absorbing state. The last one corresponds to the alarm (signal) of the scheme.

Denote by Q = (qij ) the matrix of transition probabilities of the Markov chain

{Mt } on the transient states, 1 a vector of ones, and L = (Li ) the ARL vector.

Li stands for the ARL of a SPC scheme which starts in point i w (corresponds

to z0 ). In the case of a one-sided CUSUM scheme with z0 = 0 [0, w/2]

the value L0 approximates the original ARL. By using L we generalize the

original schemes to schemes with possibly di¬erent starting values z0 . Now,

the following linear equation system is valid, Brook and Evans (1972):

(I ’ Q) L = 1 , (11.9)

11.2 Chart characteristics 245

where I denotes the identity matrix. By solving this equation system we get

the ARL vector L and an approximation of the ARL of the considered SPC

scheme. Remark that the larger r the better is the approximation. In the days

of Brook and Evans (1972) the maximal matrix dimension r+1 (they considered

cusum1) was 15 because of the restrictions of the available computing facilities.

Nowadays, one can use dimensions larger than some hundreds. By looking

at di¬erent r one can ¬nd a suitable value. The quantlet XFGrarl.xpl

demonstrates this e¬ect for the Brook and Evans (1972) example. 9 di¬erent

values of r from 5 to 500 are used to approximate the in-control ARL of a

one-sided CUSUM chart with k = 0.5 and c3 = 3 (variance σ 2 = 1). We get

r 5 10 20 30 40 50 100 200 500

L0 113.47 116.63 117.36 117.49 117.54 117.56 117.59 117.59 117.60

XFGrarl.xpl

The true value is 117.59570 (obtainable via a very large r or by using the

quadrature methods with a suitable large number of abscissas). The computa-

tion of the average delay (AD) requires more extensive calculations. For details

see, e. g., Knoth (1998) on CUSUM for Erlang distributed data. Here we apply

the Markov chain approach again, Crosier (1986). Given one of the considered

schemes and normally distributed data, the matrix Q is primitive, i. e. there

exists a power of Q which is positive. Then Q has one single eigenvalue which

is larger in magnitude than the remaining eigenvalues. Denote this eigenvalue

by . The corresponding left eigenvector ψ is strictly positive, i. e.

ψQ = ψ , ψ > 0. (11.10)

It can be shown, Knoth (1998), that the conditional density f (·|L ≥ m) of

both the continuous statistic Zt and the Markov chain Mt tends for m ’

∞ to the normalized left eigenfunction and eigenvector, respectively, which

correspond to the dominant eigenvalue . Therefore, the approximation of

D = lim Em (L ’ m + 1|L ≥ m) can be constructed by

m’∞

D = (ψ T L)/(ψ T 1) .

Note, that the left eigenvector ψ is computed for the in-control mean µ0 , while

the ARL vector L is computed for a speci¬c out-of-control mean or µ0 again.

If we replace in the above quantlet ( XFGrarl.xpl) the phrase arl by ad, then

246 11 Statistical Process Control

we obtain the following output which demonstrates the e¬ect of the parameter

r again.

r 5 10 20 30 40 50 100 200 500

D0 110.87 114.00 114.72 114.85 114.90 114.92 114.94 114.95 114.95

XFGrad.xpl

Fortunately, for smaller values of r than in the ARL case we get good accuracy

already. Note, that in case of cusum2 the value r has to be smaller (less than 30)

than for the other charts, since it is based on the computation of the dominant

eigenvalue of a very large matrix. The approximation in case of combination of

two one-sided schemes needs a twodimensional approximating Markov chain.

For the ARL only exists a more suitable approach. As, e. g., Lucas and Crosier

(1982) shown it is possible to use the following relation between the ARLs of the

one- and the two-sided schemes. Here, the two-sided scheme is a combination

of two symmetric one-sided schemes which both start at z0 = 0. Therefore,

we get a very simple formula for the ARL L of the two-sided scheme and the

ARLs Lupper and Llower of the upper and lower one-sided CUSUM scheme

Lupper · Llower

L= . (11.11)

Lupper + Llower

Eventually, we consider the distribution function of the run length L itself.

By using the Markov chain approach and denoting with pn the approximated

i

probability of (L > n) for a SPC scheme started in i w, such that pn = (pn ),

i

we obtain

pn = pn’1 Q = p0 Qn . (11.12)

The vector p0 is initialized with p0 = 1 for the starting point z0 ∈ [i w ’

i

0

w/2, i w + w/2] and pj = 0 otherwise. For large n we can replace the above

equation by

pn ≈ gi n . (11.13)

i

The constant gi is de¬ned as

gi = φi /(φT ψ) ,

where φ denotes the right eigenvector of Q, i. e. Q φ = φ. Based on (11.12)

and (11.13) the probability mass and the cumulative distribution function of

11.2 Chart characteristics 247

the run length L can be approximated. (11.12) is used up to a certain n. If the

di¬erence between (11.12) and (11.13) is smaller than 10’9 , then exclusively

(11.13) is exploited. Remark, that the same is valid as for the AD. For the

two-sided CUSUM scheme (cusum2) the parameter r has to be small (¤ 30).

11.2.1 Average Run Length and Critical Values

The spc quantlib provides the quantlets spcewma1arl,...,spccusumCarl for

computing the ARL of the corresponding SPC scheme. All routines need the

actual value of µ as a scalar or as a vector of several µ, two scheme param-

eters, and the integer r (see the beginning of the section). The XploRe ex-

ample XFGarl.xpl demonstrates all ...arl routines for k = 0.5, » = 0.1,

zreflect= ’4, r = 50, c = 3, in-control and out-of-control means µ0 = 0 and

µ1 = 1, respectively. The next table summarizes the ARL results

chart ewma1 ewma2 cusum1 cusum2 cusumC

L0 1694.0 838.30 117.56 58.780 76.748

L1 11.386 11.386 6.4044 6.4036 6.4716

XFGarl.xpl

Remember that the ARL of the two-sided CUSUM (cusum2) scheme is based on

the one-sided one, i. e. 58.78 = 117.56/2 and 6.4036 = (6.4044·49716)/(6.4044+

49716) with 49716 = L’1 .

For the setup of the SPC scheme it is usual to give the design parameter » and

k for EWMA and CUSUM, respectively, and a value ξ for the in-control ARL.

Then, the critical value c (c2 or c3 ) is the solution of the equation Lµ0 (c) =

ξ. Here, the regula falsi is used with an accuracy of |Lµ0 (c) ’ ξ| < 0.001.

The quantlet XFGc.xpl demonstrates the computation of the critical values

for SPC schemes with in-control ARLs of ξ = 300, reference value k = 0.5

(CUSUM), smoothing parameter » = 0.1 (EWMA), zreflect= ’4, and the

Markov chain parameter r = 50.

chart ewma1 ewma2 cusum1 cusum2 cusumC

c 2.3081 2.6203 3.8929 4.5695 4.288

XFGc.xpl

248 11 Statistical Process Control

The parameter r = 50 guarantees fast computation and suitable accuracy.

Depending on the power of the computer one can try values of r up to 1000 or

larger (see XFGrarl.xpl in the beginning of the section).

11.2.2 Average Delay

The usage of the routines for computing the Average Delay (AD) is similar

to the ARL routines. Replace only the code arl by ad. Be aware that the

computing time is larger than in case of the ARL, because of the computation

of the dominant eigenvalue. It would be better to choose smaller r, especially in

the case of the two-sided CUSUM. Unfortunately, there is no relation between

the one- and two-sided schemes as for the ARL in (11.11). Therefore, the library

computes the AD for the two-sided CUSUM based on a twodimensional Markov

chain with dimension (r + 1)2 — (r + 1)2 . Thus with values of r larger than 30,

the computing time becomes quite large. Here the results follow for the above

quantlet XFGrarl.xpl with ad instead of arl and r = 30 for spccusum2ad:

chart ewma1 ewma2 cusum1 cusum2 cusumC

D0 1685.8 829.83 114.92 56.047 74.495

D1 11.204 11.168 5.8533 5.8346 6.2858

XFGad.xpl

11.2.3 Probability Mass and Cumulative Distribution

Function

The computation of the probability mass function (PMF) and of the cumulative

distribution function (CDF) is implemented in two di¬erent types of routines.

The ¬rst one with the syntax spcchartpmf returns the values of the PMF

P (L = n) and CDF P (L ¤ n) at given single points of n, where chart has

to be replaced by ewma1, ..., cusumC. The second one written as spcchartpmfm

computes the whole vectors of the PMF and of the CDF up to a given point

n, i. e. P (L = 1), P (L = 2), . . . , P (L = n) and the similar one of the CDF.

Note, that the same is valid as for the Average Delay (AD). In case of the

two-sided CUSUM scheme the computations are based on a twodimensional

11.2 Chart characteristics 249

Markov chain. A value of parameter r less than 30 would be computing time

friendly.

With the quantlet XFGpmf1.xpl the 5 di¬erent schemes (r = 50, for cusum2

r = 25) are compared according their in-control PMF and CDF (µ = µ0 = 0) at

the positions n in {1, 10, 20, 30, 50, 100, 200, 300}. Remark, that the in-control

ARL of all schemes is chosen as 300.

chart ewma1 ewma2 cusum1 cusum2 cusumC

6 · 10’8 2 · 10’9 6 · 10’6 4 · 10’7 2 · 10’6

P (L = 1)

P (L = 10) 0.00318 0.00272 0.00321 0.00307 0.00320

P (L = 20) 0.00332 0.00324 0.00321 0.00325 0.00322

P (L = 30) 0.00315 0.00316 0.00310 0.00314 0.00311

P (L = 50) 0.00292 0.00296 0.00290 0.00294 0.00290

P (L = 100) 0.00246 0.00249 0.00245 0.00248 0.00245

P (L = 200) 0.00175 0.00177 0.00175 0.00176 0.00175

P (L = 300) 0.00125 0.00126 0.00124 0.00125 0.00125

6 · 10’8 2 · 10’9 6 · 10’6 4 · 10’7 2 · 10’6

P (L = 1)

P (L ¤ 10) 0.01663 0.01233 0.02012 0.01675 0.01958

P (L ¤ 20) 0.05005 0.04372 0.05254 0.04916 0.05202

P (L ¤ 30) 0.08228 0.07576 0.08407 0.08109 0.08358

P (L ¤ 50) 0.14269 0.13683 0.14402 0.14179 0.14360

P (L ¤ 100) 0.27642 0.27242 0.27728 0.27658 0.27700

P (L ¤ 200) 0.48452 0.48306 0.48480 0.48597 0.48470

P (L ¤ 300) 0.63277 0.63272 0.63272 0.63476 0.63273

XFGpmf1.xpl

A more appropriate, graphical representation provides the quantlet

XFGpmf2.xpl. Figure 11.4 shows the corresponding graphs.

250 11 Statistical Process Control

run length cdf - in control

15

P(L<=n)*E-2

10

5

0

0 10 20 30 40 50

n

run length cdf - out of control

1

P(L<=n)

0.5

ewma2

cusumC

0

0 10 20 30 40 50

n

Figure 11.4. CDF for two-sided EWMA and Crosier™s CUSUM for

µ = 0 (in control) and µ = 1 (out of control)

XFGpmf2.xpl

11.3 Comparison with existing methods 251

11.3 Comparison with existing methods

11.3.1 Two-sided EWMA and Lucas/Saccucci

Here, we compare the ARL and AD computations of Lucas and Saccucci (1990)

with XploRe results. In their paper they use as in-control ARL ξ = 500. Then

for, e. g., » = 0.5 and » = 0.1 the critical values are 3.071 and 2.814, respec-

tively. By using XploRe the related values are 3.0712 and 2.8144, respectively.

It is known, that the smaller » the worse the accuracy of the Markov chain

approach. Therefore, r is set greater for » = 0.1 (r = 200) than for » = 0.5

(r = 50). Table 11.1 shows some results of Lucas and Saccucci (1990) on

ARLs and ADs. Their results are based on the Markov chain approach as

well. However, they used some smaller matrix dimension and ¬tted a regres-

sion model on r (see Subsection 11.3.2). The corresponding XploRe results

µ 0 0.25 0.5 0.75 1 1.5 2 3 4 5

» = 0.5

Lµ 500 255 88.8 35.9 17.5 6.53 3.63 1.93 1.34 1.07

Dµ 499 254 88.4 35.7 17.3 6.44 3.58 1.91 1.36 1.10

» = 0.1

Lµ 500 106 31.3 15.9 10.3 6.09 4.36 2.87 2.19 1.94

Dµ 492 104 30.6 15.5 10.1 5.99 4.31 2.85 2.20 1.83

Table 11.1. ARL and AD values from Table 3 of Lucas and Saccucci

(1990)

by using the quantlet XFGlucsac.xpl coincide with the values of Lucas and

Saccucci (1990).

XFGlucsac.xpl

11.3.2 Two-sided CUSUM and Crosier

Crosier (1986) derived a new two-sided CUSUM scheme and compared it with

the established combination of two one-sided schemes. Recall Table 3 of Crosier

(1986), where the ARLs of the new and the old scheme were presented. The

reference value k is equal to 0.5. First, we compare the critical values. By

252 11 Statistical Process Control

µ 0 0.25 0.5 0.75 1 1.5 2 2.5 3 4 5

old scheme, h = 4

Lµ 168 74.2 26.6 13.3 8.38 4.74 3.34 2.62 2.19 1.71 1.31

new scheme, h = 3.73

Lµ 168 70.7 25.1 12.5 7.92 4.49 3.17 2.49 2.09 1.60 1.22

old scheme, h = 5

Lµ 465 139 38.0 17.0 10.4 5.75 4.01 3.11 2.57 2.01 1.69

new scheme, h = 4.713

Lµ 465 132 35.9 16.2 9.87 5.47 3.82 2.97 2.46 1.94 1.59

Table 11.2. ARLs from Table 3 of Crosier (1986)

using XploRe ( XFGcrosc.xpl) with r = 100 one gets c = 4.0021 (4), 3.7304

(3.73), 4.9997 (5), 4.7133 (4.713), respectively “ the original values of Crosier

are written in parentheses. By comparing the results of Table 11.2 with the

results obtainable by the quantlet XFGcrosarl.xpl (r = 100) it turns out,

that again the ARL values coincide with one exception only, namely L1.5 = 4.75

for the old scheme with h = 4.

XFGcrosarl.xpl

Further, we want to compare the results for the Average Delay (AD), which is

called Steady-State ARL in Crosier (1986). In Table 5 of Crosier we ¬nd the

related results. A slight modi¬cation of the above quantlet XFGcrosarl.xpl

µ 0 0.25 0.5 0.75 1 1.5 2 2.5 3 4 5

old scheme, h = 4

Lµ 163 71.6 25.2 12.3 7.68 4.31 3.03 2.38 2.00 1.55 1.22

new scheme, h = 3.73

Lµ 164 69.0 24.3 12.1 7.69 4.39 3.12 2.46 2.07 1.60 1.29

old scheme, h = 5

Lµ 459 136 36.4 16.0 9.62 5.28 3.68 2.86 2.38 1.86 1.53

new scheme, h = 4.713

Lµ 460 130 35.1 15.8 9.62 5.36 3.77 2.95 2.45 1.91 1.57

Table 11.3. ADs (steady-state ARLs) from Table 5 of Crosier (1986)

11.4 Real data example “ monitoring CAPM 253

allows to compute the ADs. Remember, that the computation of the AD for

the two-sided CUSUM scheme is based on a twodimensional Markov chain.

Therefore the parameter r is set to 25 for the scheme called old scheme by

Crosier. The results are summarized in Table 11.4.

µ 0 0.25 0.5 0.75 1 1.5 2 2.5 3 4 5

old scheme, h = 4

Lµ 163 71.6 25.2 12.4 7.72 4.33 3.05 2.39 2.01 1.55 1.22

new scheme, h = 3.73

Lµ 165 69.1 24.4 12.2 7.70 4.40 3.12 2.47 2.07 1.60 1.29

old scheme, h = 5

Lµ 455 136 36.4 16.0 9.65 5.30 3.69 2.87 2.38 1.86 1.54

new scheme, h = 4.713

Lµ 460 130 35.1 15.8 9.63 5.37 3.77 2.95 2.45 1.91 1.57

Table 11.4. ADs (steady-state ARLs) computed by XploRe, di¬erent

values to Table 11.3 are printed as italics XFGcrosad.xpl

While the ARL values in the paper and computed by XploRe coincide, those

for the AD di¬er slightly. The most prominent deviation (459 vs. 455) one

observes for the old scheme with h = 5. One further in-control ARL di¬erence

one notices for the new scheme with h = 3.73. All other di¬erences are small.

There are di¬erent sources for the deviations:

T T

1. Crosier computed D(32) = (p32 L)/(p32 1) and not the actual limit D

(see 11.8, 11.10, and 11.12).

2. Crosier used ARL(r) = ARL∞ + B/r2 + C/r4 and ¬tted this model

for r = 8, 9, 10, 12, 15. Then, ARL∞ is used as ¬nal approximation. In

order to get the above D(32) one needs the whole vector L, such that this

approach might be more sensitive to approximation errors than in the

single ARL case.

11.4 Real data example “ monitoring CAPM

There are di¬erent ways of applying SPC to ¬nancial data. Here, we use a

twosided EWMA chart for monitoring the Deutsche Bank (DBK) share. More

254 11 Statistical Process Control

precisely, a capital asset pricing model (CAPM) is ¬tted for DBK and the DAX

which is used as proxy of the e¬cient market portfolio. That is, denoting with

rDAX,t and rDBK,t the log returns of the DAX and the DBK, respectively, one

assumes that the following regression model is valid:

rDBK,t = ± + β rDAX,t + µt (11.14)

Usually, the parameters of the model are estimated by the ordinary least

squares method. The parameter β is a very popular measure in applied ¬-

nance, Elton and Gruber (1991). In order to construct a real portfolio, the

β coe¬cient is frequently taken into account. Research has therefore concen-

trated on the appropriate estimation of constant and time changing β. In the

context of SPC it is therefore useful to construct monitoring rules which signal

changes in β. Contrary to standard SPC application in industry there is no

obvious state of the process which one can call ”in-control”, i. e. there is no

target process. Therefore, pre-run time series of both quotes (DBK, DAX)

are exploited for building the in-control state. The daily quotes and log re-

turns, respectively, from january, 6th, 1995 to march, 18th, 1997 (about 450

observations) are used for ¬tting (11.14):

ANOVA SS df MSS F-test P-value

_________________________________________________________________________

Regression 0.025 1 0.025 445.686 0.0000

Residuals 0.025 448 0.000

Total Variation 0.050 449 0.000

Multiple R = 0.70619

R^2 = 0.49871

Adjusted R^2 = 0.49759

Standard Error = 0.00746

PARAMETERS Beta SE StandB t-test P-value

________________________________________________________________________

b[ 0,]= -0.0003 0.0004 -0.0000 -0.789 0.4307

b[ 1,]= 0.8838 0.0419 0.7062 21.111 0.0000

With b[1,] = β = 0.8838 a typical value has been obtained. The R2 =

0.49871 is not very large. However, the simple linear regression is considered

in the sequel. The (empirical) residuals of the above model are correlated (see

Figure 11.5). The SPC application should therefore be performed with the

(standardized) residuals of an AR(1) ¬t to the regression residuals. For an

application of the XploRe quantlet armacls (quantlib times) the regression

residuals were standardized. By using the conditional least squares method an

11.4 Real data example “ monitoring CAPM 255

Sample partial autocorrelation function (pacf)

20

15

10

pacf*E-2

5

0

-5

-10

5 10 15 20

lag

Figure 11.5. Partial autocorrelation function of CAPM regression resid-

uals

estimate of ˆ = 0.20103 for the AR(1) model

µt = µt’1 + ·t (11.15)

has been obtained. Eventually, by plugging in the estimates of ±, β and ,

and standardizing with the sample standard deviation of the pre-run residuals

series (see (11.15)) one gets a series of uncorrelated data with expectation 0 and

variance 1, i. e. our in-control state. If the ¬tted model (CAPM with AR(1)

noise) remains valid after the pre-run, the related standardized residuals behave

like in the in-control state. Now, the application of SPC, more precisely of a

twosided EWMA chart, allows to monitor the series in order to get signals, if

the original model was changed. Changes in ± or β in (11.14) or in in (11.15)

or in the residual variance of both models lead to shifts or scale changes in the

empirical residuals series. Hence, the probability of an alarm signaled by the

EWMA chart increases (with one exception only “ decreased variances). In

this way a possible user of SPC in ¬nance is able to monitor an estimated and

presumed CAPM.

256 11 Statistical Process Control

In our example we use the parameter » = 0.2 and an in-control ARL of 500,

such that the critical value is equal to c = 2.9623 (the Markov chain parameter

r was set to 100). Remark, that the computation of c is based on the normality

assumption, which is seldom ful¬lled for ¬nancial data. In our example the

hypothesis of normality is rejected as well with a very small p value (Jarque-

Bera test with quantlet jarber). The estimates of skewness 0.136805 and

kurtosis 6.64844 contradict normality too. The fat tails of the distribution are

a typical pattern of ¬nancial data. Usually, the fat tails lead to a higher false

alarm rate. However, it would be much more complicated to ¬t an appropriate

distribution to the residuals and use these results for the ”correct” critical

value.

The Figures 11.6 and 11.7 present the EWMA graphs of the pre-run and the

monitoring period (from march, 19th, 1997 to april, 16th, 1999). In the pre-run

Twosided EWMA chart

76 lambda = 0.20, in-control ARL = 499

1

0.5

Z_t

0

-0.5

-1

0 1 2 3 4

t*E2

Figure 11.6. Twosided EWMA chart of the standardized CAPM-AR(1)

residuals for the pre-run period (06/01/95 - 03/18/97)

period the EWMA chart signals 4 times. The ¬rst 3 alarms seem to be outliers,

while the last points on a longer change. Nevertheless, the chart performs quite

typical for the pre-run period. The ¬rst signal in the monitoring period was

obtained at the 64th observation (i. e. 06/24/97). Then, we observe more

frequently signals than in the pre-run period, the changes are more persistent

and so one has to assume, that the pre-run model is no longer valid. A new

CAPM has therefore to be ¬tted and, if necessary, the considered portfolio has

to be reweighted. Naturally, a new pre-run can be used for the new monitoring

11.4 Real data example “ monitoring CAPM 257

Twosided EWMA chart

64 lambda = 0.20, in-control ARL = 499

2

1

Z_t

0

-1

-2

0 1 2 3 4 5

t*E2

Figure 11.7. Twosided EWMA chart of the standardized CAPM-AR(1)

residuals for the monitoring period (03/19/97 - 04/16/99)

period.

XFGcapmar1.xpl

Bibliography

Brook, D. and Evans, D. A. (1972). An approach to the probability distribution

of cusum run length, Biometrika 59: 539“548.

Crosier, R. B. (1986). A new two-sided cumulative quality control scheme,

Technometrics 28: 187“194.

Crowder, S. V. (1986). A simple method for studying run-length distributions

of exponentially weighted moving average charts, Technometrics 29: 401“

407.

Elton, EJ. and Gruber, MJ. (1991). Modern portfolio theory and investment

analysis, Wiley, 4. edition.

Knoth, S. (1998). Quasi-stationarity of CUSUM schemes for Erlang distribu-

tions, Metrika 48: 31“48.

258 11 Statistical Process Control

Lorden, G. (1971). Procedures for reacting to a change in distribution, Annals

of Mathematical Statistics 42: 1897“1908.

Lucas, J. M. (1976). The design and use of V-mask control schemes, Journal

of Quality Technology 8: 1“12.

Lucas, J. M. and Crosier, R. B. (1982). Fast initial response for cusum quality-

control schemes: Give your cusum a headstart, Technometrics 24: 199“

205.

Lucas, J. M. and Saccucci, M. S. (1990). Exponentially weighted moving aver-

age control schemes: properties and enhancements, Technometrics 32: 1“

12.

Page, E. S. (1954). Continuous inspection schemes, Biometrika 41: 100“115.

Pollak, M. and Siegmund, D. (1975). Approximations to the expected sample

size of certain sequential tests, Annals of Statistics 3: 1267“1282.

Pollak, M. and Siegmund, D. (1985). A di¬usion process and its applications to

detection a change in the drift of brownian motion, Biometrika 72: 267“

280.

Roberts, S. W. (1959). Control-charts-tests based on geometric moving aver-

ages, Technometrics 1: 239“250.

Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Product,

D. van Nostrand Company, Inc., Toronto, Canada.

Vance, L. (1986). ARL of cumulative sum control chart for controlling normal

mean, Journal of Quality Technology 18: 189“193.

Waldmann, K.-H. (1986). Bounds to the distribution of the run length in

general quality“control schemes, Statistische Hefte 27: 37“56.

12 An Empirical Likelihood

Goodness-of-Fit Test for

Di¬usions

Song Xi Chen, Wolfgang H¨rdle and Torsten Kleinow

a

The analysis and prediction of di¬usion processes plays a fundamental role

in the statistical analysis of ¬nancial markets. The techniques applied rely

on the actual model assumed for the drift and di¬usion coe¬cient functions.

Mismodelling these coe¬cients might result in biased prediction and incorrect

parameter speci¬cation. We show in this chapter how the empirical likelihood

technique, Owen (1988) and Owen (1990), may be used to construct test pro-

cedures for the Goodness-of-Fit of a di¬usion model. The technique is based

on comparison with kernel smoothing estimators. The Goodness-of-Fit test

proposed is based on the asymptotics of the empirical likelihood, which has

two attractive features. One is its automatic consideration of the variation as-

sociated with the nonparametric ¬t due to the empirical likelihood™s ability to

studentize internally. The other one is that the asymptotic distributions of the

test statistic are free of unknown parameters which avoids secondary plug-in

estimation.

12.1 Introduction

Let us assume a strictly stationary one-dimensional di¬usion Z solving the

stochastic di¬erential equation (SDE)

dZ(t) = m{Z(t)}dt + v{Z(t)}dW (t) (12.1)

where the driving process W = {W (t), t ∈ [0, ∞)} in (12.1) is a standard

Wiener process. In a mathematical ¬nance setting, Z might be the price process

260 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions

of a stock, a stock market index or any other observable process. For the rest

of the chapter the drift m : R ’ R, and the di¬usion coe¬cient v : R ’ [0, ∞)

in (12.1) are assumed to be su¬ciently smooth, so that a unique solution of

(12.1) exists.

In applications we are mostly interested in the stationary solutions of (12.1).

For the existence of a stationary solution, the drift and the di¬usion coe¬cient

must satisfy some conditions, Bibby and Sørensen (1995). The most important

condition is that the stationary forward Kolmogorov equation

(1/2) v 2 (z)p(z) ’ m(z)p(z) = 0

has a solution p(z) which is a probability density. If the initial value Z(0) is

distributed in accordance with p0 , and if it is independent of the Wiener process

W (t) in (12.1), then (12.1) de¬nes a stationary process. The above condition

holds for the Ornstein-Uhlenbeck process with a normal stationary distribution,

and for the Cox-Ingersoll-Ross process with a “-distribution. For the statistical

analysis we assume that Z is observed at discrete times ti = i∆, i = 1, 2, . . . , n,

with a time step size ∆ > 0. From these observations we get a time series Z ∆

with certain dynamics speci¬ed in Section 12.2.

The aim of this chapter is to test a parametric model for the drift function m

against a nonparametric alternative, i.e.

H0 (m) : m(z) = mθ (z) (12.2)

where θ is an unknown parameter. The test statistic we apply is based on

the empirical likelihood. This concept was introduced by Chen, H¨rdle and

a

Kleinow (2001) for time series. To apply it in our situation we start with the

discretization of the di¬usion process Z.

12.2 Discrete Time Approximation of a Di¬usion

Let us assume that the di¬usion process Z is observed at discrete times ti =

i∆, i = 1, 2, . . . , n, with a time step size ∆ > 0. Here we suppose that ∆ is

small or, more precisely, will tend to zero asymptotically. Under rather weak

assumptions, see Kloeden and Platen (1999), on the functions m and v 2 , it can

be shown that the Euler approximation

t t

∆ ∆ ∆

v Z ∆ (tis ) dW (s)

Z (t) = Z (0) + m Z (tis ) ds + (12.3)

0 0

12.3 Hypothesis Testing 261

with tis = max{ti , ti ¤ s}, converges in a mean square sense to Z as ∆ ’ 0,

i.e.,

lim E( sup |Z ∆ (t) ’ Z(t)|2 ) = 0, T > 0. (12.4)

∆’0 0¤t¤T

From now on, we assume that a discrete time approximation Z ∆ exists in the

form of (12.3), and that the property (12.4) holds. For the purposes of this

chapter, ∆ will always be considered small enough that one can substitute Z

by Z ∆ in our interpretation of the observed data. The increments of the Euler

approximation and so the observed data will have the form

Z ∆ (ti+1 ) ’ Z ∆ (ti ) = m Z ∆ (ti ) ∆ + v Z ∆ (ti ) W (ti+1 ) ’ W (ti ) (12.5)

for i = 0, 1, . . . .. The observations {Z ∆ (ti )}, i = 0, 1, . . . n form a time series.

As long as the step size ∆ is small enough the concrete choice of ∆ does not

matter since all the relevant information about the model is contained in the

drift m and di¬usion coe¬cient v.

For the following we introduce the notation

def def

Z ∆ (ti ) ,

Xi = X = (X1 , . . . , Xn )

def def

W (ti+1 ) ’ W (ti ) ,

µi = µ = (µ1 , . . . , µn )

def def

Xi+1 ’ Xi = m Xi ∆ + v Xi µi ,

Yi = Y = (Y1 , . . . , Yn )(12.6)

We can now apply the empirical likelihood Goodness-of-Fit test for stationary

time series developed by Chen et al. (2001).

12.3 Hypothesis Testing

Suppose (X, Y ) is de¬ned as in (12.6) and let m(x) = E(Y |X = x) be the

conditional mean function, f be the density of the design points X, and σ 2 (x) =

Var(Y |X = x) be the conditional variance function of Y given X = x ∈ S, a

closed interval S ‚ R. Suppose that {mθ |θ ∈ ˜} is a parametric model for the

ˆ

mean function m and that θ is an estimator of θ under this parametric model.

The interest is to test the null hypothesis:

for all x ∈ S

H0 : m(x) = mθ (x)

against a series of local smooth nonparametric alternatives:

H1 : m(x) = mθ (x) + cn ∆n (x),

262 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions

where cn is a non-random sequence tending to zero as n ’ ∞ and ∆n (x) is a

sequence of bounded functions.

The problem of testing against a nonparametric alternative is not new for an

independent and identically distributed setting, H¨rdle and Mammen (1993)

a

and Hart (1997). In a time series context the testing procedure has only been

considered by Kreiß, Neumann and Yao (1998) as far as we are aware. Also

theoretical results on kernel estimators for time series appeared only very re-

cently, Bosq (1998). This is surprising given the interests in time series for

¬nancial engineering.

We require a few assumptions to establish the results in this chapter. These

assumptions are the following:

(i) The kernel K is Lipschitz continuous in [’1, 1], that is |K(t1 ) ’ K(t2 )| ¤

C||t1 ’ t2 || where || · || is the Euclidean norm, and h = O{n’1/5 };

(ii) f , m and σ 2 have continuous derivatives up to the second order in S.

ˆ

(iii) θ is a parametric estimator of θ within the family of the parametric

model, and

sup |mθ (x) ’ mθ (x)| = Op (n’1/2 ).

ˆ

x∈S

(iv) ∆n (x), the local shift in the alternative H1 , is uniformly bounded with

respect to x and n, and cn = n’1/2 h’1/4 which is the order of the di¬er-

ence between H0 and H1 .

(v) The process {(Xi , Yi )} is strictly stationary and ±-mixing, i.e.

def

|P(AB) ’ P(A)P(B)| ¤ aρk

±(k) = sup

∞

i

A∈F1 ,B∈Fi+k

l

for some a > 0 and ρ ∈ [0, 1). Here Fk denotes the σ-algebra of events

generated by {(Xi , Yi ), k ¤ i ¤ l} for l ≥ k. For an introduction into

±-mixing processes, see Bosq (1998) or Billingsley (1999). As shown by

Genon-Catalot, Jeantheau and Lar´do (2000) this assumption is ful¬lled

e

if Zt is an ±-mixing process.

(vi) E{exp(a0 |Y1 ’ m(X1 )|)} < ∞ for some a0 > 0; The conditional density

of X given Y and the joint conditional density of (X1 , Xl ) given (Y1 , Yl )

are bounded for all l > 1.

12.4 Kernel Estimator 263

Assumptions (i) and (ii) are standard in nonparametric curve estimation and

are satis¬ed for example for bandwidths selected by cross validation, whereas

(iii) and (iv) are common in nonparametric Goodness-of-Fit tests. Assumption

(v) means the data are weakly dependent. It is satis¬ed for a wide class of

di¬usion processes.

12.4 Kernel Estimator

To develop a test about H0 we ¬rst introduce a nonparametric kernel estimator

for m. For an introduction into kernel estimation see H¨rdle (1990), Wand and

a

Jones (1995) and (H¨rdle, M¨ller, Sperlich and Werwatz, 2000). Without loss

a u

of generality we assume that we are only interested in m(x) for x ∈ [0, 1] and

that f (x) ≥ C1 ∀x ∈ [0, 1] with a positive constant C1 . If in a particular

problem the data are supported by another closed interval, this problem can

be transformed by rescaling into an equivalent problem with data support [0, 1].

Let K be a bounded probability density function with a compact support on

[’1, 1] that satis¬es the moment conditions:

u2 K(u)du = σK

2

uK(u)du = 0,

2

where σK is a positive constant. Let h be a positive smoothing bandwidth

which will be used to smooth (X, Y ).

The nonparametric estimator considered is the Nadaraya-Watson (NW) esti-

mator n

Yi Kh (x ’ Xi )

m(x) = i=1

ˆ (12.7)

n

i=1 Kh (x ’ Xi )

with Kh (u) = h’1 K(h’1 u). This estimator is calculated in XploRe by the

quantlets regest or regxest.

The parameter estimation of θ depends on the null hypothesis. We assume

√

here, that the parameter θ is estimated by a n-consistent estimator. Let

Kh (x ’ Xi )mθ (Xi )

ˆ

mθ (x) =

˜ˆ n

i=1 Kh (x ’ Xi )

be the smoothed parametric model. The test statistic we are going to consider

is based on the di¬erence between mθ and m, rather than directly between m

˜ˆ ˆ ˆ

264 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions

and mθ , in order to avoid the issue of bias associated with the nonparametric

ˆ

¬t.

The local linear estimator can be used to replace the NW estimator in estimat-

ing m. However, as we compare m with mθ in formulating the Goodness-of-Fit

ˆ ˜ˆ

test, the possible bias associated with the NW estimator is not an issue here.

In addition, the NW estimator has a simpler analytic form.

12.5 The Empirical Likelihood concept

12.5.1 Introduction into Empirical Likelihood

Let us now as in Owen (1988) and Owen (1990) introduce the empirical likeli-

hood (EL) concept. Suppose a sample (U1 , . . . , Un ) of independent identically

distributed random variables in R1 according to a probability law with un-

known distribution function F and unknown density f . For an observation

(u1 , . . . , un ) of (U1 , . . . , Un ) the likelihood function is given by

n

¯

L(f ) = f (ui ) (12.8)

i=1

The empirical density calculated from the observations (u1 , . . . , un ) is

n

1

def

fn (u) = 1{ui = u} (12.9)

n i=1

where 1 denotes the indicator function. It is easy to see that fn maximizes

¯

L(f ) in the class of all probability density functions.

The objective of the empirical likelihood concept is the construction of tests and

con¬dence intervals for a parameter θ = θ(F ) of the distribution of Ui . To keep

things simple we illustrate the empirical likelihood method for the expectation

E[Ui ]. The null hypothesis is E[Ui ] = θ. We can test this assumption based on

the empirical likelihood ratio

¯

L{f (θ)}

def

R(F ) = (12.10)

¯

L(fn )

¯

where f (θ) maximizes L(f ) subject to

Ui dF = θ. (12.11)

12.5 The Empirical Likelihood concept 265

On a heuristic level we can reject the null hypothesis “under the true distribu-

tion F , U has expectation θ” if the ratio R(F ) is small relative to 1, i.e. the

test rejects if R(F ) < r for a certain level r ∈ (0, 1). More precisely, Owen

(1990) proves the following

THEOREM 12.1 Let (U1 , . . . , Un ) be iid one-dimensional random variables

with expectation θ and variance σ 2 . For a positive r < 1 let

Fn , R(F ) ≥ r

Cr,n = Ui dF F

be the set of all possible expectations of U with respect to distributions F dom-

inated by Fn (F Fn ). Then it follows

lim P[θ ∈ Cr,n ] = P[χ2 ¤ ’2 log r] (12.12)

n’∞

where χ2 is a χ2 -distributed random variable with one degree of freedom.

From Theorem 12.1 it follows directly

¤ r EUi = θ = P[χ2 ¤ r]

lim P ’ 2 log max R(F )

n’∞ {F |F Fn , Ui dF =θ}

This result suggests therefore to use the log-EL ratio

¯

L{f (θ)}

’2 log = ’2 log

max R(F ) max ¯

Ui dF =θ} L(fn )

{F |F {F |F

Fn , Ui dF =θ} Fn ,

as the basic element of a test about a parametric hypothesis for the drift func-

tion of a di¬usion process.

12.5.2 Empirical Likelihood for Time Series Data

We will now expand the results in Section 12.5.1 to the case of time series data.

For an arbitrary x ∈ [0, 1] and any function µ we have

x ’ Xi

{Yi ’ µ(x)} E[Yi |Xi = x] = µ(x) = 0.

EK (12.13)

h

Let pi (x) be nonnegative numbers representing a density for

x ’ Xi

{Yi ’ µ(x)}

K i = 1, . . . , n

h

266 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions

The empirical likelihood for µ(x) is

n

def

L{µ(x)} = max pi (x) (12.14)

i=1

n n x’Xi

{Yi ’ µ(x)} = 0. The

subject to i=1 pi (x) = 1 and pi (x)K

i=1 h

second condition re¬‚ects (12.13).

We ¬nd the maximum by introducing Lagrange multipliers and maximizing the

Lagrangian function

n

L(p, »1 , »2 ) = log pi (x)

i=1

n n

x ’ Xi

’»1 {Yi ’ µ(x)} ’ »2 pi (x) ’ 1

pi (x)K

h

i=1 i=1

The partial derivatives are

x ’ Xi

‚L(p, »1 , »2 ) 1

’ »1 K {Yi ’ µ(x)} ’ »2 ∀i = 1, . . . , n .

=

‚pi (x) pi (x) h

With » = »1 /»2 we obtain as a solution to (12.14) the optimal weights

’1

x ’ Xi

’1

{Yi ’ µ(x)}

pi (x) = n 1 + »(x)K (12.15)

h

where »(x) is the root of

n x’Xi

{Yi ’ µ(x)}

K h

= 0. (12.16)

»(x)K x’Xi {Yi ’ µ(x)}

1+ h

i=1

Note, that »2 = n follows from

n n

x ’ Xi

{Yi ’ µ(x)} = 1 .

pi (x) + » pi (x)K

h

i=1 i=1

The maximum empirical likelihood is achieved at pi (x) = n’1 corresponding

ˆ

to the nonparametric curve estimate µ(x) = m(x). For a parameter estimate θ

ˆ

we get the maximum empirical likelihood for the smoothed parametric model

L{mθ (x)}. The log-EL ratio is

˜ˆ

L{mθ (x)}

˜ˆ

def

= ’2 log[L{mθ (x)}nn ].

{mθ (x)} = ’2 log

˜ˆ ˜ˆ

L{m(x)}

ˆ

12.5 The Empirical Likelihood concept 267

To study properties of the empirical likelihood based test statistic we need to

evaluate {mθ (x)} at an arbitrary x ¬rst, which requires the following lemma

˜ˆ

on »(x) that is proved in Chen et al. (2001).

LEMMA 12.1 Under the assumptions (i)-(vi),

sup |»(x)| = Op {(nh)’1/2 log(n)}.

x∈[0,1]

Let γ(x) be a random process with x ∈ [0, 1]. Throughout this chapter we use

˜

the notation γ(x) = Op (δn ) ( Op (δn )) to denote the facts that supx∈[0,1] |γ(x)| =

˜

Op (δn ) (Op (δn )) for a sequence δn .

j

n

¯ x’Xi

Let Uj (x) = (nh)’1 {Yi ’ mθ (x)}

K ˜ˆ for j = 1, 2, . . .. An appli-

i=1 h

cation of the power series expansion of 1/(1 ’ •) applied to (12.16) and Lemma

12.1 yields

∞

n

x ’ Xi x ’ Xi

(’»(x))j K j {Yi ’ mθ (x)}j = 0.

{Yi ’ mθ (x)}

K ˜ˆ ˜ˆ

h h

i=1 j=0

Inverting the above expansion, we have

¯ ’1 ¯

»(x) = U2 (x)U1 (x) + Op {(nh)’1 log2 (n)}.

˜ (12.17)

From (12.15), Lemma 12.1 and the Taylor expansion of log(1 + •) we get

{mθ (x)} = ’2 log[L{mθ (x)}nn ]

˜ˆ ˜ˆ

n

x ’ Xi

{Yj ’ mθ (x)}]

= 2 log[1 + »(x)K ˜ˆ

h

i=1

¯ ¯

2nh»(x)U1 ’ nh»2 (x)U2 + Op {(nh)’1/2 log3 (n)}

˜

=

(12.18)

Inserting (12.17) in (12.18) yields

¯ ’1 ¯2

{mθ (x)} = nhU2 (x)U1 (x) + Op {(nh)’1/2 log3 (n)}.

˜

˜ˆ (12.19)

For any x ∈ [0, 1], let

1 1

2

’ y)dy and b(x; h) = h Kh (x ’ y)dy

v(x; h) = h Kh (x

0 0

268 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions

be the variance and the bias coe¬cient functions associated with the NW esti-

mator, respectively, see Wand and Jones (1995). Let

SI,h = {x ∈ [0, 1]| min (|x ’ 1|, |x|) > h}.

For h ’ 0, SI,h converges to the set of interior points in [0, 1]. If x ∈ SI,h , we

def

K 2 (x)dx and b(x; h) = 1. De¬ne

have v(x; h) =

v(x; h)σ 2 (x)

V (x; h) = .

f (x)b2 (x; h)

Clearly, V (x; h)/(nh) is the asymptotic variance of m(x) when nh ’ ∞ which

ˆ

is one of the conditions we assumed.

It was shown by Chen et al. (2001), that

n

¯ ’1

Kh (x ’ Xi ){Yi ’ mθ (x)}

U1 (x) =n ˜ˆ

i=1

n

˜

= n’1 Kh (x ’ Xi ){Yi ’ mθ (Xi )} + Op (n’1/2 )

i=1

ˆ ˜

= f (x){m(x) ’ mθ (x)} + Op (n’1/2 )

ˆ ˜

˜

= f (x)b(x; h){m(x) ’ mθ (x)} + Op {n’1/2 + (nh)’1 log2 (n)}.

ˆ ˜

¯

In the same paper it is shown, that condition (iii) entails supx∈[0,1] |U2 (x) ’

f (x)v(x; h)σ 2 (x)| = Op (h). These and (12.19) mean that

¯ ’1 ¯ 2 ˜

(nh)U2 U1 + Op {(nh)’1/2 log3 (n)}

{mθ (x)} =

˜ˆ

˜

= V ’1 (x; h){m(x) ’ mθ (x)}2 + O{(nh)’1 h log2 (n)}

ˆ ˜ (12.20)

Therefore, {mθ (x)} is asymptotically equivalent to a studentized L2 distance

˜ˆ

between mθ (x) and m(x). It is this property that leads us to use {mθ (x)} as

˜ˆ ˆ ˜ˆ

the basic building block in the construction of a global test statistic for distinc-

tion between mθ and m in the next section. The use of the empirical likelihood

˜ˆ ˆ

as a distance measure and its comparison with other distance measures have

been discussed in Owen (1991) and Baggerly (1998).

12.6 Goodness-of-Fit Statistic

To extend the empirical likelihood ratio statistic to a global measure of

Goodness-of-Fit, we choose kn -equally spaced lattice points t1 , t2 , · · · , tkn in

12.6 Goodness-of-Fit Statistic 269

[0, 1] where t1 = 0, tkn = 1 and ti ¤ tj for 1 ¤ i < j ¤ kn . We let kn ’ ∞

and kn /n ’ 0 as n ’ ∞. This essentially divides [0, 1] into kn small bins of

size (kn )’1 . A simple choice is to let kn = [1/(2h)] where [a] is the largest

integer less than a. This choice as justi¬ed later ensures asymptotic indepen-

dence among {mθ (tj )} at di¬erent tj s. Bins of di¬erent size can be adopted to

˜ˆ

suit situations where there are areas of low design density. This corresponds to

the use of di¬erent bandwidth values in adaptive kernel smoothing. The main

results of this chapter is not a¬ected by un-equal bins. For the purpose of easy

presentation, we will treat bins of equal size.

As {mθ (tj )} measures the Goodness-of-Fit at a ¬xed tj , an empirical likelihood

˜ˆ

based statistic that measures the global Goodness-of-Fit is de¬ned as

kn

def

{mθ (tj )}.

n (mθ )

˜ˆ = ˜ˆ

j=1

The following theorem was proven by Chen et al. (2001).

THEOREM 12.2 Under the assumptions (i) - (vi),

{m(x) ’ mθ (x)}2

ˆ ˜

’1 ’1

dx + Op {kn log2 (n) + h log2 (n)}

kn n (mθ )

˜ˆ = (nh)

V (x)

(12.21)

def

where V (x) = limh’0 V (x, h).

H¨rdle and Mammen (1993) proposed the L2 distance

a

Tn = nh1/2 {m(x) ’ mθ (x)}2 π(x)dx

ˆ ˜ˆ

as a measure of Goodness-of-Fit where π(x) is a given weight function.

’1