<<

. 9
( 14)



>>

are more popular than the one-sided ones. For two-sided CUSUM schemes we
consider a combination of two one-sided schemes, Lucas (1976) or Lucas and
Crosier (1982), and a scheme based on Crosier (1986). Note, that in some
recent papers the same concept of combination of two one-sided schemes is
used for EWMA charts.
Recall, that Shewhart charts are a special case of EWMA schemes (» = 1).
Therefore, we distinguish 5 SPC schemes “ ewma1, ewma2, cusum1, cusum2
(two one-sided schemes), and cusumC (Crosier™s scheme). For the two-sided
EWMA charts the following quantlets are provided in the XploRe quantlib
spc.

SPC quantlets for two-sided EWMA scheme
“ produces chart ¬gure
spcewma2
spcewma2arl “ returns ARL
“ returns critical value c2
spcewma2c
spcewma2ad “ returns AD (steady-state ARL)
spcewma2pmf “ returns probability mass and distribution function
of the run-length for single time points
spcewma2pmfm “ the same up to a given time point
240 11 Statistical Process Control


By replacing ewma2 by one of the remaining four scheme titles the related
characteristics can be computed.
The quantlets spcewma1,...,spccusumC generate the chart ¬gure. Here, we ap-
ply the 5 charts to arti¬cial data. 100 pseudo random values from a normal
distribution are generated. The ¬rst 80 values have expectation 0, the next 20
values have expectation 1, i. e. model (11.1) with µ0 = 0, µ1 = 1, and m = 81.
We start with the two-sided EWMA scheme and set » = 0.1, i. e. the chart is


Two-sided EWMA chart
lambda = 0.10, in-control ARL = 300 94
1
0.5
Z_t

0
-0.5




0 50 100
t

Figure 11.1. Two-sided EWMA chart XFGewma2fig.xpl



very sensitive to small changes. The critical value c2 (see (11.3)) is computed
to provide an in-control ARL of 300 (see Section 11.2). Thus, the scheme leads
in average after 300 observations to a false alarm.
EWMA
In Figure 11.1 the graph of Zt is plotted against time t = 1, 2, . . . , 100.
Further, the design parameter », the in-control ARL, and the time of alarm (if
there is one) are printed. One can see, that the above EWMA scheme detects
the change point m = 81 at time point 94, i. e. the delay is equal to 14. The
related average values, i. e. ARL and Average Delay (AD), for µ1 = 1 are 9.33
and 9.13, respectively. Thus, the scheme needs here about 5 observations more
than average.
11.1 Control Charts 241


In the same way the remaining four SPC schemes can be plotted. Remark,
that in case of ewma1 one further parameter has to be set. In order to obtain
EWMA
a suitable ¬gure and an appropriate scheme the EWMA statistic Zt (see
(11.4)) is re¬‚ected at a pre-speci¬ed border zreflect ¤ 0 (= µ0 ), i. e.
EWMA EWMA
} , t = 1, 2, . . .
Zt = max{zreflect, Zt

for an upper EWMA scheme. Otherwise, the statistic is unbounded, which
leads to schemes with poor worst case performance. Further, the methods


One-sided EWMA chart
1




lambda = 0.10, in-control ARL = 300, zreflect = -4.0 93
0.5
Z_t

0
-0.5




0 50 100
t

Figure 11.2. One-sided EWMA chart XFGewma1fig.xpl



used in Section 11.2 for computing the chart characteristics use bounded con-
tinuation regions of the chart. If zreflect is small enough, then the ARL and
the AD (which are not worst case criterions) of the re¬‚ected scheme are the
same as of the unbounded scheme. Applying the quantlet XFGewma1fig.xpl
with zreflect= ’4 leads to Figure 11.2. Thereby, zreflect has the same
normalization factor »/(2 ’ ») like the critical value c2 (see 2.). The corre-
sponding normalized border is printed as dotted line (see Figure 11.2). The
chart signals one observation earlier than the two-sided version in Figure 11.1.
The related ARL and AD values for µ1 = 1 are now 7.88 and 7.87, respectively.
In Figure 11.3 the three di¬erent CUSUM charts with k = 0.5 are presented.
They signal at the time points 87, 88, and 88 for cusum1, cusum2, and cusumC,
242 11 Statistical Process Control


respectively. For the considered dataset the CUSUM charts are faster be-

One-sided CUSUM chart
k = 0.50, in-control ARL = 300 87




10
Z_t

5
0




0 50 100
t
XFGcusum1fig.xpl

Two-sided CUSUM chart
k = 0.50, in-control ARL = 300 88
10
5
Z_t

0
-5




0 50 100
t
XFGcusum2fig.xpl

Crosier™s two-sided CUSUM chart
k = 0.50, in-control ARL = 300 88
10
5
Z_t

0




0 50 100
t
XFGcusumCfig.xpl

Figure 11.3. CUSUM charts: one-sided, two-sided, Crosier™s two-sided
11.2 Chart characteristics 243


cause of their better worst case performance. The observations right before
the change point at m = 81 are smaller than average. Therefore, the EWMA
charts need more time to react to the increased average. The related average
values of the run length, i. e. ARL and AD, are 8.17 and 7.52, 9.52 and 8.82,
9.03 and 8.79 for cusum1, cusum2, and cusumC, respectively.


11.2 Chart characteristics
Consider the change point model (11.1). For ¬xed m denote Pm (·) and Em (·)
the corresponding probability measure and expectation, respectively. Hereby,
m = ∞ stands for the case of no change, i. e. the so called in-control case. Then
the Average Run Length (ARL) (expectation of the run length L) is de¬ned
as
E∞ (L) , µ = µ0
Lµ = . (11.7)
E1 (L) , µ = µ0
Thus, the ARL denotes the average number of observations until signal for a
sequence with constant expectation. µ = µ0 or m = ∞ stands for no change,
µ = µ0 and m = 1 mark, that just at the ¬rst time point (or earlier) a change
takes place from µ0 to µ. Therefore, the ARL evaluates only the special sce-
nario of m = 1 of the SPC scheme. Other measures, which take into account
that usually 1 < m < ∞, were introduced by Lorden (1971) and Pollak and
Siegmund (1975), Pollak and Siegmund (1975). Here, we use a performance
measure which was ¬rstly proposed by Roberts (1959). The so called (condi-
tional) Average Delay (AD, also known as steady-state ARL) is de¬ned as
(m)
Dµ lim Dµ ,
= (11.8)
m’∞
(m)
Dµ = Em L ’ m + 1|L ≥ m ,
where µ is the value of µ1 in (11.1), i. e. the expectation after the change.
While Lµ measures the delay for the case m = 1, Dµ determines the delay for
a SPC scheme which ran a long time without signal. Usually, the convergence
(m)
in (11.8) is very fast. For quite small m the di¬erence between Dµ and Dµ is
very small already. Lµ and Dµ are average values for the random variable L.
Unfortunately, L is characterized by a large standard deviation. Therefore, one
might be interested in the whole distribution of L. Again, we restrict on the
special cases m = 1 and m = ∞. We consider the probability mass function
Pµ (L = n) (PMF) and the cumulative distribution function Pµ (L ¤ n) (CDF).
Based on the CDF, one is able to compute quantiles of the run length L.
244 11 Statistical Process Control


For normally distributed random variables it is not possible to derive exact
solutions for the above characteristics. There are a couple of approximation
techniques. Besides very rough approximations based on the Wald approxi-
mation known from sequential analysis, Wiener process approximations and
similar methods, three main methods can be distinguished:

1. Markov chain approach due to Brook and Evans (1972): Replacement of
the continuous statistic Zt by a discrete one
2. Quadrature of integral equations which are derived for the ARL, Vance
(1986) and Crowder (1986) and for some eigenfunctions which lead to the
AD
3. Waldmann (1986) approach: Iterative computation of P (L = n) by us-
ing quadrature and exploiting of monotone bounds for the considered
characteristics

Here we use the ¬rst approach, which has the advantage, that all considered
characteristics can be presented in a straightforward way. Next, the Markov
chain approach is brie¬‚y described. Roughly speaking, the continuous statistic
Zt is approximated by a discrete Markov chain Mt . The transition Zt’1 =
x ’ Zt = y is approximated by the transition Mt’1 = i w ’ Mt = j w with
x ∈ [i w ’ w/2, i w + w/2] and y ∈ [j w ’ w/2, j w + w/2]. That is, given an
integer r the continuation region of the scheme [’c, c], [zreflect, c], or [0, c]
is separated into 2 r + 1 or r + 1 intervals of the kind [i w ’ w/2, i w + w/2]
(one exception is [0, w/2] as the ¬rst subinterval of [0, c]). Then, the transition
kernel f of Zt is approximated by the discrete kernel of Mt , i. e.
f (x, y) ≈ P (i w ’ [j w ’ w/2, j w + w/2])/w
for all x ∈ [i w ’ w/2, i w + w/2] and y ∈ [j w ’ w/2, j w + w/2]. Eventually,
we obtain a Markov chain {Mt } with 2 r + 1 or r + 1 transient states and one
absorbing state. The last one corresponds to the alarm (signal) of the scheme.
Denote by Q = (qij ) the matrix of transition probabilities of the Markov chain
{Mt } on the transient states, 1 a vector of ones, and L = (Li ) the ARL vector.
Li stands for the ARL of a SPC scheme which starts in point i w (corresponds
to z0 ). In the case of a one-sided CUSUM scheme with z0 = 0 [0, w/2]
the value L0 approximates the original ARL. By using L we generalize the
original schemes to schemes with possibly di¬erent starting values z0 . Now,
the following linear equation system is valid, Brook and Evans (1972):
(I ’ Q) L = 1 , (11.9)
11.2 Chart characteristics 245


where I denotes the identity matrix. By solving this equation system we get
the ARL vector L and an approximation of the ARL of the considered SPC
scheme. Remark that the larger r the better is the approximation. In the days
of Brook and Evans (1972) the maximal matrix dimension r+1 (they considered
cusum1) was 15 because of the restrictions of the available computing facilities.
Nowadays, one can use dimensions larger than some hundreds. By looking
at di¬erent r one can ¬nd a suitable value. The quantlet XFGrarl.xpl
demonstrates this e¬ect for the Brook and Evans (1972) example. 9 di¬erent
values of r from 5 to 500 are used to approximate the in-control ARL of a
one-sided CUSUM chart with k = 0.5 and c3 = 3 (variance σ 2 = 1). We get

r 5 10 20 30 40 50 100 200 500
L0 113.47 116.63 117.36 117.49 117.54 117.56 117.59 117.59 117.60

XFGrarl.xpl


The true value is 117.59570 (obtainable via a very large r or by using the
quadrature methods with a suitable large number of abscissas). The computa-
tion of the average delay (AD) requires more extensive calculations. For details
see, e. g., Knoth (1998) on CUSUM for Erlang distributed data. Here we apply
the Markov chain approach again, Crosier (1986). Given one of the considered
schemes and normally distributed data, the matrix Q is primitive, i. e. there
exists a power of Q which is positive. Then Q has one single eigenvalue which
is larger in magnitude than the remaining eigenvalues. Denote this eigenvalue
by . The corresponding left eigenvector ψ is strictly positive, i. e.

ψQ = ψ , ψ > 0. (11.10)

It can be shown, Knoth (1998), that the conditional density f (·|L ≥ m) of
both the continuous statistic Zt and the Markov chain Mt tends for m ’
∞ to the normalized left eigenfunction and eigenvector, respectively, which
correspond to the dominant eigenvalue . Therefore, the approximation of
D = lim Em (L ’ m + 1|L ≥ m) can be constructed by
m’∞

D = (ψ T L)/(ψ T 1) .

Note, that the left eigenvector ψ is computed for the in-control mean µ0 , while
the ARL vector L is computed for a speci¬c out-of-control mean or µ0 again.
If we replace in the above quantlet ( XFGrarl.xpl) the phrase arl by ad, then
246 11 Statistical Process Control


we obtain the following output which demonstrates the e¬ect of the parameter
r again.

r 5 10 20 30 40 50 100 200 500
D0 110.87 114.00 114.72 114.85 114.90 114.92 114.94 114.95 114.95

XFGrad.xpl


Fortunately, for smaller values of r than in the ARL case we get good accuracy
already. Note, that in case of cusum2 the value r has to be smaller (less than 30)
than for the other charts, since it is based on the computation of the dominant
eigenvalue of a very large matrix. The approximation in case of combination of
two one-sided schemes needs a twodimensional approximating Markov chain.
For the ARL only exists a more suitable approach. As, e. g., Lucas and Crosier
(1982) shown it is possible to use the following relation between the ARLs of the
one- and the two-sided schemes. Here, the two-sided scheme is a combination
of two symmetric one-sided schemes which both start at z0 = 0. Therefore,
we get a very simple formula for the ARL L of the two-sided scheme and the
ARLs Lupper and Llower of the upper and lower one-sided CUSUM scheme

Lupper · Llower
L= . (11.11)
Lupper + Llower

Eventually, we consider the distribution function of the run length L itself.
By using the Markov chain approach and denoting with pn the approximated
i
probability of (L > n) for a SPC scheme started in i w, such that pn = (pn ),
i
we obtain
pn = pn’1 Q = p0 Qn . (11.12)
The vector p0 is initialized with p0 = 1 for the starting point z0 ∈ [i w ’
i
0
w/2, i w + w/2] and pj = 0 otherwise. For large n we can replace the above
equation by
pn ≈ gi n . (11.13)
i

The constant gi is de¬ned as

gi = φi /(φT ψ) ,

where φ denotes the right eigenvector of Q, i. e. Q φ = φ. Based on (11.12)
and (11.13) the probability mass and the cumulative distribution function of
11.2 Chart characteristics 247


the run length L can be approximated. (11.12) is used up to a certain n. If the
di¬erence between (11.12) and (11.13) is smaller than 10’9 , then exclusively
(11.13) is exploited. Remark, that the same is valid as for the AD. For the
two-sided CUSUM scheme (cusum2) the parameter r has to be small (¤ 30).


11.2.1 Average Run Length and Critical Values

The spc quantlib provides the quantlets spcewma1arl,...,spccusumCarl for
computing the ARL of the corresponding SPC scheme. All routines need the
actual value of µ as a scalar or as a vector of several µ, two scheme param-
eters, and the integer r (see the beginning of the section). The XploRe ex-
ample XFGarl.xpl demonstrates all ...arl routines for k = 0.5, » = 0.1,
zreflect= ’4, r = 50, c = 3, in-control and out-of-control means µ0 = 0 and
µ1 = 1, respectively. The next table summarizes the ARL results

chart ewma1 ewma2 cusum1 cusum2 cusumC
L0 1694.0 838.30 117.56 58.780 76.748
L1 11.386 11.386 6.4044 6.4036 6.4716

XFGarl.xpl
Remember that the ARL of the two-sided CUSUM (cusum2) scheme is based on
the one-sided one, i. e. 58.78 = 117.56/2 and 6.4036 = (6.4044·49716)/(6.4044+
49716) with 49716 = L’1 .
For the setup of the SPC scheme it is usual to give the design parameter » and
k for EWMA and CUSUM, respectively, and a value ξ for the in-control ARL.
Then, the critical value c (c2 or c3 ) is the solution of the equation Lµ0 (c) =
ξ. Here, the regula falsi is used with an accuracy of |Lµ0 (c) ’ ξ| < 0.001.
The quantlet XFGc.xpl demonstrates the computation of the critical values
for SPC schemes with in-control ARLs of ξ = 300, reference value k = 0.5
(CUSUM), smoothing parameter » = 0.1 (EWMA), zreflect= ’4, and the
Markov chain parameter r = 50.

chart ewma1 ewma2 cusum1 cusum2 cusumC
c 2.3081 2.6203 3.8929 4.5695 4.288

XFGc.xpl
248 11 Statistical Process Control


The parameter r = 50 guarantees fast computation and suitable accuracy.
Depending on the power of the computer one can try values of r up to 1000 or
larger (see XFGrarl.xpl in the beginning of the section).


11.2.2 Average Delay

The usage of the routines for computing the Average Delay (AD) is similar
to the ARL routines. Replace only the code arl by ad. Be aware that the
computing time is larger than in case of the ARL, because of the computation
of the dominant eigenvalue. It would be better to choose smaller r, especially in
the case of the two-sided CUSUM. Unfortunately, there is no relation between
the one- and two-sided schemes as for the ARL in (11.11). Therefore, the library
computes the AD for the two-sided CUSUM based on a twodimensional Markov
chain with dimension (r + 1)2 — (r + 1)2 . Thus with values of r larger than 30,
the computing time becomes quite large. Here the results follow for the above
quantlet XFGrarl.xpl with ad instead of arl and r = 30 for spccusum2ad:


chart ewma1 ewma2 cusum1 cusum2 cusumC
D0 1685.8 829.83 114.92 56.047 74.495
D1 11.204 11.168 5.8533 5.8346 6.2858


XFGad.xpl




11.2.3 Probability Mass and Cumulative Distribution
Function

The computation of the probability mass function (PMF) and of the cumulative
distribution function (CDF) is implemented in two di¬erent types of routines.
The ¬rst one with the syntax spcchartpmf returns the values of the PMF
P (L = n) and CDF P (L ¤ n) at given single points of n, where chart has
to be replaced by ewma1, ..., cusumC. The second one written as spcchartpmfm
computes the whole vectors of the PMF and of the CDF up to a given point
n, i. e. P (L = 1), P (L = 2), . . . , P (L = n) and the similar one of the CDF.
Note, that the same is valid as for the Average Delay (AD). In case of the
two-sided CUSUM scheme the computations are based on a twodimensional
11.2 Chart characteristics 249


Markov chain. A value of parameter r less than 30 would be computing time
friendly.
With the quantlet XFGpmf1.xpl the 5 di¬erent schemes (r = 50, for cusum2
r = 25) are compared according their in-control PMF and CDF (µ = µ0 = 0) at
the positions n in {1, 10, 20, 30, 50, 100, 200, 300}. Remark, that the in-control
ARL of all schemes is chosen as 300.


chart ewma1 ewma2 cusum1 cusum2 cusumC
6 · 10’8 2 · 10’9 6 · 10’6 4 · 10’7 2 · 10’6
P (L = 1)
P (L = 10) 0.00318 0.00272 0.00321 0.00307 0.00320
P (L = 20) 0.00332 0.00324 0.00321 0.00325 0.00322
P (L = 30) 0.00315 0.00316 0.00310 0.00314 0.00311
P (L = 50) 0.00292 0.00296 0.00290 0.00294 0.00290
P (L = 100) 0.00246 0.00249 0.00245 0.00248 0.00245
P (L = 200) 0.00175 0.00177 0.00175 0.00176 0.00175
P (L = 300) 0.00125 0.00126 0.00124 0.00125 0.00125
6 · 10’8 2 · 10’9 6 · 10’6 4 · 10’7 2 · 10’6
P (L = 1)
P (L ¤ 10) 0.01663 0.01233 0.02012 0.01675 0.01958
P (L ¤ 20) 0.05005 0.04372 0.05254 0.04916 0.05202
P (L ¤ 30) 0.08228 0.07576 0.08407 0.08109 0.08358
P (L ¤ 50) 0.14269 0.13683 0.14402 0.14179 0.14360
P (L ¤ 100) 0.27642 0.27242 0.27728 0.27658 0.27700
P (L ¤ 200) 0.48452 0.48306 0.48480 0.48597 0.48470
P (L ¤ 300) 0.63277 0.63272 0.63272 0.63476 0.63273


XFGpmf1.xpl


A more appropriate, graphical representation provides the quantlet
XFGpmf2.xpl. Figure 11.4 shows the corresponding graphs.
250 11 Statistical Process Control



run length cdf - in control

15
P(L<=n)*E-2
10
5
0




0 10 20 30 40 50
n


run length cdf - out of control
1
P(L<=n)
0.5




ewma2
cusumC
0




0 10 20 30 40 50
n
Figure 11.4. CDF for two-sided EWMA and Crosier™s CUSUM for
µ = 0 (in control) and µ = 1 (out of control)
XFGpmf2.xpl
11.3 Comparison with existing methods 251


11.3 Comparison with existing methods

11.3.1 Two-sided EWMA and Lucas/Saccucci

Here, we compare the ARL and AD computations of Lucas and Saccucci (1990)
with XploRe results. In their paper they use as in-control ARL ξ = 500. Then
for, e. g., » = 0.5 and » = 0.1 the critical values are 3.071 and 2.814, respec-
tively. By using XploRe the related values are 3.0712 and 2.8144, respectively.
It is known, that the smaller » the worse the accuracy of the Markov chain
approach. Therefore, r is set greater for » = 0.1 (r = 200) than for » = 0.5
(r = 50). Table 11.1 shows some results of Lucas and Saccucci (1990) on
ARLs and ADs. Their results are based on the Markov chain approach as
well. However, they used some smaller matrix dimension and ¬tted a regres-
sion model on r (see Subsection 11.3.2). The corresponding XploRe results

µ 0 0.25 0.5 0.75 1 1.5 2 3 4 5
» = 0.5
Lµ 500 255 88.8 35.9 17.5 6.53 3.63 1.93 1.34 1.07
Dµ 499 254 88.4 35.7 17.3 6.44 3.58 1.91 1.36 1.10
» = 0.1
Lµ 500 106 31.3 15.9 10.3 6.09 4.36 2.87 2.19 1.94
Dµ 492 104 30.6 15.5 10.1 5.99 4.31 2.85 2.20 1.83

Table 11.1. ARL and AD values from Table 3 of Lucas and Saccucci
(1990)

by using the quantlet XFGlucsac.xpl coincide with the values of Lucas and
Saccucci (1990).

XFGlucsac.xpl




11.3.2 Two-sided CUSUM and Crosier

Crosier (1986) derived a new two-sided CUSUM scheme and compared it with
the established combination of two one-sided schemes. Recall Table 3 of Crosier
(1986), where the ARLs of the new and the old scheme were presented. The
reference value k is equal to 0.5. First, we compare the critical values. By
252 11 Statistical Process Control


µ 0 0.25 0.5 0.75 1 1.5 2 2.5 3 4 5
old scheme, h = 4
Lµ 168 74.2 26.6 13.3 8.38 4.74 3.34 2.62 2.19 1.71 1.31
new scheme, h = 3.73
Lµ 168 70.7 25.1 12.5 7.92 4.49 3.17 2.49 2.09 1.60 1.22
old scheme, h = 5
Lµ 465 139 38.0 17.0 10.4 5.75 4.01 3.11 2.57 2.01 1.69
new scheme, h = 4.713
Lµ 465 132 35.9 16.2 9.87 5.47 3.82 2.97 2.46 1.94 1.59

Table 11.2. ARLs from Table 3 of Crosier (1986)


using XploRe ( XFGcrosc.xpl) with r = 100 one gets c = 4.0021 (4), 3.7304
(3.73), 4.9997 (5), 4.7133 (4.713), respectively “ the original values of Crosier
are written in parentheses. By comparing the results of Table 11.2 with the
results obtainable by the quantlet XFGcrosarl.xpl (r = 100) it turns out,
that again the ARL values coincide with one exception only, namely L1.5 = 4.75
for the old scheme with h = 4.

XFGcrosarl.xpl


Further, we want to compare the results for the Average Delay (AD), which is
called Steady-State ARL in Crosier (1986). In Table 5 of Crosier we ¬nd the
related results. A slight modi¬cation of the above quantlet XFGcrosarl.xpl

µ 0 0.25 0.5 0.75 1 1.5 2 2.5 3 4 5
old scheme, h = 4
Lµ 163 71.6 25.2 12.3 7.68 4.31 3.03 2.38 2.00 1.55 1.22
new scheme, h = 3.73
Lµ 164 69.0 24.3 12.1 7.69 4.39 3.12 2.46 2.07 1.60 1.29
old scheme, h = 5
Lµ 459 136 36.4 16.0 9.62 5.28 3.68 2.86 2.38 1.86 1.53
new scheme, h = 4.713
Lµ 460 130 35.1 15.8 9.62 5.36 3.77 2.95 2.45 1.91 1.57

Table 11.3. ADs (steady-state ARLs) from Table 5 of Crosier (1986)
11.4 Real data example “ monitoring CAPM 253


allows to compute the ADs. Remember, that the computation of the AD for
the two-sided CUSUM scheme is based on a twodimensional Markov chain.
Therefore the parameter r is set to 25 for the scheme called old scheme by
Crosier. The results are summarized in Table 11.4.

µ 0 0.25 0.5 0.75 1 1.5 2 2.5 3 4 5
old scheme, h = 4
Lµ 163 71.6 25.2 12.4 7.72 4.33 3.05 2.39 2.01 1.55 1.22
new scheme, h = 3.73
Lµ 165 69.1 24.4 12.2 7.70 4.40 3.12 2.47 2.07 1.60 1.29
old scheme, h = 5
Lµ 455 136 36.4 16.0 9.65 5.30 3.69 2.87 2.38 1.86 1.54
new scheme, h = 4.713
Lµ 460 130 35.1 15.8 9.63 5.37 3.77 2.95 2.45 1.91 1.57

Table 11.4. ADs (steady-state ARLs) computed by XploRe, di¬erent
values to Table 11.3 are printed as italics XFGcrosad.xpl


While the ARL values in the paper and computed by XploRe coincide, those
for the AD di¬er slightly. The most prominent deviation (459 vs. 455) one
observes for the old scheme with h = 5. One further in-control ARL di¬erence
one notices for the new scheme with h = 3.73. All other di¬erences are small.
There are di¬erent sources for the deviations:
T T
1. Crosier computed D(32) = (p32 L)/(p32 1) and not the actual limit D
(see 11.8, 11.10, and 11.12).
2. Crosier used ARL(r) = ARL∞ + B/r2 + C/r4 and ¬tted this model
for r = 8, 9, 10, 12, 15. Then, ARL∞ is used as ¬nal approximation. In
order to get the above D(32) one needs the whole vector L, such that this
approach might be more sensitive to approximation errors than in the
single ARL case.


11.4 Real data example “ monitoring CAPM
There are di¬erent ways of applying SPC to ¬nancial data. Here, we use a
twosided EWMA chart for monitoring the Deutsche Bank (DBK) share. More
254 11 Statistical Process Control


precisely, a capital asset pricing model (CAPM) is ¬tted for DBK and the DAX
which is used as proxy of the e¬cient market portfolio. That is, denoting with
rDAX,t and rDBK,t the log returns of the DAX and the DBK, respectively, one
assumes that the following regression model is valid:

rDBK,t = ± + β rDAX,t + µt (11.14)

Usually, the parameters of the model are estimated by the ordinary least
squares method. The parameter β is a very popular measure in applied ¬-
nance, Elton and Gruber (1991). In order to construct a real portfolio, the
β coe¬cient is frequently taken into account. Research has therefore concen-
trated on the appropriate estimation of constant and time changing β. In the
context of SPC it is therefore useful to construct monitoring rules which signal
changes in β. Contrary to standard SPC application in industry there is no
obvious state of the process which one can call ”in-control”, i. e. there is no
target process. Therefore, pre-run time series of both quotes (DBK, DAX)
are exploited for building the in-control state. The daily quotes and log re-
turns, respectively, from january, 6th, 1995 to march, 18th, 1997 (about 450
observations) are used for ¬tting (11.14):

ANOVA SS df MSS F-test P-value
_________________________________________________________________________
Regression 0.025 1 0.025 445.686 0.0000
Residuals 0.025 448 0.000
Total Variation 0.050 449 0.000

Multiple R = 0.70619
R^2 = 0.49871
Adjusted R^2 = 0.49759
Standard Error = 0.00746

PARAMETERS Beta SE StandB t-test P-value
________________________________________________________________________
b[ 0,]= -0.0003 0.0004 -0.0000 -0.789 0.4307
b[ 1,]= 0.8838 0.0419 0.7062 21.111 0.0000


With b[1,] = β = 0.8838 a typical value has been obtained. The R2 =
0.49871 is not very large. However, the simple linear regression is considered
in the sequel. The (empirical) residuals of the above model are correlated (see
Figure 11.5). The SPC application should therefore be performed with the
(standardized) residuals of an AR(1) ¬t to the regression residuals. For an
application of the XploRe quantlet armacls (quantlib times) the regression
residuals were standardized. By using the conditional least squares method an
11.4 Real data example “ monitoring CAPM 255


Sample partial autocorrelation function (pacf)
20
15
10
pacf*E-2
5
0
-5
-10




5 10 15 20
lag
Figure 11.5. Partial autocorrelation function of CAPM regression resid-
uals


estimate of ˆ = 0.20103 for the AR(1) model

µt = µt’1 + ·t (11.15)

has been obtained. Eventually, by plugging in the estimates of ±, β and ,
and standardizing with the sample standard deviation of the pre-run residuals
series (see (11.15)) one gets a series of uncorrelated data with expectation 0 and
variance 1, i. e. our in-control state. If the ¬tted model (CAPM with AR(1)
noise) remains valid after the pre-run, the related standardized residuals behave
like in the in-control state. Now, the application of SPC, more precisely of a
twosided EWMA chart, allows to monitor the series in order to get signals, if
the original model was changed. Changes in ± or β in (11.14) or in in (11.15)
or in the residual variance of both models lead to shifts or scale changes in the
empirical residuals series. Hence, the probability of an alarm signaled by the
EWMA chart increases (with one exception only “ decreased variances). In
this way a possible user of SPC in ¬nance is able to monitor an estimated and
presumed CAPM.
256 11 Statistical Process Control


In our example we use the parameter » = 0.2 and an in-control ARL of 500,
such that the critical value is equal to c = 2.9623 (the Markov chain parameter
r was set to 100). Remark, that the computation of c is based on the normality
assumption, which is seldom ful¬lled for ¬nancial data. In our example the
hypothesis of normality is rejected as well with a very small p value (Jarque-
Bera test with quantlet jarber). The estimates of skewness 0.136805 and
kurtosis 6.64844 contradict normality too. The fat tails of the distribution are
a typical pattern of ¬nancial data. Usually, the fat tails lead to a higher false
alarm rate. However, it would be much more complicated to ¬t an appropriate
distribution to the residuals and use these results for the ”correct” critical
value.
The Figures 11.6 and 11.7 present the EWMA graphs of the pre-run and the
monitoring period (from march, 19th, 1997 to april, 16th, 1999). In the pre-run

Twosided EWMA chart
76 lambda = 0.20, in-control ARL = 499
1
0.5
Z_t
0
-0.5
-1




0 1 2 3 4
t*E2

Figure 11.6. Twosided EWMA chart of the standardized CAPM-AR(1)
residuals for the pre-run period (06/01/95 - 03/18/97)


period the EWMA chart signals 4 times. The ¬rst 3 alarms seem to be outliers,
while the last points on a longer change. Nevertheless, the chart performs quite
typical for the pre-run period. The ¬rst signal in the monitoring period was
obtained at the 64th observation (i. e. 06/24/97). Then, we observe more
frequently signals than in the pre-run period, the changes are more persistent
and so one has to assume, that the pre-run model is no longer valid. A new
CAPM has therefore to be ¬tted and, if necessary, the considered portfolio has
to be reweighted. Naturally, a new pre-run can be used for the new monitoring
11.4 Real data example “ monitoring CAPM 257


Twosided EWMA chart
64 lambda = 0.20, in-control ARL = 499
2
1
Z_t
0
-1
-2




0 1 2 3 4 5
t*E2

Figure 11.7. Twosided EWMA chart of the standardized CAPM-AR(1)
residuals for the monitoring period (03/19/97 - 04/16/99)



period.
XFGcapmar1.xpl




Bibliography
Brook, D. and Evans, D. A. (1972). An approach to the probability distribution
of cusum run length, Biometrika 59: 539“548.
Crosier, R. B. (1986). A new two-sided cumulative quality control scheme,
Technometrics 28: 187“194.

Crowder, S. V. (1986). A simple method for studying run-length distributions
of exponentially weighted moving average charts, Technometrics 29: 401“
407.
Elton, EJ. and Gruber, MJ. (1991). Modern portfolio theory and investment
analysis, Wiley, 4. edition.
Knoth, S. (1998). Quasi-stationarity of CUSUM schemes for Erlang distribu-
tions, Metrika 48: 31“48.
258 11 Statistical Process Control


Lorden, G. (1971). Procedures for reacting to a change in distribution, Annals
of Mathematical Statistics 42: 1897“1908.

Lucas, J. M. (1976). The design and use of V-mask control schemes, Journal
of Quality Technology 8: 1“12.
Lucas, J. M. and Crosier, R. B. (1982). Fast initial response for cusum quality-
control schemes: Give your cusum a headstart, Technometrics 24: 199“
205.
Lucas, J. M. and Saccucci, M. S. (1990). Exponentially weighted moving aver-
age control schemes: properties and enhancements, Technometrics 32: 1“
12.
Page, E. S. (1954). Continuous inspection schemes, Biometrika 41: 100“115.
Pollak, M. and Siegmund, D. (1975). Approximations to the expected sample
size of certain sequential tests, Annals of Statistics 3: 1267“1282.
Pollak, M. and Siegmund, D. (1985). A di¬usion process and its applications to
detection a change in the drift of brownian motion, Biometrika 72: 267“
280.
Roberts, S. W. (1959). Control-charts-tests based on geometric moving aver-
ages, Technometrics 1: 239“250.
Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Product,
D. van Nostrand Company, Inc., Toronto, Canada.
Vance, L. (1986). ARL of cumulative sum control chart for controlling normal
mean, Journal of Quality Technology 18: 189“193.
Waldmann, K.-H. (1986). Bounds to the distribution of the run length in
general quality“control schemes, Statistische Hefte 27: 37“56.
12 An Empirical Likelihood
Goodness-of-Fit Test for
Di¬usions
Song Xi Chen, Wolfgang H¨rdle and Torsten Kleinow
a


The analysis and prediction of di¬usion processes plays a fundamental role
in the statistical analysis of ¬nancial markets. The techniques applied rely
on the actual model assumed for the drift and di¬usion coe¬cient functions.
Mismodelling these coe¬cients might result in biased prediction and incorrect
parameter speci¬cation. We show in this chapter how the empirical likelihood
technique, Owen (1988) and Owen (1990), may be used to construct test pro-
cedures for the Goodness-of-Fit of a di¬usion model. The technique is based
on comparison with kernel smoothing estimators. The Goodness-of-Fit test
proposed is based on the asymptotics of the empirical likelihood, which has
two attractive features. One is its automatic consideration of the variation as-
sociated with the nonparametric ¬t due to the empirical likelihood™s ability to
studentize internally. The other one is that the asymptotic distributions of the
test statistic are free of unknown parameters which avoids secondary plug-in
estimation.


12.1 Introduction
Let us assume a strictly stationary one-dimensional di¬usion Z solving the
stochastic di¬erential equation (SDE)

dZ(t) = m{Z(t)}dt + v{Z(t)}dW (t) (12.1)

where the driving process W = {W (t), t ∈ [0, ∞)} in (12.1) is a standard
Wiener process. In a mathematical ¬nance setting, Z might be the price process
260 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions


of a stock, a stock market index or any other observable process. For the rest
of the chapter the drift m : R ’ R, and the di¬usion coe¬cient v : R ’ [0, ∞)
in (12.1) are assumed to be su¬ciently smooth, so that a unique solution of
(12.1) exists.
In applications we are mostly interested in the stationary solutions of (12.1).
For the existence of a stationary solution, the drift and the di¬usion coe¬cient
must satisfy some conditions, Bibby and Sørensen (1995). The most important
condition is that the stationary forward Kolmogorov equation

(1/2) v 2 (z)p(z) ’ m(z)p(z) = 0

has a solution p(z) which is a probability density. If the initial value Z(0) is
distributed in accordance with p0 , and if it is independent of the Wiener process
W (t) in (12.1), then (12.1) de¬nes a stationary process. The above condition
holds for the Ornstein-Uhlenbeck process with a normal stationary distribution,
and for the Cox-Ingersoll-Ross process with a “-distribution. For the statistical
analysis we assume that Z is observed at discrete times ti = i∆, i = 1, 2, . . . , n,
with a time step size ∆ > 0. From these observations we get a time series Z ∆
with certain dynamics speci¬ed in Section 12.2.
The aim of this chapter is to test a parametric model for the drift function m
against a nonparametric alternative, i.e.

H0 (m) : m(z) = mθ (z) (12.2)

where θ is an unknown parameter. The test statistic we apply is based on
the empirical likelihood. This concept was introduced by Chen, H¨rdle and
a
Kleinow (2001) for time series. To apply it in our situation we start with the
discretization of the di¬usion process Z.


12.2 Discrete Time Approximation of a Di¬usion
Let us assume that the di¬usion process Z is observed at discrete times ti =
i∆, i = 1, 2, . . . , n, with a time step size ∆ > 0. Here we suppose that ∆ is
small or, more precisely, will tend to zero asymptotically. Under rather weak
assumptions, see Kloeden and Platen (1999), on the functions m and v 2 , it can
be shown that the Euler approximation
t t
∆ ∆ ∆
v Z ∆ (tis ) dW (s)
Z (t) = Z (0) + m Z (tis ) ds + (12.3)
0 0
12.3 Hypothesis Testing 261


with tis = max{ti , ti ¤ s}, converges in a mean square sense to Z as ∆ ’ 0,
i.e.,
lim E( sup |Z ∆ (t) ’ Z(t)|2 ) = 0, T > 0. (12.4)
∆’0 0¤t¤T


From now on, we assume that a discrete time approximation Z ∆ exists in the
form of (12.3), and that the property (12.4) holds. For the purposes of this
chapter, ∆ will always be considered small enough that one can substitute Z
by Z ∆ in our interpretation of the observed data. The increments of the Euler
approximation and so the observed data will have the form
Z ∆ (ti+1 ) ’ Z ∆ (ti ) = m Z ∆ (ti ) ∆ + v Z ∆ (ti ) W (ti+1 ) ’ W (ti ) (12.5)
for i = 0, 1, . . . .. The observations {Z ∆ (ti )}, i = 0, 1, . . . n form a time series.
As long as the step size ∆ is small enough the concrete choice of ∆ does not
matter since all the relevant information about the model is contained in the
drift m and di¬usion coe¬cient v.
For the following we introduce the notation
def def
Z ∆ (ti ) ,
Xi = X = (X1 , . . . , Xn )
def def
W (ti+1 ) ’ W (ti ) ,
µi = µ = (µ1 , . . . , µn )
def def
Xi+1 ’ Xi = m Xi ∆ + v Xi µi ,
Yi = Y = (Y1 , . . . , Yn )(12.6)

We can now apply the empirical likelihood Goodness-of-Fit test for stationary
time series developed by Chen et al. (2001).


12.3 Hypothesis Testing
Suppose (X, Y ) is de¬ned as in (12.6) and let m(x) = E(Y |X = x) be the
conditional mean function, f be the density of the design points X, and σ 2 (x) =
Var(Y |X = x) be the conditional variance function of Y given X = x ∈ S, a
closed interval S ‚ R. Suppose that {mθ |θ ∈ ˜} is a parametric model for the
ˆ
mean function m and that θ is an estimator of θ under this parametric model.
The interest is to test the null hypothesis:
for all x ∈ S
H0 : m(x) = mθ (x)
against a series of local smooth nonparametric alternatives:
H1 : m(x) = mθ (x) + cn ∆n (x),
262 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions


where cn is a non-random sequence tending to zero as n ’ ∞ and ∆n (x) is a
sequence of bounded functions.
The problem of testing against a nonparametric alternative is not new for an
independent and identically distributed setting, H¨rdle and Mammen (1993)
a
and Hart (1997). In a time series context the testing procedure has only been
considered by Kreiß, Neumann and Yao (1998) as far as we are aware. Also
theoretical results on kernel estimators for time series appeared only very re-
cently, Bosq (1998). This is surprising given the interests in time series for
¬nancial engineering.
We require a few assumptions to establish the results in this chapter. These
assumptions are the following:

(i) The kernel K is Lipschitz continuous in [’1, 1], that is |K(t1 ) ’ K(t2 )| ¤
C||t1 ’ t2 || where || · || is the Euclidean norm, and h = O{n’1/5 };
(ii) f , m and σ 2 have continuous derivatives up to the second order in S.
ˆ
(iii) θ is a parametric estimator of θ within the family of the parametric
model, and
sup |mθ (x) ’ mθ (x)| = Op (n’1/2 ).
ˆ
x∈S

(iv) ∆n (x), the local shift in the alternative H1 , is uniformly bounded with
respect to x and n, and cn = n’1/2 h’1/4 which is the order of the di¬er-
ence between H0 and H1 .
(v) The process {(Xi , Yi )} is strictly stationary and ±-mixing, i.e.
def
|P(AB) ’ P(A)P(B)| ¤ aρk
±(k) = sup

i
A∈F1 ,B∈Fi+k

l
for some a > 0 and ρ ∈ [0, 1). Here Fk denotes the σ-algebra of events
generated by {(Xi , Yi ), k ¤ i ¤ l} for l ≥ k. For an introduction into
±-mixing processes, see Bosq (1998) or Billingsley (1999). As shown by
Genon-Catalot, Jeantheau and Lar´do (2000) this assumption is ful¬lled
e
if Zt is an ±-mixing process.
(vi) E{exp(a0 |Y1 ’ m(X1 )|)} < ∞ for some a0 > 0; The conditional density
of X given Y and the joint conditional density of (X1 , Xl ) given (Y1 , Yl )
are bounded for all l > 1.
12.4 Kernel Estimator 263


Assumptions (i) and (ii) are standard in nonparametric curve estimation and
are satis¬ed for example for bandwidths selected by cross validation, whereas
(iii) and (iv) are common in nonparametric Goodness-of-Fit tests. Assumption
(v) means the data are weakly dependent. It is satis¬ed for a wide class of
di¬usion processes.


12.4 Kernel Estimator
To develop a test about H0 we ¬rst introduce a nonparametric kernel estimator
for m. For an introduction into kernel estimation see H¨rdle (1990), Wand and
a
Jones (1995) and (H¨rdle, M¨ller, Sperlich and Werwatz, 2000). Without loss
a u
of generality we assume that we are only interested in m(x) for x ∈ [0, 1] and
that f (x) ≥ C1 ∀x ∈ [0, 1] with a positive constant C1 . If in a particular
problem the data are supported by another closed interval, this problem can
be transformed by rescaling into an equivalent problem with data support [0, 1].
Let K be a bounded probability density function with a compact support on
[’1, 1] that satis¬es the moment conditions:

u2 K(u)du = σK
2
uK(u)du = 0,

2
where σK is a positive constant. Let h be a positive smoothing bandwidth
which will be used to smooth (X, Y ).
The nonparametric estimator considered is the Nadaraya-Watson (NW) esti-
mator n
Yi Kh (x ’ Xi )
m(x) = i=1
ˆ (12.7)
n
i=1 Kh (x ’ Xi )

with Kh (u) = h’1 K(h’1 u). This estimator is calculated in XploRe by the
quantlets regest or regxest.
The parameter estimation of θ depends on the null hypothesis. We assume

here, that the parameter θ is estimated by a n-consistent estimator. Let

Kh (x ’ Xi )mθ (Xi )
ˆ
mθ (x) =
˜ˆ n
i=1 Kh (x ’ Xi )

be the smoothed parametric model. The test statistic we are going to consider
is based on the di¬erence between mθ and m, rather than directly between m
˜ˆ ˆ ˆ
264 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions


and mθ , in order to avoid the issue of bias associated with the nonparametric
ˆ
¬t.
The local linear estimator can be used to replace the NW estimator in estimat-
ing m. However, as we compare m with mθ in formulating the Goodness-of-Fit
ˆ ˜ˆ
test, the possible bias associated with the NW estimator is not an issue here.
In addition, the NW estimator has a simpler analytic form.


12.5 The Empirical Likelihood concept

12.5.1 Introduction into Empirical Likelihood

Let us now as in Owen (1988) and Owen (1990) introduce the empirical likeli-
hood (EL) concept. Suppose a sample (U1 , . . . , Un ) of independent identically
distributed random variables in R1 according to a probability law with un-
known distribution function F and unknown density f . For an observation
(u1 , . . . , un ) of (U1 , . . . , Un ) the likelihood function is given by
n
¯
L(f ) = f (ui ) (12.8)
i=1

The empirical density calculated from the observations (u1 , . . . , un ) is
n
1
def
fn (u) = 1{ui = u} (12.9)
n i=1

where 1 denotes the indicator function. It is easy to see that fn maximizes
¯
L(f ) in the class of all probability density functions.
The objective of the empirical likelihood concept is the construction of tests and
con¬dence intervals for a parameter θ = θ(F ) of the distribution of Ui . To keep
things simple we illustrate the empirical likelihood method for the expectation
E[Ui ]. The null hypothesis is E[Ui ] = θ. We can test this assumption based on
the empirical likelihood ratio
¯
L{f (θ)}
def
R(F ) = (12.10)
¯
L(fn )
¯
where f (θ) maximizes L(f ) subject to

Ui dF = θ. (12.11)
12.5 The Empirical Likelihood concept 265


On a heuristic level we can reject the null hypothesis “under the true distribu-
tion F , U has expectation θ” if the ratio R(F ) is small relative to 1, i.e. the
test rejects if R(F ) < r for a certain level r ∈ (0, 1). More precisely, Owen
(1990) proves the following

THEOREM 12.1 Let (U1 , . . . , Un ) be iid one-dimensional random variables
with expectation θ and variance σ 2 . For a positive r < 1 let

Fn , R(F ) ≥ r
Cr,n = Ui dF F

be the set of all possible expectations of U with respect to distributions F dom-
inated by Fn (F Fn ). Then it follows
lim P[θ ∈ Cr,n ] = P[χ2 ¤ ’2 log r] (12.12)
n’∞

where χ2 is a χ2 -distributed random variable with one degree of freedom.

From Theorem 12.1 it follows directly

¤ r EUi = θ = P[χ2 ¤ r]
lim P ’ 2 log max R(F )
n’∞ {F |F Fn , Ui dF =θ}

This result suggests therefore to use the log-EL ratio
¯
L{f (θ)}
’2 log = ’2 log
max R(F ) max ¯
Ui dF =θ} L(fn )
{F |F {F |F
Fn , Ui dF =θ} Fn ,

as the basic element of a test about a parametric hypothesis for the drift func-
tion of a di¬usion process.


12.5.2 Empirical Likelihood for Time Series Data

We will now expand the results in Section 12.5.1 to the case of time series data.
For an arbitrary x ∈ [0, 1] and any function µ we have
x ’ Xi
{Yi ’ µ(x)} E[Yi |Xi = x] = µ(x) = 0.
EK (12.13)
h
Let pi (x) be nonnegative numbers representing a density for
x ’ Xi
{Yi ’ µ(x)}
K i = 1, . . . , n
h
266 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions


The empirical likelihood for µ(x) is
n
def
L{µ(x)} = max pi (x) (12.14)
i=1
n n x’Xi
{Yi ’ µ(x)} = 0. The
subject to i=1 pi (x) = 1 and pi (x)K
i=1 h
second condition re¬‚ects (12.13).
We ¬nd the maximum by introducing Lagrange multipliers and maximizing the
Lagrangian function
n
L(p, »1 , »2 ) = log pi (x)
i=1
n n
x ’ Xi
’»1 {Yi ’ µ(x)} ’ »2 pi (x) ’ 1
pi (x)K
h
i=1 i=1

The partial derivatives are
x ’ Xi
‚L(p, »1 , »2 ) 1
’ »1 K {Yi ’ µ(x)} ’ »2 ∀i = 1, . . . , n .
=
‚pi (x) pi (x) h
With » = »1 /»2 we obtain as a solution to (12.14) the optimal weights
’1
x ’ Xi
’1
{Yi ’ µ(x)}
pi (x) = n 1 + »(x)K (12.15)
h
where »(x) is the root of
n x’Xi
{Yi ’ µ(x)}
K h
= 0. (12.16)
»(x)K x’Xi {Yi ’ µ(x)}
1+ h
i=1

Note, that »2 = n follows from
n n
x ’ Xi
{Yi ’ µ(x)} = 1 .
pi (x) + » pi (x)K
h
i=1 i=1

The maximum empirical likelihood is achieved at pi (x) = n’1 corresponding
ˆ
to the nonparametric curve estimate µ(x) = m(x). For a parameter estimate θ
ˆ
we get the maximum empirical likelihood for the smoothed parametric model
L{mθ (x)}. The log-EL ratio is
˜ˆ
L{mθ (x)}
˜ˆ
def
= ’2 log[L{mθ (x)}nn ].
{mθ (x)} = ’2 log
˜ˆ ˜ˆ
L{m(x)}
ˆ
12.5 The Empirical Likelihood concept 267


To study properties of the empirical likelihood based test statistic we need to
evaluate {mθ (x)} at an arbitrary x ¬rst, which requires the following lemma
˜ˆ
on »(x) that is proved in Chen et al. (2001).

LEMMA 12.1 Under the assumptions (i)-(vi),

sup |»(x)| = Op {(nh)’1/2 log(n)}.
x∈[0,1]


Let γ(x) be a random process with x ∈ [0, 1]. Throughout this chapter we use
˜
the notation γ(x) = Op (δn ) ( Op (δn )) to denote the facts that supx∈[0,1] |γ(x)| =
˜
Op (δn ) (Op (δn )) for a sequence δn .
j
n
¯ x’Xi
Let Uj (x) = (nh)’1 {Yi ’ mθ (x)}
K ˜ˆ for j = 1, 2, . . .. An appli-
i=1 h

cation of the power series expansion of 1/(1 ’ •) applied to (12.16) and Lemma
12.1 yields

n
x ’ Xi x ’ Xi
(’»(x))j K j {Yi ’ mθ (x)}j = 0.
{Yi ’ mθ (x)}
K ˜ˆ ˜ˆ
h h
i=1 j=0

Inverting the above expansion, we have
¯ ’1 ¯
»(x) = U2 (x)U1 (x) + Op {(nh)’1 log2 (n)}.
˜ (12.17)

From (12.15), Lemma 12.1 and the Taylor expansion of log(1 + •) we get

{mθ (x)} = ’2 log[L{mθ (x)}nn ]
˜ˆ ˜ˆ
n
x ’ Xi
{Yj ’ mθ (x)}]
= 2 log[1 + »(x)K ˜ˆ
h
i=1
¯ ¯
2nh»(x)U1 ’ nh»2 (x)U2 + Op {(nh)’1/2 log3 (n)}
˜
=
(12.18)

Inserting (12.17) in (12.18) yields
¯ ’1 ¯2
{mθ (x)} = nhU2 (x)U1 (x) + Op {(nh)’1/2 log3 (n)}.
˜
˜ˆ (12.19)

For any x ∈ [0, 1], let
1 1
2
’ y)dy and b(x; h) = h Kh (x ’ y)dy
v(x; h) = h Kh (x
0 0
268 12 An Empirical Likelihood Goodness-of-Fit Test for Di¬usions


be the variance and the bias coe¬cient functions associated with the NW esti-
mator, respectively, see Wand and Jones (1995). Let
SI,h = {x ∈ [0, 1]| min (|x ’ 1|, |x|) > h}.
For h ’ 0, SI,h converges to the set of interior points in [0, 1]. If x ∈ SI,h , we
def
K 2 (x)dx and b(x; h) = 1. De¬ne
have v(x; h) =
v(x; h)σ 2 (x)
V (x; h) = .
f (x)b2 (x; h)
Clearly, V (x; h)/(nh) is the asymptotic variance of m(x) when nh ’ ∞ which
ˆ
is one of the conditions we assumed.
It was shown by Chen et al. (2001), that
n
¯ ’1
Kh (x ’ Xi ){Yi ’ mθ (x)}
U1 (x) =n ˜ˆ
i=1
n
˜
= n’1 Kh (x ’ Xi ){Yi ’ mθ (Xi )} + Op (n’1/2 )
i=1
ˆ ˜
= f (x){m(x) ’ mθ (x)} + Op (n’1/2 )
ˆ ˜
˜
= f (x)b(x; h){m(x) ’ mθ (x)} + Op {n’1/2 + (nh)’1 log2 (n)}.
ˆ ˜
¯
In the same paper it is shown, that condition (iii) entails supx∈[0,1] |U2 (x) ’
f (x)v(x; h)σ 2 (x)| = Op (h). These and (12.19) mean that
¯ ’1 ¯ 2 ˜
(nh)U2 U1 + Op {(nh)’1/2 log3 (n)}
{mθ (x)} =
˜ˆ
˜
= V ’1 (x; h){m(x) ’ mθ (x)}2 + O{(nh)’1 h log2 (n)}
ˆ ˜ (12.20)
Therefore, {mθ (x)} is asymptotically equivalent to a studentized L2 distance
˜ˆ
between mθ (x) and m(x). It is this property that leads us to use {mθ (x)} as
˜ˆ ˆ ˜ˆ
the basic building block in the construction of a global test statistic for distinc-
tion between mθ and m in the next section. The use of the empirical likelihood
˜ˆ ˆ
as a distance measure and its comparison with other distance measures have
been discussed in Owen (1991) and Baggerly (1998).


12.6 Goodness-of-Fit Statistic
To extend the empirical likelihood ratio statistic to a global measure of
Goodness-of-Fit, we choose kn -equally spaced lattice points t1 , t2 , · · · , tkn in
12.6 Goodness-of-Fit Statistic 269


[0, 1] where t1 = 0, tkn = 1 and ti ¤ tj for 1 ¤ i < j ¤ kn . We let kn ’ ∞
and kn /n ’ 0 as n ’ ∞. This essentially divides [0, 1] into kn small bins of
size (kn )’1 . A simple choice is to let kn = [1/(2h)] where [a] is the largest
integer less than a. This choice as justi¬ed later ensures asymptotic indepen-
dence among {mθ (tj )} at di¬erent tj s. Bins of di¬erent size can be adopted to
˜ˆ
suit situations where there are areas of low design density. This corresponds to
the use of di¬erent bandwidth values in adaptive kernel smoothing. The main
results of this chapter is not a¬ected by un-equal bins. For the purpose of easy
presentation, we will treat bins of equal size.
As {mθ (tj )} measures the Goodness-of-Fit at a ¬xed tj , an empirical likelihood
˜ˆ
based statistic that measures the global Goodness-of-Fit is de¬ned as
kn
def
{mθ (tj )}.
n (mθ )
˜ˆ = ˜ˆ
j=1

The following theorem was proven by Chen et al. (2001).

THEOREM 12.2 Under the assumptions (i) - (vi),

{m(x) ’ mθ (x)}2
ˆ ˜
’1 ’1
dx + Op {kn log2 (n) + h log2 (n)}
kn n (mθ )
˜ˆ = (nh)
V (x)
(12.21)
def
where V (x) = limh’0 V (x, h).

H¨rdle and Mammen (1993) proposed the L2 distance
a

Tn = nh1/2 {m(x) ’ mθ (x)}2 π(x)dx
ˆ ˜ˆ

as a measure of Goodness-of-Fit where π(x) is a given weight function.
’1

<<

. 9
( 14)



>>