
5. The distribution of positive and negative price changes was approximately

symmetric.

Consider next the number of transactions in a 5-minute time interval. Denote the

series by xt . That is, x1 is the number of IBM transactions from 9:30 am to 9:35 am

on November 1, 1990 Eastern time, x2 is the number of transactions from 9:35 am to

9:40 am, and so on. The time gaps between trading days are ignored. Figure 5.1(a)

shows the time plot of xt , and Figure 5.1(b) the sample ACF of xt for lags 1 to 260. Of

particular interest is the cyclical pattern of the ACF with a periodicity of 78, which

is the number of 5-minute intervals in a trading day. The number of transactions

thus exhibits a daily pattern. To further illustrate the daily trading pattern, Figure 5.2

shows the average number of transactions within 5-minute time intervals over the 63

days. There are 78 such averages. The plot exhibits a "smiling" or "U" shape, indicating heavier trading at the opening and closing of the market and thinner trading during the lunch hours.
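The 5-minute counting scheme described above can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the use of seconds after midnight as the time unit, and the choice to exclude a trade stamped exactly 4:00 pm are assumptions made here.

```python
import numpy as np

# Assumed conventions: times are seconds after midnight, Eastern time.
OPEN = 34200    # 9:30 am
CLOSE = 57600   # 4:00 pm
WIDTH = 300     # 5 minutes
N_BINS = (CLOSE - OPEN) // WIDTH  # number of 5-minute intervals per day

def counts_per_interval(trade_times):
    """Count trades in each 5-minute interval of one trading day."""
    t = np.asarray(trade_times)
    t = t[(t >= OPEN) & (t < CLOSE)]            # keep normal trading hours only
    bins = ((t - OPEN) // WIDTH).astype(int)    # interval index 0, ..., N_BINS - 1
    return np.bincount(bins, minlength=N_BINS)

def average_daily_pattern(days):
    """Average the per-interval counts over a list of trading days."""
    return np.mean([counts_per_interval(d) for d in days], axis=0)
```

Concatenating the daily count vectors gives a series like x_t, and the averaged vector corresponds to the kind of curve plotted in Figure 5.2.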

Since we focus on transactions that occurred in the normal trading hours of a

trading day, there are 59,838 time intervals in the data. These intervals are called the

intraday durations between trades. For IBM stock, there were 6531 zero time inter-

vals. That is, during the normal trading hours of the 63 trading days from Novem-

ber 1, 1990 to January 31, 1991, multiple transactions in a second occurred 6531

times, which is about 10.91%. Among these multiple transactions, 1002 of them had

Figure 5.1. IBM intraday transactions data from 11/01/90 to 1/31/91: (a) the number of transactions in 5-minute time intervals, and (b) the sample ACF of the series in part (a). [Panel (a): n(trades) against 5-minute time intervals; panel (b): ACF against lag.]

Figure 5.2. Time plot of the average number of transactions in 5-minute time intervals. There are 78 observations, averaging over the 63 trading days from 11/01/90 to 1/31/91 for IBM stock.


184 HIGH-FREQUENCY DATA

Table 5.2. Two-Way Classification of Price Movements in Consecutive Intraday Trades for IBM Stock. The Price Movements Are Classified Into "Up," "Unchanged," and "Down." The Data Span Is From 11/01/90 to 1/31/91.

                            ith trade
(i − 1)th trade     "+"      "0"      "−"     Margin
"+"                 441     5498     3948      9887
"0"                4867    29779     5473     40119
"−"                4580     4841      410      9831
Margin             9888    40118     9831     59837

different prices, which is about 1.67% of the total number of intraday transactions.

Therefore, multiple transactions (i.e., zero durations) may become an issue in statis-

tical modeling of the time durations between trades.

Table 5.2 provides a two-way classification of price movements. Here price movements are classified into "up," "unchanged," and "down." We denote them by "+," "0," and "−," respectively. The table shows the price movements between two consecutive trades (i.e., from the (i − 1)th to the ith transaction) in the sample. From the table, trade-by-trade data show that

1. consecutive price increases or decreases are relatively rare, occurring in about 441/59837 = 0.74% and 410/59837 = 0.69% of the cases, respectively;

2. a move from "up" to "unchanged" is slightly more likely than a move from "up" to "down"; see row 1 of the table;

3. there is a high tendency for price to remain "unchanged";

4. the probabilities of moving from "down" to "up" or "unchanged" are about the same; see row 3.

The first observation mentioned before is a clear demonstration of bid-ask bounce, showing price reversals in intraday transactions data. To confirm this phenomenon, we consider a directional series $D_i$ for price movements, where $D_i$ assumes the value +1, 0, −1 for "up," "unchanged," and "down" price movement, respectively, for the ith transaction. The ACF of $\{D_i\}$ has a single spike at lag 1 with value −0.389, which is highly significant for a sample size of 59,837 and confirms the price reversal in consecutive trades.
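The figures quoted above can be reproduced directly from the counts in Table 5.2. The sketch below is an illustration only: it treats the table as the joint frequency of the directional values at the (i − 1)th and ith trades and computes the ordinary correlation between them, which should be close to the reported lag-1 ACF. Variable names are chosen here.

```python
# Counts from Table 5.2: rows = movement at the (i-1)th trade,
# columns = movement at the ith trade, both ordered as ("+", "0", "-").
counts = [
    [441, 5498, 3948],
    [4867, 29779, 5473],
    [4580, 4841, 410],
]
values = [1, 0, -1]  # value of the directional series D for "+", "0", "-"
n = sum(sum(row) for row in counts)
cells = [(r, c) for r in range(3) for c in range(3)]

# Row-wise relative frequencies: estimated transition probabilities.
transition = [[c / sum(row) for c in row] for row in counts]

# Moments of (D_{i-1}, D_i) under the empirical joint distribution.
mean_prev = sum(values[r] * counts[r][c] for r, c in cells) / n
mean_curr = sum(values[c] * counts[r][c] for r, c in cells) / n
var_prev = sum(values[r] ** 2 * counts[r][c] for r, c in cells) / n - mean_prev ** 2
var_curr = sum(values[c] ** 2 * counts[r][c] for r, c in cells) / n - mean_curr ** 2
cov = sum(values[r] * values[c] * counts[r][c] for r, c in cells) / n - mean_prev * mean_curr

rho1 = cov / (var_prev * var_curr) ** 0.5  # lag-1 correlation of D

print(f"consecutive increases: {100 * counts[0][0] / n:.2f}%")
print(f"consecutive decreases: {100 * counts[2][2] / n:.2f}%")
print(f"lag-1 correlation of D: {rho1:.3f}")
```

The negative correlation comes almost entirely from the two off-diagonal corner cells ("+" followed by "−" and "−" followed by "+"), which is the bid-ask bounce pattern.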

As a second illustration, we consider the transactions data of IBM stock in

December 1999 obtained from the TAQ database. The normal trading hours are from

9:30 am to 4:00 pm Eastern time, except for December 31 when the market closed at 1:00 pm. Compared with the 1990–1991 data, two important changes have occurred. First, the number of intraday trades has increased sixfold. There were 134,120 intraday trades in December 1999 alone. The increased trading intensity

also increased the chance of multiple transactions within a second. The percentage

of trades with zero time duration doubled to 22.98%. At the extreme, there were


EMPIRICAL CHARACTERISTICS

Figure 5.3. IBM transactions data for December 1999. The plot shows the number of transactions in each trading day with the after-hours portion denoting the number of trades with time stamp after 4:00 pm.

42 transactions within a given second that happened twice on December 3, 1999.

Second, the tick size of price movement was $1/16 = $0.0625 instead of $1/8.

The change in tick size should reduce the bid-ask spread. Figure 5.3 shows the daily

number of transactions in the new sample. Figure 5.4(a) shows the time plot of time

durations between trades, measured in seconds, and Figure 5.4(b) is the time plot of

price changes in consecutive intraday trades, measured in multiples of the tick size

of $1/16. As expected, Figures 5.3 and 5.4(a) show clearly the inverse relationship

between the daily number of transactions and the time interval between trades. Fig-

ure 5.4(b) shows two unusual price movements for IBM stock on December 3, 1999.

They were a drop of 63 ticks followed by an immediate jump of 64 ticks and a drop

of 68 ticks followed immediately by a jump of 68 ticks. Unusual price movements

like these occurred infrequently in intraday transactions.

Focusing on trades recorded within the regular trading hours, we have 61,149

trades out of 133,475 with no price change. This is about 45.8% and substantially

lower than that between November 1990 and January 1991. It seems that reducing

the tick size increased the chance of a price change. Table 5.3 gives the percentages

of trades associated with a price change. The price movements remain approximately

symmetric with respect to zero. Large price movements in intraday tradings are still

relatively rare.

Remark: The record keeping of high-frequency data is often not as good as that

of observations taken at lower frequencies. Data cleaning becomes a necessity in


Figure 5.4. IBM transactions data for December 1999. Part (a) is the time plot of time durations between trades and part (b) is the time plot of price changes in consecutive trades measured in multiples of the tick size of $1/16. Only data in the normal trading hours are included.

high-frequency data analysis. For transactions data, missing observations may hap-

pen in many ways, and the accuracy of the exact transaction time might be question-

able for some trades. For example, recorded trading times may be beyond 4:00 pm

Eastern time even before the opening of after-hours tradings. How to handle these

observations deserves a careful study. A proper method of data cleaning requires a

Table 5.3. Percentages of Intraday Transactions Associated with a Price Change for IBM Stock Traded in December 1999. The Percentage of Transactions without Price Change Is 45.8% and the Total Number of Transactions Recorded within the Regular Trading Hours Is 133,475. The Size Is Measured in Multiples of Tick Size $1/16.

(a) Upward movements
size             1      2      3      4      5      6      7     >7
percentage   18.03   5.80   1.79   0.66   0.25   0.15   0.09   0.32

(b) Downward movements
size             1      2      3      4      5      6      7     >7
percentage   18.24   5.57   1.79   0.71   0.24   0.17   0.10   0.31


MODELS FOR PRICE CHANGES

deep understanding of the way by which the market operates. As such, it is important

to specify clearly and precisely the methods used in data cleaning. These methods

must be taken into consideration in making inference.

Again, let $t_i$ be the calendar time, measured in seconds from midnight, when the ith transaction took place. Let $P_{t_i}$ be the transaction price. The price change from the (i − 1)th to the ith trade is $y_i \equiv \Delta P_{t_i} = P_{t_i} - P_{t_{i-1}}$ and the time duration is $\Delta t_i = t_i - t_{i-1}$. Here it is understood that the subscript $i$ in $\Delta t_i$ and $y_i$ denotes the time sequence of transactions, not the calendar time. In what follows, we consider models for $y_i$ and $\Delta t_i$ both individually and jointly.

5.4 MODELS FOR PRICE CHANGES

The discreteness and concentration on "no change" make it difficult to model the intraday price changes. Campbell, Lo, and MacKinlay (1997) discuss several econometric models that have been proposed in the literature. Here we mention two models that have the advantage of employing explanatory variables to study the intraday price movements. The first model is the ordered probit model used by Hausman, Lo, and MacKinlay (1992) to study the price movements in transactions data. The second model has been considered recently by McCulloch and Tsay (2000) and is a simplified version of the model proposed by Rydberg and Shephard (1998); see also Ghysels (2000).

5.4.1 Ordered Probit Model

Let $y_i^*$ be the unobservable price change of the asset under study (i.e., $y_i^* = P_{t_i}^* - P_{t_{i-1}}^*$), where $P_t^*$ is the virtual price of the asset at time $t$. The ordered probit model assumes that $y_i^*$ is a continuous random variable and follows the model

$$y_i^* = x_i \beta + \epsilon_i, \tag{5.15}$$

where $x_i$ is a $p$-dimensional row vector of explanatory variables available at time $t_{i-1}$, $\beta$ is a $p \times 1$ parameter vector, $E(\epsilon_i \mid x_i) = 0$, $\mathrm{Var}(\epsilon_i \mid x_i) = \sigma_i^2$, and $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \ne j$. The conditional variance $\sigma_i^2$ is assumed to be a positive function of the explanatory variable $w_i$; that is,

$$\sigma_i^2 = g(w_i), \tag{5.16}$$

where $g(\cdot)$ is a positive function. For financial transactions data, $w_i$ may contain the time interval $t_i - t_{i-1}$ and some conditional heteroscedastic variables. Typically, one also assumes that the conditional distribution of $\epsilon_i$ given $x_i$ and $w_i$ is Gaussian.

Suppose that the observed price change $y_i$ may assume $k$ possible values. In theory, $k$ can be infinity, but countable. In practice, $k$ is finite and may involve combining several categories into a single value. For example, we have $k = 7$ in Table 5.1, where the first value "−3 ticks" means that the price change is −3 ticks or lower. We denote the $k$ possible values as $\{s_1, \ldots, s_k\}$. The ordered probit model postulates the relationship between $y_i$ and $y_i^*$ as

$$y_i = s_j \quad \text{if } \alpha_{j-1} < y_i^* \le \alpha_j, \qquad j = 1, \ldots, k, \tag{5.17}$$

where the $\alpha_j$ are real numbers satisfying $-\infty = \alpha_0 < \alpha_1 < \cdots < \alpha_{k-1} < \alpha_k = \infty$.

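Under the assumption of conditional Gaussian distribution, we have

$$
\begin{aligned}
P(y_i = s_j \mid x_i, w_i) &= P(\alpha_{j-1} < x_i\beta + \epsilon_i \le \alpha_j \mid x_i, w_i) \\
&= \begin{cases}
P(x_i\beta + \epsilon_i \le \alpha_1 \mid x_i, w_i) & \text{if } j = 1 \\
P(\alpha_{j-1} < x_i\beta + \epsilon_i \le \alpha_j \mid x_i, w_i) & \text{if } j = 2, \ldots, k-1 \\
P(\alpha_{k-1} < x_i\beta + \epsilon_i \mid x_i, w_i) & \text{if } j = k
\end{cases} \\
&= \begin{cases}
\Phi\!\left[\dfrac{\alpha_1 - x_i\beta}{\sigma_i(w_i)}\right] & \text{if } j = 1 \\[2ex]
\Phi\!\left[\dfrac{\alpha_j - x_i\beta}{\sigma_i(w_i)}\right] - \Phi\!\left[\dfrac{\alpha_{j-1} - x_i\beta}{\sigma_i(w_i)}\right] & \text{if } j = 2, \ldots, k-1 \\[2ex]
1 - \Phi\!\left[\dfrac{\alpha_{k-1} - x_i\beta}{\sigma_i(w_i)}\right] & \text{if } j = k,
\end{cases}
\end{aligned}
\tag{5.18}
$$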

where $\Phi(x)$ is the cumulative distribution function of the standard normal random variable evaluated at $x$, and we write $\sigma_i(w_i)$ to denote that $\sigma_i^2$ is a positive function of $w_i$. From the definition, an ordered probit model is driven by an unobservable continuous random variable. The observed values, which have a natural ordering, can be regarded as categories representing the underlying process.

The ordered probit model contains parameters $\beta$, $\alpha_j$ ($j = 1, \ldots, k-1$), and those in the conditional variance function $\sigma_i(w_i)$ in Eq. (5.16). These parameters can be estimated by the maximum likelihood or Markov Chain Monte Carlo methods.
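As a rough illustration of Eq. (5.18), the sketch below evaluates the category probabilities for one observation given the linear index, the conditional standard deviation, and the interior boundaries. The function names and the boundary values in the usage lines are hypothetical, chosen only for demonstration; they are not estimates from the text.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, sigma, alphas):
    """Category probabilities P(y = s_j), j = 1, ..., k, as in Eq. (5.18).

    xb     : linear index x_i * beta (a scalar)
    sigma  : conditional standard deviation sigma_i(w_i), must be positive
    alphas : interior boundaries alpha_1 < ... < alpha_{k-1}
    """
    # CDF values at alpha_0 = -inf, alpha_1, ..., alpha_{k-1}, alpha_k = +inf
    cdf = [0.0] + [norm_cdf((a - xb) / sigma) for a in alphas] + [1.0]
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]

# Hypothetical boundaries giving k = 5 categories.
example_alphas = [-1.5, -0.5, 0.5, 1.5]
example_probs = ordered_probit_probs(xb=0.0, sigma=1.0, alphas=example_alphas)
print([round(p, 4) for p in example_probs])
```

In a likelihood evaluation, the log of the probability of the observed category would be summed over trades, with xb and sigma recomputed for each observation from its explanatory variables.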

Example 5.1. Hausman, Lo, and MacKinlay (1992) apply the ordered probit model to the 1988 transactions data of more than 100 stocks. Here we only report their result for IBM. There are 206,794 trades. The sample mean (standard deviation) of price change $y_i$, time duration $\Delta t_i$, and bid-ask spread are −0.0010 (0.753), 27.21 (34.13), and 1.9470 (1.4625), respectively. The bid-ask spread is measured in ticks. The model used has nine categories for price movement, and the functional specifications are

$$
x_i\beta = \beta_1 \Delta t_i^* + \sum_{v=1}^{3} \beta_{v+1}\, y_{i-v} + \sum_{v=1}^{3} \beta_{v+4}\, \mathrm{SP5}_{i-v} + \sum_{v=1}^{3} \beta_{v+7}\, \mathrm{IBS}_{i-v} + \sum_{v=1}^{3} \beta_{v+10}\, [T_\lambda(V_{i-v}) \times \mathrm{IBS}_{i-v}] \tag{5.19}
$$

$$
\sigma_i^2(w_i) = 1.0 + \gamma_1^2 \Delta t_i^* + \gamma_2^2\, \mathrm{AB}_{i-1}, \tag{5.20}
$$

where $T_\lambda(V) = (V^\lambda - 1)/\lambda$ is the Box-Cox (1964) transformation of $V$ with $\lambda \in [0, 1]$ and the explanatory variables are defined by the following:

• $\Delta t_i^* = (t_i - t_{i-1})/100$ is a rescaled time duration between the (i − 1)th and ith trades with time measured in seconds.

• $\mathrm{AB}_{i-1}$ is the bid-ask spread prevailing at time $t_{i-1}$ in ticks.

• $y_{i-v}$ ($v = 1, 2, 3$) is the lagged value of price change at $t_{i-v}$ in ticks. With $k = 9$, the possible values of price changes are {−4, −3, −2, −1, 0, 1, 2, 3, 4} in ticks.

• $V_{i-v}$ ($v = 1, 2, 3$) is the lagged value of dollar volume at the (i − v)th transaction, defined as the price of the (i − v)th transaction in dollars times the number of shares traded (denominated in hundreds of shares). That is, the dollar volume is in hundreds of dollars.

• $\mathrm{SP5}_{i-v}$ ($v = 1, 2, 3$) is the 5-minute continuously compounded return of the Standard and Poor's 500 index futures price for the contract maturing in the closest month beyond the month in which transaction (i − v) occurred, where the return is computed with the futures price recorded one minute before the nearest round minute prior to $t_{i-v}$ and the price recorded 5 minutes before this.

• $\mathrm{IBS}_{i-v}$ ($v = 1, 2, 3$) is an indicator variable defined by

$$
\mathrm{IBS}_{i-v} = \begin{cases}
1 & \text{if } P_{i-v} > (P_{i-v}^a + P_{i-v}^b)/2 \\
0 & \text{if } P_{i-v} = (P_{i-v}^a + P_{i-v}^b)/2 \\
-1 & \text{if } P_{i-v} < (P_{i-v}^a + P_{i-v}^b)/2,
\end{cases}
$$

where $P_j^a$ and $P_j^b$ are the ask and bid price at time $t_j$.
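Two of the constructed regressors above are simple enough to sketch in code. The helper names below are hypothetical, and taking the logarithm as the λ = 0 case of the Box-Cox transformation is the usual limiting convention, assumed here rather than stated in the text.

```python
import math

def box_cox(v, lam):
    """Box-Cox transformation T_lambda(V) = (V**lam - 1) / lam for V > 0.

    The case lam = 0 is taken as the limit ln(V) (an assumed convention).
    """
    if v <= 0:
        raise ValueError("V must be positive")
    if lam == 0:
        return math.log(v)
    return (v ** lam - 1.0) / lam

def ibs(price, ask, bid):
    """Indicator comparing a trade price with the bid-ask midpoint:
    1 above the midpoint, 0 at the midpoint, -1 below it."""
    mid = (ask + bid) / 2.0
    if price > mid:
        return 1
    if price < mid:
        return -1
    return 0

# A trade at the ask is classified as +1, a trade at the bid as -1.
print(ibs(price=100.125, ask=100.125, bid=100.0))   # at the ask
print(ibs(price=100.0, ask=100.125, bid=100.0))     # at the bid
print(box_cox(4.0, 0.5))
```

The interaction term in Eq. (5.19) would then be the product of these two quantities for each lag.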

The parameter estimates and their t ratios are given in Table 5.4. All the t ratios are large except one, indicating that the estimates are highly significant. Such high t ratios are not surprising as the sample size is large. For the heavily traded IBM stock, the estimation results suggest the following conclusions:

1. The boundary partitions are not equally spaced, but are almost symmetric with respect to zero.

2. The transaction duration $\Delta t_i$ affects both the conditional mean and conditional variance of $y_i$ in Eqs. (5.19) and (5.20).

3. The coefficients of lagged price changes are negative and highly significant, indicating price reversals.

4. As expected, the bid-ask spread at time $t_{i-1}$ significantly affects the conditional variance.

Table 5.4. Parameter Estimates of the Ordered-Probit Model in Eq. (5.19) and Eq. (5.20) for the 1988 Transaction Data of IBM, Where t Denotes the t Ratio.

(a) Boundary partitions of the probit model
Par.      α1       α2       α3       α4       α5      α6      α7      α8
Est.    −4.67    −4.16    −3.11    −1.34     1.33    3.13    4.21    4.73
t      −145.7   −157.8   −171.6   −155.5    154.9   167.8   152.2   138.9

(b) Equation parameters of the probit model
Par.      γ1      γ2    β1 (Δt_i*)   β2 (y_{i−1})     β3       β4      β5       β6
Est.    0.40    0.52      −0.12         −1.01       −0.53    −0.21    1.12    −0.26
t       15.6    71.1      −11.4        −135.6       −85.0    −47.2    54.2    −12.1

Par.      β7       β8       β9      β10     β11     β12     β13
Est.    0.01    −1.14    −0.37    −0.17    0.12    0.05    0.02
t       0.26    −63.6    −21.6    −10.3    47.4    18.6     7.7

5.4.2 A Decomposition Model

An alternative approach to modeling price change is to decompose it into three components and use conditional specifications for the components; see Rydberg and Shephard (1998). The three components are an indicator for price change, the direction of price movement if there is a change, and the size of price change if a change occurs. Specifically, the price change at the ith transaction can be written as

$$y_i \equiv P_{t_i} - P_{t_{i-1}} = A_i D_i S_i, \tag{5.21}$$

where $A_i$ is a binary variable defined as

$$
A_i = \begin{cases}
1 & \text{if there is a price change at the ith trade} \\
0 & \text{if price remains the same at the ith trade.}
\end{cases} \tag{5.22}
$$

$D_i$ is also a discrete variable signifying the direction of the price change if a change occurs; that is,

$$
D_i \mid (A_i = 1) = \begin{cases}
1 & \text{if price increases at the ith trade} \\
-1 & \text{if price drops at the ith trade,}
\end{cases} \tag{5.23}
$$

where $D_i \mid (A_i = 1)$ means that $D_i$ is defined under the condition of $A_i = 1$, and $S_i$ is the size of the price change in ticks if there is a change at the ith trade and $S_i = 0$ if there is no price change at the ith trade. When there is a price change, $S_i$ is a positive integer-valued random variable.

Note that $D_i$ is not needed when $A_i = 0$, and there is a natural ordering in the decomposition. $D_i$ is well defined only when $A_i = 1$ and $S_i$ is meaningful when $A_i = 1$ and $D_i$ is given. Model specification under the decomposition makes use of the ordering.

Let $F_i$ be the information set available at the ith transaction. Examples of elements in $F_i$ are $\Delta t_{i-j}$, $A_{i-j}$, $D_{i-j}$, and $S_{i-j}$ for $j \ge 0$. The evolution of price change under model (5.21) can then be partitioned as

$$
\begin{aligned}
P(y_i \mid F_{i-1}) &= P(A_i D_i S_i \mid F_{i-1}) \\
&= P(S_i \mid D_i, A_i, F_{i-1})\, P(D_i \mid A_i, F_{i-1})\, P(A_i \mid F_{i-1}).
\end{aligned} \tag{5.24}
$$

Since $A_i$ is a binary variable, it suffices to consider the evolution of the probability $p_i = P(A_i = 1)$ over time. We assume that

$$\ln\!\left(\frac{p_i}{1 - p_i}\right) = x_i\beta \quad \text{or} \quad p_i = \frac{e^{x_i\beta}}{1 + e^{x_i\beta}}, \tag{5.25}$$

where $x_i$ is a finite-dimensional vector consisting of elements of $F_{i-1}$ and $\beta$ is a parameter vector. Conditioned on $A_i = 1$, $D_i$ is also a binary variable, and we use the following model for $\delta_i = P(D_i = 1 \mid A_i = 1)$,

$$\ln\!\left(\frac{\delta_i}{1 - \delta_i}\right) = z_i\gamma \quad \text{or} \quad \delta_i = \frac{e^{z_i\gamma}}{1 + e^{z_i\gamma}}, \tag{5.26}$$

where $z_i$ is a finite-dimensional vector consisting of elements of $F_{i-1}$ and $\gamma$ is a parameter vector. To allow for asymmetry between positive and negative price changes, we assume that

$$
S_i \mid (D_i, A_i = 1) \sim 1 + \begin{cases}
g(\lambda_{u,i}) & \text{if } D_i = 1,\ A_i = 1 \\
g(\lambda_{d,i}) & \text{if } D_i = -1,\ A_i = 1,
\end{cases} \tag{5.27}
$$

where $g(\lambda)$ is a geometric distribution with parameter $\lambda$ and the parameters $\lambda_{j,i}$ evolve over time as

$$\ln\!\left(\frac{\lambda_{j,i}}{1 - \lambda_{j,i}}\right) = w_i\theta_j \quad \text{or} \quad \lambda_{j,i} = \frac{e^{w_i\theta_j}}{1 + e^{w_i\theta_j}}, \qquad j = u, d, \tag{5.28}$$

where $w_i$ is again a finite-dimensional vector of explanatory variables in $F_{i-1}$ and $\theta_j$ is a parameter vector.

In Eq. (5.27), the probability mass function of a random variable $x$, which follows the geometric distribution $g(\lambda)$, is

$$p(x = m) = \lambda(1 - \lambda)^m, \qquad m = 0, 1, 2, \ldots.$$

We added 1 to the geometric distribution so that the price change, if it occurs, is at least 1 tick. In Eq. (5.28), we take the logistic transformation to ensure that $\lambda_{j,i} \in [0, 1]$.


The previous specification classifies the ith trade, or transaction, into one of three categories:

1. no price change: $A_i = 0$ and the associated probability is $(1 - p_i)$;

2. a price increase: $A_i = 1$, $D_i = 1$, and the associated probability is $p_i\delta_i$. The size of the price increase is governed by $1 + g(\lambda_{u,i})$.

3. a price drop: $A_i = 1$, $D_i = -1$, and the associated probability is $p_i(1 - \delta_i)$. The size of the price drop is governed by $1 + g(\lambda_{d,i})$.

Let $I_i(j)$ for $j = 1, 2, 3$ be the indicator variables of the prior three categories. That is, $I_i(j) = 1$ if the jth category occurs and $I_i(j) = 0$ otherwise. The log likelihood function of Eq. (5.24) becomes

$$
\begin{aligned}
\ln[P(y_i \mid F_{i-1})] = {} & I_i(1)\ln(1 - p_i) \\
& + I_i(2)\,[\ln(p_i) + \ln(\delta_i) + \ln(\lambda_{u,i}) + (S_i - 1)\ln(1 - \lambda_{u,i})] \\
& + I_i(3)\,[\ln(p_i) + \ln(1 - \delta_i) + \ln(\lambda_{d,i}) + (S_i - 1)\ln(1 - \lambda_{d,i})],
\end{aligned}
$$

and the overall log likelihood function is

$$\ln[P(y_1, \ldots, y_n \mid F_0)] = \sum_{i=1}^{n} \ln[P(y_i \mid F_{i-1})], \tag{5.29}$$

which is a function of parameters $\beta$, $\gamma$, $\theta_u$, and $\theta_d$.
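A minimal sketch of the per-trade log likelihood above, assuming the probabilities p_i, δ_i, λ_{u,i}, and λ_{d,i} have already been computed from the logistic links in Eqs. (5.25) to (5.28). The function names are chosen here for illustration; price changes are integers measured in ticks.

```python
import math

def ads_loglik_term(y, p, delta, lam_u, lam_d):
    """Log probability of a price change of y ticks for one trade.

    y     : observed price change in ticks (integer; 0 means no change)
    p     : P(A_i = 1), probability of a price change
    delta : P(D_i = 1 | A_i = 1), probability that a change is an increase
    lam_u : geometric parameter for the size of an increase
    lam_d : geometric parameter for the size of a decrease
    """
    if y == 0:                       # category 1: no price change
        return math.log(1.0 - p)
    size = abs(y)                    # S_i >= 1, distributed as 1 + geometric
    if y > 0:                        # category 2: price increase
        return (math.log(p) + math.log(delta)
                + math.log(lam_u) + (size - 1) * math.log(1.0 - lam_u))
    # category 3: price drop
    return (math.log(p) + math.log(1.0 - delta)
            + math.log(lam_d) + (size - 1) * math.log(1.0 - lam_d))

def ads_loglik(ys, ps, deltas, lam_us, lam_ds):
    """Overall log likelihood, Eq. (5.29): the sum of per-trade terms."""
    return sum(ads_loglik_term(*args)
               for args in zip(ys, ps, deltas, lam_us, lam_ds))
```

Maximum likelihood estimation would wrap `ads_loglik` in a function of the parameter vectors and hand its negative to a numerical optimizer; that step is omitted here.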

Example 5.2. We illustrate the decomposition model by analyzing the intraday transactions of IBM stock from November 1, 1990 to January 31, 1991. There were 63 trading days and 59,838 intraday transactions in the normal trading hours. The explanatory variables used are

1. $A_{i-1}$: The action indicator of the previous trade (i.e., the (i − 1)th trade within a trading day).

2. $D_{i-1}$: The direction indicator of the previous trade.

3. $S_{i-1}$: The size of the previous trade.

4. $V_{i-1}$: The volume of the previous trade, divided by 1000.

5. $\Delta t_{i-1}$: Time duration from the (i − 2)th to (i − 1)th trade.

6. $BA_i$: The bid-ask spread prevailing at the time of transaction.

Because we use lag-1 explanatory variables, the actual sample size is 59,775. It turns out that $V_{i-1}$, $\Delta t_{i-1}$, and $BA_i$ are not statistically significant for the model entertained. Thus, only the first three explanatory variables are used. The model employed is

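$$
\begin{aligned}
\ln\!\left(\frac{p_i}{1 - p_i}\right) &= \beta_0 + \beta_1 A_{i-1} \\
\ln\!\left(\frac{\delta_i}{1 - \delta_i}\right) &= \gamma_0 + \gamma_1 D_{i-1} \\
\ln\!\left(\frac{\lambda_{u,i}}{1 - \lambda_{u,i}}\right) &= \theta_{u,0} + \theta_{u,1} S_{i-1} \\
\ln\!\left(\frac{\lambda_{d,i}}{1 - \lambda_{d,i}}\right) &= \theta_{d,0} + \theta_{d,1} S_{i-1}.
\end{aligned} \tag{5.30}
$$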

The parameter estimates, using the log-likelihood function in Eq. (5.29), are given in Table 5.5.

Table 5.5. Parameter Estimates of the ADS Model in Eq. (5.30) for IBM Intraday Transactions: 11/01/90 to 1/31/91.

Parameter      β0       β1       γ0       γ1     θu,0     θu,1     θd,0     θd,1
Estimate   −1.057    0.962   −0.067   −2.307    2.235   −0.670    2.085   −0.509
Std.Err.    0.104    0.044    0.023    0.056    0.029    0.050    0.187    0.139

The estimated simple model shows some dynamic dependence in the price change. In particular, the trade-by-trade price changes of IBM stock exhibit some appealing features:

1. The probability of a price change depends on the previous price change. Specifically, we have

$$P(A_i = 1 \mid A_{i-1} = 0) = 0.258, \qquad P(A_i = 1 \mid A_{i-1} = 1) = 0.476.$$

The result indicates that a price change may occur in clusters and, as expected, most transactions are without price change. When no price change occurred at the (i − 1)th trade, only about one out of four subsequent trades has a price change. When there is a price change at the (i − 1)th transaction, the probability of a price change in the ith trade increases to about 0.5.

2. The direction of price change is governed by

$$
P(D_i = 1 \mid F_{i-1}, A_i) = \begin{cases}
0.483 & \text{if } D_{i-1} = 0 \ (\text{i.e., } A_{i-1} = 0) \\
0.085 & \text{if } D_{i-1} = 1,\ A_i = 1 \\
0.904 & \text{if } D_{i-1} = -1,\ A_i = 1.
\end{cases}
$$

This result says that (a) if no price change occurred at the (i − 1)th trade, then the chances for a price increase or decrease at the ith trade are about even; and (b) the probabilities of consecutive price increases or decreases are very low. The probability of a price increase at the ith trade given that a price change occurs at the ith trade and there was a price increase at the (i − 1)th trade is only about 8.5%. However, the probability of a price increase is about 90% given that a price change occurs at the ith trade and there was a price decrease at the (i − 1)th trade. Consequently, this result shows the effect of bid-ask bounce and supports price reversals in high-frequency trading.

3. There is weak evidence suggesting that big price changes have a higher probability to be followed by another big price change. Consider the size of a price increase. We have

$$S_i \mid (D_i = 1) \sim 1 + g(\lambda_{u,i}), \qquad \ln\!\left(\frac{\lambda_{u,i}}{1 - \lambda_{u,i}}\right) = 2.235 - 0.670\,S_{i-1}.$$

Using the probability mass function of a geometric distribution, we obtain that the probability of a price increase by one tick is 0.827 at the ith trade if the transaction results in a price increase and $S_{i-1} = 1$. The probability reduces to 0.709 if $S_{i-1} = 2$ and to 0.556 if $S_{i-1} = 3$. Consequently, the probability of a large $S_i$ increases with $S_{i-1}$ given that there is a price increase at the ith trade.
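The probabilities quoted in the three items above follow from the estimates in Table 5.5 through the logistic links of Eq. (5.30). The short check below recomputes them; the helper name `logistic` and the dictionary layout are choices made here, and small differences in the third decimal are to be expected because the published estimates are rounded.

```python
import math

def logistic(z):
    """Inverse of the logit link: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates from Table 5.5.
beta0, beta1 = -1.057, 0.962
gamma0, gamma1 = -0.067, -2.307
theta_u0, theta_u1 = 2.235, -0.670

# P(A_i = 1 | A_{i-1} = a) for a = 0, 1
p_change = {a: logistic(beta0 + beta1 * a) for a in (0, 1)}

# P(D_i = 1 | A_i = 1, D_{i-1} = d) for d = 0, 1, -1
p_up = {d: logistic(gamma0 + gamma1 * d) for d in (0, 1, -1)}

# lambda_{u,i} given S_{i-1} = s; with S_i ~ 1 + geometric(lambda),
# P(S_i = 1 | increase) equals lambda itself.
p_one_tick = {s: logistic(theta_u0 + theta_u1 * s) for s in (1, 2, 3)}

print({k: round(v, 3) for k, v in p_change.items()})
print({k: round(v, 3) for k, v in p_up.items()})
print({k: round(v, 3) for k, v in p_one_tick.items()})
```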

A difference between the ADS and ordered probit models is that the ADS model

does not require any truncation or grouping in the size of a price change.

5.5 DURATION MODELS

Duration models are concerned with time intervals between trades. Longer dura-

tions indicate lack of trading activities, which in turn signify a period of no new

information. The dynamic behavior of durations, thus, contains useful information

about intraday market activities. Using concepts similar to the ARCH models for

volatility, Engle and Russell (1998) propose an autoregressive conditional duration

(ACD) model to describe the evolution of time durations for (heavily traded) stocks.

Zhang, Russell, and Tsay (2001) extend the ACD model to account for nonlinearity

and structural breaks in the data. In this section, we introduce some simple duration

models. As mentioned before, intraday transactions exhibit some diurnal pattern.

Therefore, we focus on the adjusted time duration

$$\Delta t_i^* = \Delta t_i / f(t_i), \tag{5.31}$$

where $f(t_i)$ is a deterministic function consisting of the cyclical component of $\Delta t_i$. Obviously, $f(t_i)$ depends on the underlying asset and the systematic behavior of the market. In practice, there are many ways to estimate $f(t_i)$, but no single method dominates the others in terms of statistical properties. A common approach is to use smoothing splines. Here we use simple quadratic functions and indicator variables to take care of the deterministic component of daily trading activities.


DURATION MODELS

For the IBM data employed in the illustration of ADS models, we assume

$$f(t_i) = \exp[d(t_i)], \qquad d(t_i) = \beta_0 + \sum_{j=1}^{7} \beta_j f_j(t_i), \tag{5.32}$$

where

$$f_1(t_i) = -\left(\frac{t_i - 43200}{14400}\right)^2, \qquad f_2(t_i) = -\left(\frac{t_i - 48300}{9300}\right)^2,$$

$$
f_3(t_i) = \begin{cases}
-\left(\dfrac{t_i - 38700}{7500}\right)^2 & \text{if } t_i < 43200 \\
0 & \text{otherwise,}
\end{cases}
\qquad
f_4(t_i) = \begin{cases}
-\left(\dfrac{t_i - 48600}{9000}\right)^2 & \text{if } t_i \ge 43200 \\
0 & \text{otherwise,}
\end{cases}
$$

$f_5(t_i)$ and $f_6(t_i)$ are indicator variables for the first and second 5 minutes of market opening [i.e., $f_5(\cdot) = 1$ if and only if $t_i$ is between 9:30 am and 9:35 am Eastern Time], and $f_7(t_i)$ is the indicator for the last 30 minutes of daily trading [i.e., $f_7(t_i) = 1$ if and only if the trade occurred between 3:30 pm and 4:00 pm Eastern Time]. Figure 5.5 shows the plot of $f_i(\cdot)$ for $i = 1, \ldots, 4$, where the time scale of the x-axis is in minutes. Note that $f_3(43{,}200) = f_4(43{,}200)$, where 43,200 corresponds to 12:00 noon.

Figure 5.5. Quadratic functions used to remove the deterministic component of IBM intraday trading durations: (a)–(d) are the functions $f_1(\cdot)$ to $f_4(\cdot)$ of Eq. (5.32), respectively. [Horizontal axes: minutes.]

The coefficients $\beta_j$ of Eq. (5.32) are obtained by the least squares method of the linear regression

$$\ln(\Delta t_i) = \beta_0 + \sum_{j=1}^{7} \beta_j f_j(t_i) + \epsilon_i.$$

The fitted model is

$$
\begin{aligned}
\ln(\Delta t_i) = {} & 2.555 + 0.159 f_1(t_i) + 0.270 f_2(t_i) + 0.384 f_3(t_i) \\
& + 0.061 f_4(t_i) - 0.611 f_5(t_i) - 0.157 f_6(t_i) + 0.073 f_7(t_i).
\end{aligned}
$$
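The fitted deterministic component can be evaluated as in the sketch below, which codes the seven functions of Eq. (5.32) and the fitted coefficients, then forms the adjusted duration of Eq. (5.31). Function names are chosen here, and the treatment of interval endpoints for the indicator variables (half-open at the opening, closed at 4:00 pm) is an assumption, since the text does not specify it.

```python
import math

# Fitted coefficients beta_0, ..., beta_7 of the regression above.
BETA = [2.555, 0.159, 0.270, 0.384, 0.061, -0.611, -0.157, 0.073]

def quad(t, center, scale):
    """Negative squared, rescaled distance from a center time."""
    return -((t - center) / scale) ** 2

def f_components(t):
    """f_1(t), ..., f_7(t); t is in seconds after midnight."""
    f1 = quad(t, 43200, 14400)
    f2 = quad(t, 48300, 9300)
    f3 = quad(t, 38700, 7500) if t < 43200 else 0.0
    f4 = quad(t, 48600, 9000) if t >= 43200 else 0.0
    f5 = 1.0 if 34200 <= t < 34500 else 0.0    # 9:30-9:35 am (endpoints assumed)
    f6 = 1.0 if 34500 <= t < 34800 else 0.0    # 9:35-9:40 am (endpoints assumed)
    f7 = 1.0 if 55800 <= t <= 57600 else 0.0   # 3:30-4:00 pm (endpoints assumed)
    return [f1, f2, f3, f4, f5, f6, f7]

def diurnal_factor(t):
    """f(t) = exp[d(t)] with d(t) = beta_0 + sum_j beta_j f_j(t)."""
    d = BETA[0] + sum(b * f for b, f in zip(BETA[1:], f_components(t)))
    return math.exp(d)

def adjusted_duration(duration, t):
    """Adjusted duration: raw duration divided by the diurnal factor."""
    return duration / diurnal_factor(t)

print(round(diurnal_factor(34200), 2))   # at the open
print(round(diurnal_factor(43200), 2))   # at noon
```

The factor is smaller at the open than at noon, matching the pattern of shorter durations when trading is heavier.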

Figure 5.6 shows the time plot of average durations in 5-minute time intervals over the 63 trading days before and after adjusting for the deterministic component. Part (a) is the average durations of $\Delta t_i$ and, as expected, it exhibits a diurnal pattern. Part (b) is the average durations of $\Delta t_i^*$ (i.e., after the adjustment), and the diurnal pattern is largely removed.

Figure 5.6. IBM transactions data from 11/01/90 to 1/31/91: (a) The average durations in 5-minute time intervals, and (b) the average durations in 5-minute time intervals after adjusting for the deterministic component.

5.5.1 The ACD Model

The autoregressive conditional duration (ACD) model uses the idea of GARCH models to study the dynamic structure of the adjusted duration $\Delta t_i^*$ of Eq. (5.31). For ease in notation, we define $x_i = \Delta t_i^*$.

Let $\psi_i = E(x_i \mid F_{i-1})$ be the conditional expectation of the adjusted duration between the (i − 1)th and ith trades, where $F_{i-1}$ is the information set available at the (i − 1)th trade. In other words, $\psi_i$ is the expected adjusted duration given $F_{i-1}$. The basic ACD model is defined as

$$x_i = \psi_i \epsilon_i, \tag{5.33}$$

where $\{\epsilon_i\}$ is a sequence of independent and identically distributed non-negative random variables such that $E(\epsilon_i) = 1$. In Engle and Russell (1998), $\epsilon_i$ follows a standard exponential or a standardized Weibull distribution, and $\psi_i$ assumes the form

$$\psi_i = \omega + \sum_{j=1}^{r} \gamma_j x_{i-j} + \sum_{j=1}^{s} \omega_j \psi_{i-j}. \tag{5.34}$$

Such a model is referred to as an ACD(r, s) model. When the distribution of $\epsilon_i$ is exponential, the resulting model is called an EACD(r, s) model. Similarly, if $\epsilon_i$ follows a Weibull distribution, the model is a WACD(r, s) model. If necessary, readers are referred to Appendix A for a quick review of exponential and Weibull distributions.

Similar to GARCH models, the process $\eta_i = x_i - \psi_i$ is a martingale difference sequence [i.e., $E(\eta_i \mid F_{i-1}) = 0$], and the ACD(r, s) model can be written as

$$x_i = \omega + \sum_{j=1}^{\max(r,s)} (\gamma_j + \omega_j)\, x_{i-j} - \sum_{j=1}^{s} \omega_j \eta_{i-j} + \eta_i, \tag{5.35}$$

which is in the form of an ARMA process with non-Gaussian innovations. It is understood here that $\gamma_j = 0$ for $j > r$ and $\omega_j = 0$ for $j > s$. Such a representation can be used to obtain the basic conditions for weak stationarity of the ACD model. For instance, taking expectation on both sides of Eq. (5.35) and assuming weak stationarity, we have

$$E(x_i) = \frac{\omega}{1 - \sum_{j=1}^{\max(r,s)} (\gamma_j + \omega_j)}.$$

Therefore, we assume $\omega > 0$ and $1 > \sum_j (\gamma_j + \omega_j)$ because the expected duration is positive. As another application of Eq. (5.35), we study properties of the EACD(1, 1) model.

EACD(1, 1) Model

An EACD(1, 1) model can be written as

$$x_i = \psi_i \epsilon_i, \qquad \psi_i = \omega + \gamma_1 x_{i-1} + \omega_1 \psi_{i-1}, \tag{5.36}$$

where $\epsilon_i$ follows the standard exponential distribution. Using the moments of a standard exponential distribution in Appendix A, we have $E(\epsilon_i) = 1$, $\mathrm{Var}(\epsilon_i) = 1$, and $E(\epsilon_i^2) = \mathrm{Var}(\epsilon_i) + [E(\epsilon_i)]^2 = 2$. Assuming that $x_i$ is weakly stationary (i.e., the first two moments of $x_i$ are time-invariant), we derive the variance of $x_i$. First, taking expectation of Eq. (5.36), we have

$$E(x_i) = E[E(\psi_i \epsilon_i \mid F_{i-1})] = E(\psi_i), \qquad E(\psi_i) = \omega + \gamma_1 E(x_{i-1}) + \omega_1 E(\psi_{i-1}). \tag{5.37}$$

Under weak stationarity, $E(\psi_i) = E(\psi_{i-1})$ so that Eq. (5.37) gives

$$\mu_x \equiv E(x_i) = E(\psi_i) = \frac{\omega}{1 - \gamma_1 - \omega_1}. \tag{5.38}$$

Next, because $E(\epsilon_i^2) = 2$, we have $E(x_i^2) = E[E(\psi_i^2 \epsilon_i^2 \mid F_{i-1})] = 2E(\psi_i^2)$. Taking square of $\psi_i$ in Eq. (5.36) and expectation and using weak stationarity of $\psi_i$ and $x_i$, we have, after some algebra, that

$$E(\psi_i^2) = \mu_x^2 \times \frac{1 - (\gamma_1 + \omega_1)^2}{1 - 2\gamma_1^2 - \omega_1^2 - 2\gamma_1\omega_1}. \tag{5.39}$$
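The algebra behind Eq. (5.39) can be sketched as follows. The shorthand $a = \gamma_1 + \omega_1$ is introduced here for brevity and is not part of the original notation. The steps use $E(x_{i-1}^2) = 2E(\psi_{i-1}^2)$, $E(x_{i-1}\psi_{i-1}) = E[\psi_{i-1}^2 E(\epsilon_{i-1} \mid F_{i-2})] = E(\psi_{i-1}^2)$, $E(x_{i-1}) = E(\psi_{i-1}) = \mu_x$, and, from Eq. (5.38), $\omega = \mu_x(1 - a)$.

```latex
\begin{align*}
E(\psi_i^2)
  &= \omega^2 + \gamma_1^2 E(x_{i-1}^2) + \omega_1^2 E(\psi_{i-1}^2)
     + 2\omega\gamma_1 E(x_{i-1}) + 2\omega\omega_1 E(\psi_{i-1})
     + 2\gamma_1\omega_1 E(x_{i-1}\psi_{i-1}) \\
  &= \omega^2 + 2\omega a\,\mu_x
     + \left(2\gamma_1^2 + \omega_1^2 + 2\gamma_1\omega_1\right) E(\psi_i^2)
     \qquad \text{(weak stationarity)} .
\end{align*}
Hence
\begin{align*}
\left(1 - 2\gamma_1^2 - \omega_1^2 - 2\gamma_1\omega_1\right) E(\psi_i^2)
  &= \omega^2 + 2\omega a\,\mu_x \\
  &= \mu_x^2 (1 - a)^2 + 2\mu_x^2 (1 - a)a \\
  &= \mu_x^2 (1 - a)(1 + a) = \mu_x^2\left[1 - (\gamma_1 + \omega_1)^2\right],
\end{align*}
which gives Eq.~(5.39) after dividing by the factor on the left.
```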

Finally, using $\mathrm{Var}(x_i) = E(x_i^2) - [E(x_i)]^2$ and $E(x_i^2) = 2E(\psi_i^2)$, we have

$$\mathrm{Var}(x_i) = 2E(\psi_i^2) - \mu_x^2 = \mu_x^2 \times \frac{1 - \omega_1^2 - 2\gamma_1\omega_1}{1 - \omega_1^2 - 2\gamma_1\omega_1 - 2\gamma_1^2},$$

where $\mu_x$ is defined in Eq. (5.38). This result shows that, to have time-invariant unconditional variance, the EACD(1, 1) model in Eq. (5.36) must satisfy $1 > 2\gamma_1^2 + \omega_1^2 + 2\gamma_1\omega_1$. The variance of a WACD(1, 1) model can be obtained by using the same techniques and the first two moments of a standardized Weibull distribution.
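The mean and variance formulas for the EACD(1, 1) model can be wrapped in a small helper that also checks the two stationarity conditions. The function name and the parameter values in the usage line are illustrative choices made here.

```python
def eacd11_moments(omega, gamma1, omega1):
    """Unconditional mean and variance of an EACD(1,1) process.

    Raises ValueError when the conditions for a positive mean or a
    finite, time-invariant variance are violated.
    """
    persistence = gamma1 + omega1
    if omega <= 0 or persistence >= 1:
        raise ValueError("need omega > 0 and gamma1 + omega1 < 1")
    mean = omega / (1.0 - persistence)                       # Eq. (5.38)
    denom = 1.0 - omega1 ** 2 - 2.0 * gamma1 * omega1 - 2.0 * gamma1 ** 2
    if denom <= 0:
        raise ValueError("need 2*gamma1^2 + omega1^2 + 2*gamma1*omega1 < 1")
    var = mean ** 2 * (1.0 - omega1 ** 2 - 2.0 * gamma1 * omega1) / denom
    return mean, var

# Illustrative parameter values (not estimates).
mean, var = eacd11_moments(omega=0.3, gamma1=0.2, omega1=0.7)
print(round(mean, 4), round(var, 4))
```

For these values the variance exceeds the squared mean, so the process is overdispersed relative to an i.i.d. exponential sequence, whose variance equals its squared mean.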

ACD Models with a Generalized Gamma Distribution

In the statistical literature, intensity function is often expressed in terms of hazard function. As shown in Appendix B, the hazard function of an EACD model is constant over time and that of a WACD model is a monotone function. These hazard functions are rather restrictive in application as the intensity function of stock transactions might not be constant or monotone over time. To increase the flexibility of the associated hazard function, Zhang, Russell, and Tsay (2001) employ a (standardized) generalized Gamma distribution for $\epsilon_i$. See Appendix A for some basic properties of a generalized Gamma distribution. The resulting hazard function may assume various patterns, including U shape or inverted U shape. We refer to an ACD model with innovations that follow a generalized Gamma distribution as a GACD(r, s) model.

5.5.2 Simulation

To illustrate ACD processes, we generated 500 observations from the ACD(1, 1) model

$$x_i = \psi_i \epsilon_i, \qquad \psi_i = 0.3 + 0.2 x_{i-1} + 0.7 \psi_{i-1} \tag{5.40}$$

using two different innovational distributions for $\epsilon_i$. In case 1, $\epsilon_i$ is assumed to follow a standardized Weibull distribution with parameter $\alpha = 1.5$. In case 2, $\epsilon_i$ follows a (standardized) generalized Gamma distribution with parameters $\kappa = 1.5$ and $\alpha = 0.5$.

Figure 5.7(a) shows the time plot of the WACD(1, 1) series, whereas Figure 5.8(a)

is the GACD(1, 1) series. Figure 5.9 plots the histograms of both simulated series.

Figure 5.7. A simulated WACD(1, 1) series in Eq. (5.40): (a) the original series (vertical axis: dur), and (b) the standardized series after estimation (vertical axis: std-dur). There are 500 observations.

Figure 5.8. A simulated GACD(1, 1) series in Eq. (5.40): (a) the original series (vertical axis: dur), and (b) the standardized series after estimation (vertical axis: std-dur). There are 500 observations.

Figure 5.9. Histograms of simulated duration processes with 500 observations: (a) WACD(1, 1) model (horizontal axis: z), and (b) GACD(1, 1) model (horizontal axis: x).

Figure 5.10. The sample autocorrelation function of a simulated WACD(1, 1) series with 500 observations: (a) the original series, and (b) the standardized residual series. Both panels plot ACF against lag.

The histograms in Figure 5.9 make the difference between the two models evident. Finally, the sample ACFs of the two simulated series are shown in Figure 5.10(a) and Figure 5.11(a), respectively. The serial dependence of the data is clearly seen.
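A simulation of this kind can be sketched in a few lines of Python using only the standard library. This is not the program used for the figures. The function names, the seed, the burn-in length, and the choice to start the recursion at the unconditional mean are all assumptions made here. The standardized generalized Gamma draw uses the fact that if $G$ is Gamma($\kappa$, 1), then $\lambda G^{1/\alpha}$ with $\lambda = \Gamma(\kappa)/\Gamma(\kappa + 1/\alpha)$ has mean 1.

```python
import math
import random


def standardized_weibull(rng, alpha):
    """Weibull draw with shape alpha, rescaled so that its mean is 1."""
    scale = 1.0 / math.gamma(1.0 + 1.0 / alpha)
    # random.weibullvariate takes (scale, shape) in that order.
    return rng.weibullvariate(scale, alpha)


def standardized_gen_gamma(rng, kappa, alpha):
    """Generalized Gamma draw (parameters kappa, alpha) rescaled to mean 1."""
    lam = math.gamma(kappa) / math.gamma(kappa + 1.0 / alpha)
    return lam * rng.gammavariate(kappa, 1.0) ** (1.0 / alpha)


def simulate_acd11(n, omega, gamma1, omega1, draw_eps, burn_in=200):
    """Simulate n durations from x_i = psi_i * eps_i with an ACD(1,1) psi_i.

    Assumption: the recursion starts at the unconditional mean and the
    first `burn_in` draws are discarded to reduce the start-up effect.
    """
    psi = omega / (1.0 - gamma1 - omega1)
    x = psi
    durations = []
    for i in range(n + burn_in):
        psi = omega + gamma1 * x + omega1 * psi  # uses x_{i-1} and psi_{i-1}
        x = psi * draw_eps()
        if i >= burn_in:
            durations.append(x)
    return durations


rng = random.Random(12345)  # arbitrary seed, for reproducibility only
wacd = simulate_acd11(500, 0.3, 0.2, 0.7,
                      lambda: standardized_weibull(rng, 1.5))
gacd = simulate_acd11(500, 0.3, 0.2, 0.7,
                      lambda: standardized_gen_gamma(rng, 1.5, 0.5))
print(len(wacd), len(gacd))  # 500 500
```

Passing the innovation sampler as a callable keeps the recursion itself identical for both cases, which matches the way Eq. (5.40) is shared by the two simulated series.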

5.5.3 Estimation

Figure 5.11. The sample autocorrelation function of a simulated GACD(1, 1) series with 500 observations: (a) the original series, and (b) the standardized residual series. Both panels plot ACF against lag.

For an ACD(r, s) model, let $i_o = \max(r, s)$ and $\mathbf{x}_t = (x_1, \ldots, x_t)'$. The likelihood function of the durations $x_1, \ldots, x_T$ is

$$f(\mathbf{x}_T \mid \theta) = \left[\prod_{i=i_o+1}^{T} f(x_i \mid F_{i-1}, \theta)\right] \times f(\mathbf{x}_{i_o} \mid \theta),$$

where $\theta$ denotes the vector of model parameters, and $T$ is the sample size. The marginal probability density function $f(\mathbf{x}_{i_o} \mid \theta)$ of the previous equation is rather complicated for a general ACD model. Because its impact on the likelihood function diminishes as the sample size $T$ increases, this marginal density is often ignored, resulting in the use of the conditional likelihood method. For a WACD model, we use the probability density function (pdf) of Eq. (5.55) and obtain the conditional log likelihood function

$$\ell(\mathbf{x} \mid \theta, \mathbf{x}_{i_o}) = \sum_{i=i_o+1}^{T} \left\{ \alpha \ln\left[\Gamma\left(1 + \frac{1}{\alpha}\right)\right] + \ln\left(\frac{\alpha}{x_i}\right) + \alpha \ln\left(\frac{x_i}{\psi_i}\right) - \left[\frac{\Gamma\left(1 + \frac{1}{\alpha}\right) x_i}{\psi_i}\right]^{\alpha} \right\}, \tag{5.41}$$

where $\psi_i = \omega + \sum_{j=1}^{r} \gamma_j x_{i-j} + \sum_{j=1}^{s} \omega_j \psi_{i-j}$, $\theta = (\omega, \gamma_1, \ldots, \gamma_r, \omega_1, \ldots, \omega_s, \alpha)'$, and $\mathbf{x} = (x_{i_o+1}, \ldots, x_T)'$. When $\alpha = 1$, the (conditional) log likelihood function reduces to that of an EACD(r, s) model.
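The conditional log likelihood of Eq. (5.41) can be evaluated with a short Python function. This is a sketch under one explicit assumption that the text does not specify: the presample values of $\psi_i$ (for $i \le i_o$) are set to the sample mean of the durations. The function name `wacd_loglik` is hypothetical.

```python
import math


def wacd_loglik(x, omega, gammas, omegas, alpha):
    """Conditional log likelihood of a WACD(r, s) model, Eq. (5.41).

    x      : list of positive durations x_1, ..., x_T
    gammas : [gamma_1, ..., gamma_r], coefficients of lagged durations
    omegas : [omega_1, ..., omega_s], coefficients of lagged psi
    Assumption: presample psi values equal the sample mean of x.
    """
    i0 = max(len(gammas), len(omegas))
    psi = [sum(x) / len(x)] * len(x)
    c = math.gamma(1.0 + 1.0 / alpha)
    loglik = 0.0
    for i in range(i0, len(x)):  # 0-based index i is observation i + 1
        psi[i] = (omega
                  + sum(g * x[i - j] for j, g in enumerate(gammas, start=1))
                  + sum(w * psi[i - j] for j, w in enumerate(omegas, start=1)))
        loglik += (alpha * math.log(c)
                   + math.log(alpha / x[i])
                   + alpha * math.log(x[i] / psi[i])
                   - (c * x[i] / psi[i]) ** alpha)
    return loglik


durations = [1.2, 0.5, 2.0, 0.8, 1.5]  # made-up data for illustration
print(wacd_loglik(durations, 0.3, [0.2], [0.7], alpha=1.5))
```

Setting `alpha=1.0` makes each summand equal to $-\ln \psi_i - x_i/\psi_i$, the EACD case noted in the text. Maximizing the function over the parameters would require a numerical optimizer, which is outside this sketch.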

For a GACD(r, s) model, the conditional log likelihood function is

$$\ell(\mathbf{x} \mid \theta, \mathbf{x}_{i_o}) = \sum_{i=i_o+1}^{T} \left[ \ln\left(\frac{\alpha}{\Gamma(\kappa)}\right) + (\kappa\alpha - 1) \ln(x_i) - \kappa\alpha \ln(\lambda\psi_i) - \left(\frac{x_i}{\lambda\psi_i}\right)^{\alpha} \right], \tag{5.42}$$

where $\lambda = \Gamma(\kappa)/\Gamma(\kappa + \frac{1}{\alpha})$ and the parameter vector $\theta$ now also includes $\kappa$. As expected, when $\kappa = 1$, $\lambda = 1/\Gamma(1 + \frac{1}{\alpha})$ and the log likelihood function in Eq. (5.42) reduces to that of a WACD(r, s) model in Eq. (5.41). This log likelihood function can be rewritten in many ways to simplify the estimation.
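Eq. (5.42) can be coded in the same way. The sketch below again assumes, as an illustrative convention not given in the text, that presample $\psi_i$ values equal the sample mean. It works with `math.lgamma` so that large $\kappa$ does not overflow. The name `gacd_loglik` is hypothetical.

```python
import math


def gacd_loglik(x, omega, gammas, omegas, alpha, kappa):
    """Conditional log likelihood of a GACD(r, s) model, Eq. (5.42).

    Assumption: presample psi values equal the sample mean of x.
    """
    i0 = max(len(gammas), len(omegas))
    psi = [sum(x) / len(x)] * len(x)
    log_gamma_kappa = math.lgamma(kappa)
    # lambda = Gamma(kappa) / Gamma(kappa + 1/alpha), computed on the log scale
    lam = math.exp(log_gamma_kappa - math.lgamma(kappa + 1.0 / alpha))
    loglik = 0.0
    for i in range(i0, len(x)):
        psi[i] = (omega
                  + sum(g * x[i - j] for j, g in enumerate(gammas, start=1))
                  + sum(w * psi[i - j] for j, w in enumerate(omegas, start=1)))
        scaled = lam * psi[i]
        loglik += (math.log(alpha) - log_gamma_kappa
                   + (kappa * alpha - 1.0) * math.log(x[i])
                   - kappa * alpha * math.log(scaled)
                   - (x[i] / scaled) ** alpha)
    return loglik


durations = [1.2, 0.5, 2.0, 0.8, 1.5]  # made-up data for illustration
print(gacd_loglik(durations, 0.3, [0.2], [0.7], alpha=0.5, kappa=1.5))
```

With `kappa=1.0` the value agrees with the Weibull log likelihood of Eq. (5.41) for the same data and parameters, which gives a convenient check on any implementation.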

Under some regularity conditions, the conditional maximum likelihood estimates are asymptotically normal; see Engle and Russell (1998) and the references therein. In practice, simulation can be used to obtain finite-sample reference distributions for the problem of interest once a duration model is specified.

Example 5.3. (Simulated ACD(1,1) series continued) Consider the simulated WACD(1, 1) and GACD(1, 1) series of Eq. (5.40). We apply the conditional likelihood method and obtain the results in Table 5.6. The estimates appear to be reasonable. Let $\hat{\psi}_i$ be the 1-step ahead prediction of $\psi_i$ and $\hat{\epsilon}_i = x_i/\hat{\psi}_i$ be the standardized series, which can be regarded as standardized residuals of the series. If the model is adequately specified, $\{\hat{\epsilon}_i\}$ should behave as a sequence of independent and identically distributed random variables. Figure 5.7(b) and Figure 5.8(b) show the time plot of $\hat{\epsilon}_i$ for both models. The sample ACFs of $\hat{\epsilon}_i$ for both fitted models are shown in Figure 5.10(b) and Figure 5.11(b), respectively. It is evident that no significant serial correlations are found in the $\hat{\epsilon}_i$ series.

Example 5.4. As an illustration of duration models, we consider the transaction durations of IBM stock on five consecutive trading days from November 1 to November 7, 1990. Focusing on positive transaction durations, we have 3534 observations. In addition, the data have been adjusted by removing the deterministic component in Eq. (5.32). That is, we employ 3534 positive adjusted durations as defined in Eq. (5.31).

Figure 5.12(a) shows the time plot of the adjusted (positive) durations for the first five trading days of November 1990, and Figure 5.13(a) gives the sample ACF of the series. There exist some serial correlations in the adjusted durations. We fit a WACD(1, 1) model to the data and obtain the model

$$x_i = \psi_i \epsilon_i, \qquad \psi_i = 0.169 + 0.064 x_{i-1} + 0.885 \psi_{i-1}, \tag{5.43}$$

Table 5.6. Estimation Results for Simulated ACD(1,1) Series with 500 Observations: (a) for WACD(1,1) Series and (b) for GACD(1,1) Series.

(a) WACD(1,1) model

| Parameter | $\omega$ | $\gamma_1$ | $\omega_1$ | $\alpha$ |
|---|---|---|---|---|
| True | 0.3 | 0.2 | 0.7 | 1.5 |
| Estimate | 0.364 | 0.100 | 0.767 | 1.477 |
| Std Error | (0.139) | (0.025) | (0.060) | (0.052) |

(b) GACD(1,1) model

| Parameter | $\omega$ | $\gamma_1$ | $\omega_1$ | $\alpha$ | $\kappa$ |
|---|---|---|---|---|---|
| True | 0.3 | 0.2 | 0.7 | 0.5 | 1.5 |
| Estimate | 0.401 | 0.343 | 0.561 | 0.436 | 2.077 |
| Std Error | (0.117) | (0.074) | (0.065) | (0.078) | (0.653) |

Figure 5.12. Time plots of durations for IBM stock traded in the first five trading days of November 1990: (a) the adjusted series (vertical axis: adj-dur), and (b) the normalized innovations of a WACD(1, 1) model (vertical axis: norm-dur). There are 3534 nonzero durations.

where $\{\epsilon_i\}$ is a sequence of independent and identically distributed random variates that follow the standardized Weibull distribution with parameter $\hat{\alpha} = 0.879(0.012)$, where 0.012 is the estimated standard error. Standard errors of the estimates in Eq. (5.43) are 0.039, 0.010, and 0.018, respectively. All t ratios of the estimates are greater than 4.2, indicating that the estimates are significant at the 1% level. Figure 5.12(b) shows the time plot of $\hat{\epsilon}_i = x_i/\hat{\psi}_i$, and Figure 5.13(b) provides the sample ACF of $\hat{\epsilon}_i$. The Ljung–Box statistics show Q(10) = 4.96 and Q(20) = 10.75 for the $\hat{\epsilon}_i$ series. Clearly, the standardized innovations have no significant serial correlations. In fact, the sample autocorrelations of the squared series $\{\hat{\epsilon}_i^2\}$ are also small with Q(10) = 6.20 and Q(20) = 11.16, further confirming lack of serial dependence in the normalized innovations. In addition, the mean and standard deviation of a standardized Weibull distribution with $\alpha = 0.879$ are 1.00 and 1.14, respectively. These numbers are close to the sample mean and standard deviation of $\{\hat{\epsilon}_i\}$, which are 1.01 and 1.22, respectively. The fitted model seems adequate.
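The two diagnostics used here are easy to reproduce. The sketch below computes the Ljung–Box statistic $Q(m) = T(T+2)\sum_{k=1}^{m} \hat{\rho}_k^2/(T-k)$ for a series of standardized innovations, and the standard deviation of a standardized Weibull distribution from $\Gamma(1 + 2/\alpha)/\Gamma(1 + 1/\alpha)^2 - 1$. The function names are hypothetical, and the IBM innovations themselves are not reproduced, so the Q values of the example cannot be regenerated from this block alone.

```python
import math


def sample_acf(z, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag of the series z."""
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def ljung_box(z, m):
    """Ljung-Box Q(m) statistic for the series z."""
    n = len(z)
    rho = sample_acf(z, m)
    return n * (n + 2) * sum(r * r / (n - k)
                             for k, r in enumerate(rho, start=1))


def standardized_weibull_sd(alpha):
    """Standard deviation of a Weibull(alpha) variate rescaled to mean 1."""
    g1 = math.gamma(1.0 + 1.0 / alpha)
    g2 = math.gamma(1.0 + 2.0 / alpha)
    return math.sqrt(g2 / g1**2 - 1.0)


print(round(standardized_weibull_sd(0.879), 2))  # 1.14
```

For squared innovations, `ljung_box([e * e for e in eps_hat], m)` applies the same statistic to the squared series, where `eps_hat` stands for whatever list of standardized innovations is available.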

Figure 5.13. The sample autocorrelation function of adjusted durations for IBM stock traded in the first five trading days of November 1990: (a) the adjusted series, and (b) the normalized innovations for a WACD(1, 1) model. Both panels plot ACF against lag.

In model (5.43), the estimated coefficients show $\hat{\gamma}_1 + \hat{\omega}_1 \approx 0.949$, indicating certain persistence in the adjusted durations. The expected adjusted duration is $0.169/(1 - 0.064 - 0.885) = 3.31$ seconds, which is close to the sample mean 3.29 of the adjusted durations. The estimated $\alpha$ of the standardized Weibull distribution is 0.879, which is less than but close to 1. Thus, the conditional hazard function is monotonically decreasing at a slow rate.
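Both statements can be checked numerically. The sketch below evaluates the implied mean duration of model (5.43) and the hazard function of a standardized Weibull distribution, $h(y) = \alpha c^{\alpha} y^{\alpha - 1}$ with $c = \Gamma(1 + 1/\alpha)$. This closed form is the standard Weibull hazard written for the mean-1 parameterization; the function names are hypothetical.

```python
import math


def acd11_mean(omega, gamma1, omega1):
    """Unconditional mean duration of an ACD(1,1) model, as in Eq. (5.38)."""
    return omega / (1.0 - gamma1 - omega1)


def standardized_weibull_hazard(y, alpha):
    """Hazard of a Weibull(alpha) variate rescaled to have mean 1."""
    c = math.gamma(1.0 + 1.0 / alpha)
    return alpha * c**alpha * y ** (alpha - 1.0)


print(round(acd11_mean(0.169, 0.064, 0.885), 2))  # 3.31

# With alpha = 0.879 < 1 the hazard falls as the elapsed duration y grows.
for y in (0.5, 1.0, 2.0, 4.0):
    print(y, standardized_weibull_hazard(y, 0.879))
```

Because the exponent $\alpha - 1 = -0.121$ is close to zero, the printed hazard values decline only slowly, which is the sense in which the text calls the decrease slow.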

If a generalized Gamma distribution function is used for the innovations, then the fitted GACD(1, 1) model is

$$x_i = \psi_i \epsilon_i, \qquad \psi_i = 0.141 + 0.063 x_{i-1} + 0.897 \psi_{i-1}, \tag{5.44}$$

where $\{\epsilon_i\}$ follows a standardized, generalized Gamma distribution in Eq. (5.56) with parameters $\kappa = 4.248(1.046)$ and $\alpha = 0.395(0.053)$, where the number in parentheses denotes estimated standard error. Standard errors of the three parameters in Eq. (5.44) are 0.041, 0.010, and 0.019, respectively. All of the estimates are statistically significant at the 1% level. Again, the normalized innovational process $\{\hat{\epsilon}_i\}$ and its squared series have no significant serial correlation, where $\hat{\epsilon}_i = x_i/\hat{\psi}_i$ based on model (5.44). Specifically, for the $\hat{\epsilon}_i$ process, we have Q(10) = 4.95 and Q(20) = 10.28. For the $\hat{\epsilon}_i^2$ series, we have Q(10) = 6.36 and Q(20) = 10.89.

The expected duration of model (5.44) is 3.52, which is slightly greater than that of the WACD(1, 1) model in Eq. (5.43). Similarly, the persistence parameter $\hat{\gamma}_1 + \hat{\omega}_1$ of model (5.44) is also slightly higher at 0.96.


Remark: Estimation of EACD models can be carried out by using programs for ARCH models with some minor modification; see Engle and Russell (1998). In this book, we use either the RATS program or some Fortran programs developed by the author to estimate the duration models. Limited experience indicates that it is harder to estimate a GACD model than an EACD or a WACD model. RATS programs used to estimate WACD and GACD models are given in Appendix C.

5.6 NONLINEAR DURATION MODELS

Nonlinear features are also commonly found in high-frequency data. As an illustration, we apply some nonlinearity tests discussed in Chapter 4 to the normalized innovations $\hat{\epsilon}_i$ of the WACD(1, 1) model for the IBM transaction durations in Example 5.4; see Eq. (5.43). Based on an AR(4) model, the test results are given in part (a) of Table 5.7. As expected from the model diagnostics of Example 5.4, the Ori-F test indicates no quadratic nonlinearity in the normalized innovations. However, the TAR-F test statistics suggest strong nonlinearity.

Based on the test results in Table 5.7, we entertain a threshold duration model with two regimes for the IBM intraday durations. The threshold variable is $x_{i-1}$ (i.e., lag-1 adjusted duration). The estimated threshold value is 3.79. The fitted threshold WACD(1, 1) model is $x_i = \psi_i \epsilon_i$, where

$$\psi_i = \begin{cases} 0.020 + 0.257 x_{i-1} + 0.847 \psi_{i-1}, \quad \epsilon_i \sim w(0.901) & \text{if } x_{i-1} \le 3.79, \\ 1.808 + 0.027 x_{i-1} + 0.501 \psi_{i-1}, \quad \epsilon_i \sim w(0.845) & \text{if } x_{i-1} > 3.79, \end{cases} \tag{5.45}$$

Table 5.7. Nonlinearity Tests for IBM Transaction Durations from November 1 to November 7, 1990. Only Intraday Durations Are Used. The Number in the Parentheses of Tar-F Tests Denotes Time Delay.

(a) Normalized innovations of a WACD(1,1) model

| Type | Ori-F | Tar-F(1) | Tar-F(2) | Tar-F(3) | Tar-F(4) |
|---|---|---|---|---|---|
| Test | 0.343 | 3.288 | 3.142 | 3.128 | 0.297 |
| p value | 0.969 | 0.006 | 0.008 | 0.008 | 0.915 |

(b) Normalized innovations of a threshold WACD(1,1) model

| Type | Ori-F | Tar-F(1) | Tar-F(2) | Tar-F(3) | Tar-F(4) |
|---|---|---|---|---|---|
| Test | 0.163 | 0.746 | 1.899 | 1.752 | 0.270 |
| p value | 0.998 | 0.589 | 0.091 | 0.119 | 0.929 |


where $w(\alpha)$ denotes a standardized Weibull distribution with parameter $\alpha$. The numbers of observations in the two regimes are 2503 and 1030, respectively. In Eq. (5.45), the standard errors of the parameters for the first regime are 0.043, 0.041, 0.024, and 0.014, whereas those for the second regime are 0.526, 0.020, 0.147, and 0.020, respectively.

Consider the normalized innovations $\hat{\epsilon}_i = x_i/\hat{\psi}_i$ of the threshold WACD(1, 1) model in Eq. (5.45). We obtain Q(12) = 9.8 and Q(24) = 23.9 for $\hat{\epsilon}_i$ and Q(12) = 8.0 and Q(24) = 16.7 for $\hat{\epsilon}_i^2$. Thus, there are no significant serial correlations in the $\hat{\epsilon}_i$ and $\hat{\epsilon}_i^2$ series. Furthermore, applying the same nonlinearity tests as before to this newly normalized innovational series $\hat{\epsilon}_i$, we detect no nonlinearity; see part (b) of Table 5.7. Consequently, the two-regime threshold WACD(1, 1) model in Eq. (5.45) is adequate.
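The regime switch in Eq. (5.45) amounts to choosing one of two linear recursions according to the previous adjusted duration. A minimal Python sketch follows. The coefficients are those reported in Eq. (5.45), while the function names and the starting value `psi0` are assumptions made here for illustration.

```python
THRESHOLD = 3.79  # estimated threshold for the lag-1 adjusted duration


def threshold_psi(x_prev, psi_prev):
    """One step of the two-regime psi recursion in Eq. (5.45)."""
    if x_prev <= THRESHOLD:
        # Regime 1: innovations follow w(0.901)
        return 0.020 + 0.257 * x_prev + 0.847 * psi_prev
    # Regime 2: innovations follow w(0.845)
    return 1.808 + 0.027 * x_prev + 0.501 * psi_prev


def filter_psi(x, psi0):
    """Conditional expected durations for a series x, starting from psi0.

    Assumption: psi0 is a user-supplied starting value for psi_1.
    """
    psis = [psi0]
    for x_prev in x[:-1]:
        psis.append(threshold_psi(x_prev, psis[-1]))
    return psis


print(round(threshold_psi(2.0, 3.0), 3))  # 3.075 (first regime)
print(round(threshold_psi(5.0, 3.0), 3))  # 3.446 (second regime)
```

Dividing each duration by the corresponding element of `filter_psi` gives the normalized innovations to which the Q statistics and nonlinearity tests are applied.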

If we classify the two regimes as heavy and thin trading periods, then the threshold

model suggests that the trading dynamics measured by intraday transaction durations

are different between heavy and thin trading periods for IBM stock even after the

adjustment of diurnal pattern. This is not surprising as market activities are often

driven by arrivals of news and other information.

The estimated threshold WACD(1, 1) model in Eq. (5.45) contains some insignificant parameters. We refine the model and obtain the result:

$$\psi_i = \begin{cases} 0.225 x_{i-1} + 0.867 \psi_{i-1}, \quad \epsilon_i \sim w(0.902) & \text{if } x_{i-1} \le 3.79, \\ 1.618 + 0.614 \psi_{i-1}, \quad \epsilon_i \sim w(0.846) & \text{if } x_{i-1} > 3.79. \end{cases}$$

All of the estimates of the refined model are highly significant. The Ljung–Box statistics of the standardized innovations $\hat{\epsilon}_i = x_i/\hat{\psi}_i$ show Q(10) = 5.91(0.82) and Q(20) = 16.04(0.71) and those of $\hat{\epsilon}_i^2$ give Q(10) = 5.35(0.87) and Q(20) = 15.20(0.76), where the number in parentheses is the p value. Therefore, the refined model is adequate. The RATS program used to estimate the prior model is given in Appendix C.

5.7 BIVARIATE MODELS FOR PRICE CHANGE AND DURATION

In this section, we introduce a model that considers jointly the process of price

change and the associated duration. As mentioned before, many intraday transactions

of a stock result in no price change. Those transactions are highly relevant to trading

intensity, but they do not contain direct information on price movement. Therefore,

to simplify the complexity involved in modeling price change, we focus on transac-

tions that result in a price change and consider a price change and duration (PCD)

model to describe the multivariate dynamics of price change and the associated time

duration.

We continue to use the same notation as before, but the definition is changed to transactions with a price change. Let $t_i$ be the calendar time of the $i$th price change of an asset. As before, $t_i$ is measured in seconds from midnight of a trading day. Let $P_{t_i}$ be the transaction price when the $i$th price change occurred and $\Delta t_i = t_i - t_{i-1}$ be the time duration between price changes. In addition, let $N_i$ be the number of trades in the time interval $(t_{i-1}, t_i)$ that result in no price change. This new variable is used to represent trading intensity during a period of no price change. Finally, let $D_i$ be the direction of the $i$th price change with $D_i = 1$ when the price goes up and $D_i = -1$ when the price comes down, and let $S_i$ be the size of the $i$th price change measured in ticks. Under the new definitions, the price of a stock evolves over time by

$$P_{t_i} = P_{t_{i-1}} + D_i S_i, \tag{5.46}$$
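Eq. (5.46) says that the price path is a running sum of signed tick changes. A small Python sketch makes this concrete. The function name `price_path`, the convention of expressing the starting price in ticks, and the sample inputs are assumptions for illustration.

```python
def price_path(p0, directions, sizes):
    """Rebuild prices from Eq. (5.46): P_i = P_{i-1} + D_i * S_i.

    p0         : starting price, in ticks (assumed unit)
    directions : sequence of D_i values, each +1 or -1
    sizes      : sequence of S_i values, positive tick counts
    """
    prices = [p0]
    for d, s in zip(directions, sizes):
        if d not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        prices.append(prices[-1] + d * s)
    return prices


# Hypothetical directions and sizes for four price changes.
print(price_path(100, [1, -1, -1, 1], [1, 2, 1, 3]))
# [100, 101, 99, 98, 101]
```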
