price-dividend ratio is stationary, or even bounded. For these terms not to be zero, the price-dividend ratio must be expected to grow explosively, and faster than $R$ or $\rho^{-1}$. Especially in the linearized form (20.310), you can see that stationary $r$ and $\Delta d$ imply stationary $p - d$ if the last term is zero, and that $p - d$ is not stationary if the last term is not zero. Thus, you might want to rule out these terms just based on the view that price-dividend ratios do not and are not expected to explode in this way. You can also invoke economic theory to rule them out. The last terms must be zero in an equilibrium of infinitely lived agents or altruistically linked generations. If wealth explodes, optimizing long-lived agents will consume more. Technically, this limiting condition is a first-order condition for optimality, just like the period-to-period first-order condition. The presence of the last term also presents an arbitrage opportunity in complete markets, as you can short a security whose price contains the last term, buy the dividends separately, and eat the difference right away.

On the other hand, there are economic theories that permit the limiting terms (overlapping generations models), and they capture the interesting possibility of “rational bubbles” that many observers think they see in markets, and that have sparked a huge literature and a lot of controversy.

An investor holds a security with a rational bubble not for any dividends, but on the expectation that someone else will pay even more for that security in the future. This does seem to capture the psychology of investors from the tulip bubble of 17th-century Holland to the dot-com bubble of the millennial United States: why else would anyone buy Cisco Systems at a price-earnings ratio of 217 and a market capitalization 10 times that of General Motors in early 2000?

CHAPTER 20 EXPECTED RETURNS IN THE TIME-SERIES AND CROSS-SECTION

A “rational bubble” imposes a little discipline on this centuries-old psychological description, however, by insisting that the person who is expected to buy the security in the future also makes the same calculation. He must expect the price to rise even further. Continuing recursively, the price of a rational bubble must be expected to rise forever. A Ponzi scheme, in which everyone knows the game will end at some time, cannot rationally get off the ground. The expectation that prices will grow at more than a required rate of return forever does not mean that sample paths do so. For example, consider the bubble process

$$P_{t+1} = \begin{cases} \gamma R P_t & \text{with prob } \dfrac{P_t R - 1}{\gamma P_t R - 1}, \\[6pt] 1 & \text{with prob } \dfrac{(\gamma - 1) P_t R}{\gamma P_t R - 1}. \end{cases}$$

Figure 38 plots a realization of this process with $\gamma = 1.2$. This process yields an expected return $R$, and the dashed line graphs this expectation as of the first date. Its price is positive even though it never pays dividends. It repeatedly grows with a high return $\gamma R$ for a while and then bursts back to one. The expected price always grows, though almost all sample paths do not.

Figure 38. Sample path from a simple bubble process. The solid line gives the bubble. The dashed line gives the expected value of the bubble as of time zero, i.e. $P_0 R^t$.
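The bubble process above can be simulated in a few lines. This is a minimal sketch, not the code behind Figure 38: it assumes a gross return $R = 1.05$ (the text does not state the value used) together with $\gamma = 1.2$ and a starting price of 1, and it checks analytically that the continuation probability makes $E_t P_{t+1} = R P_t$.

```python
import random


def grow_probability(p, R, gamma):
    """Probability that the bubble keeps growing, chosen so E[P_{t+1} | P_t = p] = R * p."""
    return (p * R - 1.0) / (gamma * p * R - 1.0)


def expected_next(p, R, gamma):
    """Conditional mean of next period's price under the two-point bubble law."""
    q = grow_probability(p, R, gamma)
    return q * gamma * R * p + (1.0 - q) * 1.0


def simulate_bubble(T, R=1.05, gamma=1.2, p0=1.0, seed=0):
    """Return a sample path P_0, ..., P_T; each period the price grows by gamma*R or bursts to 1."""
    rng = random.Random(seed)
    path = [p0]
    for _ in range(T):
        p = path[-1]
        if rng.random() < grow_probability(p, R, gamma):
            path.append(gamma * R * p)
        else:
            path.append(1.0)
    return path


path = simulate_bubble(100)
bursts = sum(1 for x in path[1:] if x == 1.0)
print(f"max price {max(path):.2f}, bursts: {bursts}")
```

Typical paths burst many times even though the conditional expected price grows at rate $R$ every period, which is the point of Figure 38.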

Infinity is a long time. It's really hard to believe that prices will rise forever. The solar system will end at some point; any look at the geological and evolutionary history of the earth suggests that our species will be around a lot less than that. Thus, the infinity in the bubble must really be a parable for “a really long time.” But then the “rational” part of the bubble pops: it must hinge on the expectation that someone will be around to hold the bag, that is, to buy a security without the expectation of dividends or further price increases. (The forever part of usual present value formulas is not similarly worrying because 99.99% of the value comes from the first few hundred years of dividends.)
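To see why the infinite horizon of an ordinary present value formula is harmless, here is a minimal sketch. It assumes dividends are discounted geometrically with a constant factor $\rho = 0.96$ per year (the linearization constant used later in this section, borrowed here as a stand-in for discounting net of growth); under that assumption the share of value coming from the first $N$ years is $1 - \rho^N$.

```python
def pv_share(n_years, rho=0.96):
    """Share of sum_{j>=1} rho**(j-1) contributed by the first n_years terms: 1 - rho**n_years."""
    return 1.0 - rho ** n_years


def pv_share_bruteforce(n_years, rho=0.96):
    """Same quantity by direct summation, as a check on the closed form."""
    partial = sum(rho ** (j - 1) for j in range(1, n_years + 1))
    return partial * (1.0 - rho)  # the infinite sum equals 1 / (1 - rho)


for n in (50, 100, 200, 300):
    print(n, f"{pv_share(n):.5%}")
```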


SECTION 20.1 TIME-SERIES PREDICTABILITY

Empirically, bubbles do not appear to be the reason for historical price-dividend ratio variation. First, price-dividend ratios do seem stationary. (Craine 1993 runs a unit root test with this conclusion.) Even if statistical tests are not decisive, as is expected for a slow-moving series or a series such as that plotted in Figure 38, it is hard to believe that price-dividend ratios can explode rather than revert back to their four-century average level of about 20 to 25. Second, Table 2 shows that return and dividend forecastability terms add up to 100% of the variance of price-dividend ratios. In a bubble, we would expect price variation not matched by any variation in expected returns or dividends, as is the case in Figure 38.

I close with a warning: The word “bubble” is widely used to mean very different things.

Some people seem to mean any large movement in prices. Others mean large movements in

prices that do correspond to low or perhaps negative expected excess returns (I think this is

what Shiller 2000 has in mind), i.e. any price movement not explained by a present value

model with constant expected returns.

20.1.3 A simple model for digesting predictability

To unite the various predictability and return observations, I construct a simple VAR representation for returns, price growth, dividend growth, and the dividend-price ratio. I start only with a slow-moving expected return and unforecastable dividends.

This specification implies that d/p ratios reveal expected returns.

This specification implies return forecastability. To believe in a lower predictability of returns, you must either believe that dividend growth really is predictable, or that the d/p ratio is really much more persistent than it appears to be.

This specification shows that small but persistent changes in expected returns add up to large price changes.

We have isolated two important features of the long-horizon forecast phenomenon: dividend/price ratios are highly persistent, and dividend growth is essentially unforecastable. Starting with these two facts, a simple VAR representation can tie together many of the predictability and volatility phenomena.

Start by specifying a slow-moving state variable $x_t$ that drives expected returns, and unforecastable dividend growth,

$$x_t = b\,x_{t-1} + \delta_t \tag{20.311}$$

$$r_{t+1} = x_t + \varepsilon^r_{t+1} \tag{20.312}$$

$$\Delta d_{t+1} = \varepsilon^d_{t+1} \tag{20.313}$$

All variables are demeaned logs. (The term structure models of Chapter 19 were of this form.)

From this specification, using the linearized present value identity and return, we can derive a VAR representation for prices, returns, dividends, and the dividend-price ratio,

$$(d_{t+1} - p_{t+1}) = b\,(d_t - p_t) + \frac{\delta_{t+1}}{1 - \rho b} \tag{20.314}$$

$$r_{t+1} = (1 - \rho b)(d_t - p_t) + \varepsilon^d_{t+1} - \frac{\rho}{1 - \rho b}\,\delta_{t+1} \tag{20.315}$$

$$\Delta p_{t+1} = (1 - b)(d_t - p_t) + \varepsilon^d_{t+1} - \frac{1}{1 - \rho b}\,\delta_{t+1} \tag{20.316}$$

$$\Delta d_{t+1} = \varepsilon^d_{t+1} \tag{20.317}$$

Dividend-price ratio: Using the approximate present value identity, we can find the dividend-price ratio

$$d_t - p_t = E_t \sum_{j=1}^{\infty} \rho^{j-1} \left( r_{t+j} - \Delta d_{t+j} \right) = \frac{x_t}{1 - \rho b}. \tag{20.318}$$

This equation makes precise my comments that the dividend-price ratio reveals expected returns. Obviously, the feature that the dividend-price ratio is exactly proportional to the expected return does not generalize. If dividend growth is also forecastable, then the dividend-price ratio is a combination of dividend growth and return forecasts. Actual return forecasting exercises can often benefit from cleaning up the dividend-price ratio to focus on the implied return forecast.
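Equation (20.318) can be checked numerically. With unforecastable dividends, $E_t \Delta d_{t+j} = 0$, and from (20.311)-(20.312), $E_t r_{t+j} = b^{j-1} x_t$, so the discounted sum collapses to a geometric series. The sketch below truncates the infinite sum at a long horizon; the values of $x_t$, $b$ and $\rho$ are illustrative.

```python
def dp_from_present_value(x_t, b, rho, horizon=5000):
    """Truncated E_t sum_{j>=1} rho**(j-1) * (r_{t+j} - Δd_{t+j}),
    using E_t r_{t+j} = b**(j-1) * x_t and E_t Δd_{t+j} = 0."""
    total = 0.0
    for j in range(1, horizon + 1):
        expected_return = b ** (j - 1) * x_t
        expected_div_growth = 0.0  # dividends are unforecastable in this model
        total += rho ** (j - 1) * (expected_return - expected_div_growth)
    return total


x_t, b, rho = 0.01, 0.9, 0.96
print(dp_from_present_value(x_t, b, rho), x_t / (1 - rho * b))
```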

Returns: Since we know where the dividend/price ratio and dividends are going, we can figure out where returns are going. Use the return linearization (this is equivalent to (20.304)),

$$R_{t+1} = \left(1 + \frac{P_{t+1}}{D_{t+1}}\right) \frac{D_{t+1}}{D_t} \Big/ \frac{P_t}{D_t}$$

$$r_{t+1} = \rho\,(p_{t+1} - d_{t+1}) + (d_{t+1} - d_t) - (p_t - d_t). \tag{20.319}$$

Now, plug in from (20.314) and (20.313) to get (20.315).

Prices: Write

$$p_{t+1} - p_t = -(d_{t+1} - p_{t+1}) + (d_t - p_t) + (d_{t+1} - d_t). \tag{20.320}$$

Then, plugging in from (20.314) and (20.313), we get (20.316).
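The derivations of (20.315) and (20.316) can be verified by simulation. The sketch simulates (20.311)-(20.313), builds $d - p$ from (20.318), returns from (20.319) and price growth from (20.320), and compares them with the closed forms. The shock volatilities ($\sigma(\delta) = 0.017$, $\sigma(\varepsilon^d) = 0.108$, in decimal units) are assumptions borrowed from the calibration (20.322); any values would do, since the identities hold path by path.

```python
import random


def simulate_var(T, b=0.9, rho=0.96, sd_delta=0.017, sd_d=0.108, seed=1):
    """Simulate (20.311)-(20.313); return (r, r_closed, dprice, dprice_closed) for each period.

    r and dprice come from the linearized identities (20.319)-(20.320);
    r_closed and dprice_closed come from the VAR forms (20.315)-(20.316).
    """
    rng = random.Random(seed)
    x, dmp = 0.0, 0.0  # x_t and d_t - p_t, with d_t - p_t = x_t / (1 - rho*b)
    rows = []
    for _ in range(T):
        delta = rng.gauss(0.0, sd_delta)      # expected-return shock delta_{t+1}
        eps_d = rng.gauss(0.0, sd_d)          # dividend growth Δd_{t+1} = eps^d_{t+1}
        x_next = b * x + delta                # (20.311)
        dmp_next = x_next / (1 - rho * b)     # (20.318)
        r = -rho * dmp_next + eps_d + dmp     # (20.319)
        dprice = -dmp_next + dmp + eps_d      # (20.320)
        r_closed = (1 - rho * b) * dmp + eps_d - rho / (1 - rho * b) * delta
        dprice_closed = (1 - b) * dmp + eps_d - delta / (1 - rho * b)
        rows.append((r, r_closed, dprice, dprice_closed))
        x, dmp = x_next, dmp_next
    return rows


rows = simulate_var(1000)
print(max(abs(r - rc) for r, rc, _, _ in rows))
print(max(abs(dp - dpc) for _, _, dp, dpc in rows))
```

Both printed discrepancies are at the level of floating-point rounding, because (20.315)-(20.316) are exact consequences of the linearized identities.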

We can back out parameters from the reduced-form VAR of returns and d/p. (Any two equations carry all the information of this system.) Table RR presents some estimates.


Sample   a      a_D/P   b      σ(ε^r)   σ(ε^dp)   ρ(ε^r, ε^dp)
27-98    0.16   4.7     0.92   19.2     15.2      -0.72
48-98    0.14   4.0     0.97   15.0     12.6      -0.71
27-92    0.28   6.7     0.82   19.0     15.0      -0.69
48-92    0.27   6.2     0.87   14.5     12.4      -0.67

Table RR. Estimates of log excess return and log dividend-price ratio regressions, using annual CRSP data. $r$ is the difference between the log value-weighted return and the log Treasury bill rate. Standard deviations are in percent. The estimates are of the system

$$r_{t+1} = a\,(d_t - p_t) + \varepsilon^r_{t+1}$$

$$d_{t+1} - p_{t+1} = b\,(d_t - p_t) + \varepsilon^{dp}_{t+1}$$

and

$$r_{t+1} = a_{D/P}\,\frac{D_t}{P_t} + \varepsilon_{t+1}.$$

I report both the more intuitive coefficients on the actual d/p ratio and the coefficients on the log d/p ratio, which is a more useful specification for our transformations. The two line up: a coefficient of 5 on $D_t/P_t$ implies a coefficient of $5 \times D/P \approx 5 \times 0.04 = 0.2$ on $(D_t/P_t)/(D/P)$.

You can see that the parameters depend substantially on the sample. In particular, the dramatic returns of the late 1990s, despite low dividend yields, cut the postwar return forecast coefficients in half and the overall sample estimate by about one third. That dramatic decline in the d/p ratio also induces a very high apparent persistence in the d/p ratio, rising to a 0.97 estimate in the 48-98 sample. (Faced with an apparent trend in the data, an autoregression estimates a root near unity.)

With these estimates in mind, given the considerations outlined below, I will make calculations using reduced-form parameters

$$b = 0.9, \quad \rho = 0.96, \quad \sigma(\varepsilon^r) = 15, \quad \sigma(\varepsilon^{dp}) = 12.5, \quad \rho_{r,dp} = -0.7. \tag{20.321}$$

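From these parameters, we can find the underlying parameters of (20.311)-(20.313). I comment on each one below as it becomes useful.

$$\sigma(\delta) = \sigma(\varepsilon^{dp})(1 - \rho b) = 1.7 \tag{20.322}$$

$$\sigma(\varepsilon^d) = \sigma(\varepsilon^r + \rho\,\varepsilon^{dp}) = \sqrt{\sigma^2(\varepsilon^r) + \rho^2 \sigma^2(\varepsilon^{dp}) + 2\rho\,\sigma(\varepsilon^r, \varepsilon^{dp})} = 10.82$$

$$\sigma(\varepsilon^d, \varepsilon^{dp}) = \sigma(\varepsilon^r, \varepsilon^{dp}) + \rho\,\sigma^2(\varepsilon^{dp})$$

$$\rho(\varepsilon^d, \varepsilon^{dp}) = \frac{\rho(\varepsilon^r, \varepsilon^{dp})\,\sigma(\varepsilon^r) + \rho\,\sigma(\varepsilon^{dp})}{\sigma(\varepsilon^d)} = 0.139 \tag{20.323}$$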
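The arithmetic behind (20.322)-(20.323) is easy to reproduce. The sketch takes the reduced-form calibration (20.321), uses $\varepsilon^d = \varepsilon^r + \rho\,\varepsilon^{dp}$ (which follows from comparing the shocks in (20.315) and (20.314)), and recovers the structural volatilities and correlation. Volatilities are in percent, as in the text.

```python
import math


def structural_parameters(b=0.9, rho=0.96, sd_r=15.0, sd_dp=12.5, corr_r_dp=-0.7):
    """Map reduced-form (return, d/p) VAR moments into sigma(delta), sigma(eps_d), corr(eps_d, eps_dp)."""
    cov_r_dp = corr_r_dp * sd_r * sd_dp
    sd_delta = sd_dp * (1 - rho * b)                                # (20.322)
    var_d = sd_r ** 2 + rho ** 2 * sd_dp ** 2 + 2 * rho * cov_r_dp  # var(eps_r + rho * eps_dp)
    sd_d = math.sqrt(var_d)
    cov_d_dp = cov_r_dp + rho * sd_dp ** 2
    corr_d_dp = cov_d_dp / (sd_d * sd_dp)                           # (20.323)
    return sd_delta, sd_d, corr_d_dp


sd_delta, sd_d, corr_d_dp = structural_parameters()
print(round(sd_delta, 2), round(sd_d, 2), round(corr_d_dp, 3))
```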

The size of the return forecasting coefficient.

Does the magnitude of the estimated predictability make sense? Given the statistical uncertainties, do other facts guide us to higher or lower predictability?

The coefficient of the one-year excess return on the dividend-price ratio in Table 1 is about 5, and the estimates in Table RR vary from 4 to 6 depending on the sample. These values are surprisingly large. For example, a naive investor might think that dividend yields move one-for-one with returns; if they pay more dividends, you get more money. Before predictability, we would have explained that a high dividend yield means that prices are low in anticipation of lower future dividends, leaving the expected return unchanged. Now we recognize the possibility of time-varying expected returns, but does it make sense that expected returns move even more than dividend yields?

Return forecastability follows from the fact that dividends are not forecastable, and that the dividend/price ratio is highly but not completely persistent. We see this in the calculated coefficients of prices and returns on the dividend-price ratio in (20.315) and (20.316). We derived

$$r_{t+1} = (1 - \rho b)(d_t - p_t) + \varepsilon^r_{t+1}$$

$$\Delta p_{t+1} = (1 - b)(d_t - p_t) + \varepsilon^p_{t+1}$$

Since dividends are not forecastable, it is no surprise that the formulas for price growth and return are so similar. The return formula basically just adjusts for the fact that a higher dividend yield directly contributes to return by paying more dividends. To transform units to regressions on $D/P$, multiply by 25 (that is, divide by the average $D/P \approx 0.04$), e.g.

$$r_{t+1} = \frac{1 - \rho b}{D/P}\,\frac{D_t}{P_t} + \varepsilon^r_{t+1}.$$

Suppose the d/p ratio were not persistent at all, $b = 0$. Then both return and price growth coefficients should be 1 in logs or about 25 in levels! If the d/p ratio is one percentage point above its average, we must forecast enough of a rise in prices to restore the d/p ratio to its average in one year. The average d/p ratio is about 4%, though, so prices and hence returns must rise by 25% to change the d/p ratio by one percentage point, since $d(D/P) = -(D/P)\,dP/P$.

Suppose instead that the d/p ratio were completely persistent, i.e. a random walk with $b = 1$. Then the return coefficient is $1 - \rho = 0.04$, or about 1.0 in levels, while the price coefficient is 0. If the d/p ratio is one percentage point above average and expected to stay there, and dividends are not forecastable, then prices must not be forecast to change either. The return is one percentage point higher, because you get the higher dividends as well. Thus, the naive investor who expects dividend yields to move one-for-one with returns not only implicitly assumes that dividends are not forecastable (which turns out to be true) but also that the d/p ratio will stay put forever.

A persistence parameter $b = 0.90$ implies price and return regression coefficients of

$$1 - b = 0.10, \qquad 1 - \rho b = 1 - 0.96 \times 0.90 = 0.14,$$

or about 2.5 and 3.4 in levels. If the dividend yield is one percentage point high, and is expected to be 0.9 percentage points high in one year, then prices must increase by $P/D \times 0.1$ percentage points (about 2.5%) in the next year. The return gets the additional dividend. This, fundamentally, is how zero forecastability of dividends implies that returns move more than one-for-one with the dividend yield.

This is a little below the sample estimates of 4 to 6 in Table 1 and Table RR. That is because in the sample, a high price seems to forecast even lower dividend growth, which is the wrong sign and hard to believe. To continue with a calibration that consistently captures the facts with no dividend forecastability, we either have to lower the persistence coefficient or lower the return forecasting coefficient from the values reported in Table RR. A persistence of $b = 0.8$ implies a return coefficient $1 - \rho b = 1 - 0.96 \times 0.8 = 0.23$, or in levels $0.23 \times 25 = 5.75$. However, given the uncertainties of dividend/price forecastability, it seems more sensible to continue calculations with $b = 0.9$ and the corresponding return coefficient of 0.14, equal to the estimate in the 48-98 sample.
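The persistence cases discussed above ($b = 0$, 0.8, 0.9 and 1) can be tabulated with a short helper. It is only a sketch: it evaluates $1 - b$ and $1 - \rho b$ and converts to level regressions by dividing by an assumed average $D/P$ of 0.04, which is the “multiply by 25” rule from the text.

```python
def implied_coefficients(b, rho=0.96, mean_dp=0.04):
    """Slopes of price growth and returns on d_t - p_t (logs) and on D_t/P_t (levels)."""
    price_log = 1 - b
    return_log = 1 - rho * b
    return {
        "price_log": price_log,
        "return_log": return_log,
        "price_level": price_log / mean_dp,
        "return_level": return_log / mean_dp,
    }


for b in (0.0, 0.8, 0.9, 1.0):
    c = implied_coefficients(b)
    print(b, {k: round(v, 2) for k, v in c.items()})
```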

Going in the other direction, statistical uncertainty, the recent runup in stocks despite low dividend yields, and the dramatic portfolio implications of time-varying returns for investors whose risks or risk aversion do not change over time all lead one to consider lower predictability. As we see from these calculations, though, there are only two ways to make sense of lower predictability. You could follow the “new economy” advocates and believe that this time prices really are rising on advance news of dividend growth, even though prices have not forecast dividend growth in the past. If not, you have to believe that dividend-price ratios are substantially more persistent than they have seemed in the postwar data.

Much more persistent d/p is a tough road to follow, since D/P ratios already move incredibly slowly. Now, their deviations from the mean basically change sign once a generation: high in the 1950s, low in the 1960s, high in the mid-1970s, and decreasing ever since (see Figure 37). As a quantitative example, suppose the D/P ratio had an AR(1) coefficient of 0.96 in annual data. This means a half-life of $\ln 0.5 / \ln 0.96 = 17$ years. In this case, the price coefficient would be

$$\frac{1 - b}{D/P} = \frac{1 - 0.96}{0.04} = 1$$

and the return coefficient would be

$$\frac{1 - \rho b}{D/P} = \frac{1 - 0.96^2}{0.04} \approx 2.$$

A one percentage point higher d/p ratio means that prices must rise 1 percentage point next year, so returns must be about 2 percentage points higher. A two-for-one movement of expected returns with the dividend yield thus seems about the lower bound for return predictability, so long as dividend growth remains unforecastable.
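The half-life and the $b = 0.96$ coefficients can be reproduced as follows, with $b = 0.8$ and $b = 0.9$ shown for comparison. The sketch uses only the AR(1) half-life formula $\ln 0.5 / \ln b$ and the coefficient formulas above, with $D/P = 0.04$ assumed.

```python
import math


def half_life(b):
    """Years for an AR(1) deviation with coefficient b to decay by half."""
    return math.log(0.5) / math.log(b)


rho, mean_dp = 0.96, 0.04
for b in (0.8, 0.9, 0.96):
    price_coef = (1 - b) / mean_dp
    return_coef = (1 - rho * b) / mean_dp
    print(f"b={b}: half-life {half_life(b):.1f} years, "
          f"price coef {price_coef:.2f}, return coef {return_coef:.2f}")
```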

Persistence, price volatility and expected returns

From the dividend-price ratio equation (20.314) we can find the volatility of the dividend-price ratio and relate it to the volatility and persistence of expected returns,

$$\sigma(d_t - p_t) = \frac{1}{1 - \rho b}\,\sigma(x_t).$$

With $b = 0.9$, $1/(1 - \rho b) = 1/(1 - 0.96 \times 0.9) = 7.4$. Thus, the high persistence of expected returns means that a small expected return variation translates into a potentially very large price variation; or equivalently, that very large price variations, unaccounted for by forecasts of dividend variation, can be explained by small variation in expected returns. Translating to levels, a one percentage point change in expected returns with persistence $b = 0.9$ corresponds to a 7.4% change in price (a rise in expected returns lowers the price).

The Gordon growth model is a classic and even simpler way to see this point. With constant dividend growth $g$ and return $r$, the present value identity becomes

$$P = \frac{D}{r - g}.$$

A price-dividend ratio of 25 means $r - g = 0.04$. Then, a one percentage point permanent change in expected return translates into a 25 percentage point change in price! This is an overstatement, since expected returns are not this persistent, but it allows you to see the point clearly.

This point also shows that small market imperfections in expected returns can translate into substantial market imperfections in prices, if those expected return changes are persistent. We know markets cannot be perfectly efficient (Grossman and Stiglitz 1980). If they were perfectly efficient, there would be no traders around to make them efficient. Especially in situations where short sales or arbitrage are constrained by market frictions, prices of similar assets can be substantially different, while the expected returns of those assets are almost the same. For example, the “closed-end fund” puzzle (Thompson 1978) noted that baskets of securities sold for substantial price discounts relative to the sum of the individual securities. However, these price differentials persist for a long time. You can't short the closed-end funds to buy the securities and keep that short position on for years.
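The two magnifications discussed in this subsection, the 7.4 multiplier implied by (20.314) and the Gordon-model factor of 25, come from the same arithmetic. The sketch evaluates both, using the text's Gordon spread $r - g = 0.04$. It also reports the exact log price change for a full one-point rise in $r$, which is somewhat smaller than the first-order 25% figure.

```python
import math

rho, b = 0.96, 0.9
multiplier = 1 / (1 - rho * b)  # sigma(d - p) / sigma(x), from (20.314)
print(f"d-p volatility multiplier: {multiplier:.2f}")

spread = 0.04                   # r - g, so that P/D = 1 / spread = 25
dr = 0.01                       # one percentage point rise in the expected return
first_order = -dr / spread      # d ln P = -dr / (r - g)
exact = math.log(spread / (spread + dr))  # ln(P_new / P_old) in the Gordon model
print(f"first-order: {first_order:.1%}, exact log change: {exact:.1%}")
```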

20.1.4 Mean-reversion

I introduce long-horizon return regressions and variance ratios. I show that they are related: each one picks up a string of small negative return autocorrelations. I show, though, that the direct evidence for mean reversion and for Sharpe ratios that rise with horizon is weak.

Long run regressions and variance ratios

The first evidence of long-run forecastability in the stock market did not come from d/p regressions, but rather from clever ways of looking at the long-run univariate properties of returns. Fama and French (1988a) ran regressions of long-horizon returns on past long-horizon returns,

$$r_{t \to t+k} = a + b_k\, r_{t-k \to t} + \varepsilon_{t+k}, \tag{20.324}$$

basically updating classic autocorrelation tests from the 1960s to long-horizon data. They found negative and significant $b_k$ coefficients: a string of good past returns forecasts bad future returns.

Poterba and Summers (1988) considered a related “variance ratio” statistic. If stock returns are i.i.d., then the variance of long-horizon returns should grow linearly with the horizon,

$$\operatorname{var}(r_{t \to t+k}) = \operatorname{var}(r_{t+1} + r_{t+2} + \dots + r_{t+k}) = k\,\operatorname{var}(r_{t+1}). \tag{20.325}$$

They computed the variance ratio statistic

$$v_k = \frac{1}{k}\,\frac{\operatorname{var}(r_{t \to t+k})}{\operatorname{var}(r_{t+1})}.$$

They found variance ratios below one. Stocks, it would seem, really are safer for “long-run investors” who can “afford to wait out the ups and downs of the market,” common Wall Street advice long maligned by academics.

These two statistics are closely related, and reveal the same basic fact: stock returns have a string of small negative autocorrelations. To see this relation, write the variance ratio statistic

$$v_k = \frac{1}{k}\,\frac{\operatorname{var}\left(\sum_{j=1}^{k} r_{t+j}\right)}{\operatorname{var}(r_{t+1})} = \sum_{j=-k}^{k} \frac{k - |j|}{k}\,\rho_j = 1 + 2 \sum_{j=1}^{k} \frac{k - j}{k}\,\rho_j, \tag{20.326}$$

and the regression coefficient in (20.324)

$$b_k = \frac{1}{\operatorname{var}(r_{t \to t+k})}\operatorname{cov}\left(\sum_{j=1}^{k} r_{t+j},\ \sum_{j=1}^{k} r_{t-j+1}\right) = \frac{k\,\operatorname{var}(r_{t+1})}{\operatorname{var}(r_{t \to t+k})} \sum_{j=-k}^{k} \frac{k - |j|}{k}\,\rho_{k+j} = \frac{1}{v_k} \sum_{j=-k}^{k} \frac{k - |j|}{k}\,\rho_{k+j}.$$

Both statistics are based on tent-shaped sums of autocorrelations, as illustrated by Figure 39. If there are many small negative autocorrelations which bring returns back slowly after a shock, these autocorrelations might be individually insignificant. Their sum might be economically and statistically significant, however, and these two statistics will reveal that fact by focusing on the sum of autocorrelations. The long-horizon regression weights emphasize the middle of the autocorrelation function, so a $k$-year long-horizon regression is comparable to a somewhat longer variance ratio.

Figure 39. Long-horizon regression and variance ratio weights on autocorrelations. (Plotted series: variance ratio weights, long-horizon regression weights, return autocorrelations.)
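The tent-shaped weights in (20.326) and in the formula for $b_k$ can be computed directly from an autocorrelation function. The sketch uses a hypothetical autocorrelation function with $\rho_j = -0.03$ at every lag, chosen only to show how many small negative autocorrelations add up. It also checks the variance ratio against a brute-force double sum over all pairs of returns.

```python
def variance_ratio(acf, k):
    """v_k = 1 + 2 * sum_{j=1}^{k-1} (k - j)/k * rho_j, where acf[j] = rho_j and acf[0] = 1."""
    return 1.0 + 2.0 * sum((k - j) / k * acf[j] for j in range(1, k))


def long_horizon_beta(acf, k):
    """b_k = (1/v_k) * sum_{j=-(k-1)}^{k-1} (k - |j|)/k * rho_{k+j}; needs len(acf) >= 2k."""
    tent_sum = sum((k - abs(j)) / k * acf[k + j] for j in range(-(k - 1), k))
    return tent_sum / variance_ratio(acf, k)


def variance_ratio_bruteforce(acf, k):
    """(1/k) * var(r_{t+1} + ... + r_{t+k}) / var(r), summing the autocorrelation of every pair."""
    return sum(acf[abs(i - m)] for i in range(k) for m in range(k)) / k


acf = [1.0] + [-0.03] * 30  # hypothetical: small negative autocorrelations at every lag
for k in (1, 2, 5, 10):
    print(k, round(variance_ratio(acf, k), 3), round(long_horizon_beta(acf, k), 3))
```

With $k = 1$ the regression coefficient reduces to the first-order autocorrelation, and both statistics grow in magnitude with $k$ as more of the small autocorrelations are summed.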

Moving average representation and mean reversion

The “mean-reversion” description of these statistics comes from their implications for where values go at long horizons following a shock. We can show that the square root of the variance ratio measures the long-horizon impact of a shock relative to its instantaneous impact: the extent to which values revert back towards their mean following a shock.

You can always write returns as a moving average of their own shocks. From a regression of returns on past returns

$$a(L)\,r_t = \varepsilon_t \tag{20.327}$$

you can find the $\theta_j$ in

$$r_t = \sum_{j=0}^{\infty} \theta_j\,\varepsilon_{t-j} = \theta(L)\,\varepsilon_t = a(L)^{-1}\varepsilon_t.$$

(Most simply, just simulate (20.327) forward.) The $\theta_j$ are the moving average representation or impulse-response function: they tell you the path of expected returns following a shock. Let $v_t$ represent the cumulative returns, or the log value of a dollar invested, $\Delta v_t = r_t$. Then the partial sum $\sum_{j=0}^{k} \theta_j$ tells you the effect on invested wealth $v_{t+k}$ of a univariate return shock $\varepsilon_t$.

Relating variance ratios, long-horizon regressions and moving averages for finite $k$ is possible but not pretty. However, we can nicely relate the limiting response, where $\lim_{k \to \infty} E_t v_{t+k}$ ends up after a shock, to the autocorrelations, and thus to the limit of the variance ratio statistic, very simply as

$$\lim_{k \to \infty} v_k = 1 + 2 \sum_{j=1}^{\infty} \rho_j = \left(\sum_{j=0}^{\infty} \theta_j\right)^2 \frac{\sigma^2_\varepsilon}{\sigma^2(r_t)}. \tag{20.328}$$

If returns are i.i.d., the variance ratio is one at all horizons; all autocorrelations are zero, and all $\theta_j$ past the first are zero, so the long-run price moves one-for-one with the shock. A long string of small negative autocorrelations means a variance ratio less than one, and means $\sum_{j=0}^{\infty} \theta_j < 1$, so the long-run effect on price is lower than the impact effect. This is “mean-reversion.”

The first equality of (20.328) follows by just taking $k \to \infty$ in (20.326). For the second equality, you can recognize in both expressions the spectral density of $r$ at frequency zero, scaled by the variance of $r$. (Cochrane 1986 discusses these and other properties of variance ratios.)

Numbers

Table A1 presents estimates of the standard deviation of long-horizon returns and of long-horizon return regressions. The long-horizon regressions do show some interesting mean reversion, especially in the 3-5 year range. However, that turns around at year 7 and disappears by year 10. The variance ratios do show some long-horizon stabilization. At year 10, the variance ratio is $(16.3/19.8)^2 = 0.68$, and the long-run price impact of a shock is about $16.3/19.8 = 0.82$.

The mean log return grows linearly with horizon whether returns are autocorrelated or not: $E(r_1 + r_2) = 2E(r)$. If the variance also grows linearly with the horizon, as it does for non-autocorrelated returns, then the Sharpe ratio grows with the square root of the horizon. If the variance grows more slowly than the horizon, then the Sharpe ratio grows faster than the square root of the horizon. This is the fundamental question for whether stocks are (unconditionally) “safer for the long run.” Table A1 includes the long-horizon Sharpe ratios, and you can see that they do increase.
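A minimal sketch of this horizon arithmetic: mean log returns scale with $k$, while the standard deviation scales with $\sqrt{k\,v_k}$, so the $k$-period Sharpe ratio equals $\sqrt{k}$ times the one-period Sharpe ratio divided by $\sqrt{v_k}$. The one-period mean, volatility and variance ratios below are hypothetical.

```python
import math


def horizon_sharpe(mu, sigma, k, vr=1.0):
    """Sharpe ratio of k-period log returns when var(r_{t->t+k}) = vr * k * sigma**2."""
    return (k * mu) / math.sqrt(vr * k * sigma ** 2)


mu, sigma = 0.06, 0.20  # hypothetical one-period mean and volatility of log excess returns
for vr in (1.0, 0.68, 1.3):
    s10 = horizon_sharpe(mu, sigma, 10, vr)
    print(f"v_10={vr}: Sharpe/sqrt(10) = {s10 / math.sqrt(10):.3f}")
```

A variance ratio below one lifts the long-horizon Sharpe ratio above the $\sqrt{k}$ benchmark; a variance ratio above one lowers it.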


logs, 1926-1996   k = 1    2       3       5       7      10
σ(r^k)/√k         19.8     20.6    19.7    18.2    16.5   16.3
β_k               0.08     -0.15   -0.22   -0.04   0.24   0.08
Sharpe/√k         0.31     0.30    0.30    0.31    0.36   0.39

Table A1. Mean reversion using logs, 1926-1996. $r$ denotes the difference between the log value-weighted NYSE return and the log Treasury bill return. $\sigma(r^k) = \sigma(r_{t \to t+k})$ is the standard deviation of long-horizon returns. $\beta_k$ is the regression coefficient in $r_{t \to t+k} = \alpha + \beta_k\, r_{t-k \to t} + \varepsilon_{t+k}$. The Sharpe ratio is $E(r_{t \to t+k})/\sigma(r_{t \to t+k})$.

You would not be to blame if you thought that the evidence of Table A1 was rather weak,

especially compared with the dramatic dividend/price regressions. It is, and it is for this

reason that most current evidence for predictability focuses on other variables such as the d/p

ratio.

In addition, Table A2 shows that the change from log returns to levels of returns, while having a small effect on long-horizon regressions, destroys any evidence for higher Sharpe ratios at long horizons. Table A3 shows the same results in the postwar period. Some of the long-horizon regression coefficients are negative and significant, but there are just as large positive coefficients, and no clear pattern. The variance ratios are flat or even rising with horizon, and the Sharpe ratios are flat or even declining with horizon.

levels, 1926-1996   k = 1    2       3       5       7      10
σ(r^k)/√k           20.6     22.3    22.5    24.9    28.9   39.5
β_k                 0.02     -0.21   -0.22   -0.03   0.22   -0.63
Sharpe/√k           0.41     0.41    0.41    0.40    0.40   0.38

Table A2. $r$ denotes the difference between the gross (not log) long-horizon value-weighted NYSE return and the gross Treasury bill return.

logs, 1947-1996     k = 1    2        3       5       7      10
σ(r^k)/√k           15.6     14.9     13.0    13.9    15.0   15.6
β_k                 -0.10    -0.29*   0.30*   0.30    0.17   -0.18
Sharpe/√k           0.44     0.46     0.51    0.46    0.41   0.36

levels, 1947-1996   k = 1    2        3       5       7      10
σ(r^k)/√k           17.1     17.9     16.8    21.9    29.3   39.8
β_k                 -0.13    -0.33*   0.30    0.25    0.13   -0.25
Sharpe/√k           0.50     0.51     0.55    0.48    0.41   0.37

Table A3. Mean-reversion in postwar data.

In sum, the direct evidence for mean-reversion in index returns seems quite weak. I consider next whether indirect evidence, that is, values of these statistics implied by other estimation techniques, still indicates mean-reversion. (The mean-reversion of individual stock returns as examined by Fama and French (1988a) is somewhat stronger, and results in the stronger cross-sectional “reversal” effect described in section 2.5 below.)

Keep in mind also that the unconditional Sharpe ratio does not, in the end, drive investment decisions. Investment decisions are driven by the conditional moments of asset returns at any moment in time, using every information variable that there is.

20.1.5 Mean-reversion and forecastability

I reconcile large forecastability from d/p ratios with small mean reversion. I calculate the univariate return process implied by the simple VAR, and find that it displays little mean reversion.

I show that if dividend shocks are uncorrelated with expected return shocks, there must

be some mean reversion. If one rules out the small positive correlation in our samples, one

gets a slightly higher estimate of univariate mean-reversion.

I tie the strong negative correlation between return and d/p shocks to an essentially zero

correlation between expected return and dividend growth shocks.

How is it possible that variables such as the dividend-price ratio forecast returns strongly, but there seems to be little evidence for mean reversion in stock returns? To answer this question, we have to connect the d/p regressions and the mean-reversion statistics.

Forecastability from variables such as the dividend-price ratio is related to, but does not necessarily imply, mean-reversion. (Campbell 1991 emphasizes this point.) Mean-reversion is about the univariate properties of the return series, forecasts of $r_{t+j}$ based on $\{r_t, r_{t-1}, r_{t-2}, \dots\}$. Predictability is about the multivariate properties, forecasts of $r_{t+j}$ based on $\{x_t, x_{t-1}, x_{t-2}, \dots\}$ as well as $\{r_t, r_{t-1}, r_{t-2}, \dots\}$. Variables $x_t$ can forecast $r_{t+1}$, while $\{r_{t-j}\}$ fail to forecast $r_{t+1}$. As a simple example, suppose that returns are i.i.d., but you get to see tomorrow's newspaper. You forecast returns with a variable $x_t = r_{t+1}$,

$$r_{t+1} = x_t$$

$$x_{t+1} = \delta_{t+1}.$$

In this example, $x_t$ forecasts returns very well, but lagged returns do not forecast returns at all.
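The newspaper example is easy to simulate. The sketch draws i.i.d. normal shocks $\delta$, sets $x_t = \delta_t$ and $r_{t+1} = x_t$, and compares the OLS slope of $r_{t+1}$ on $x_t$ with the slope of $r_{t+1}$ on $r_t$; the sample size, seed and normal distribution are arbitrary choices.

```python
import random


def ols_slope(y, x):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
T = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(T)]  # x_t = delta_t: i.i.d. news in "tomorrow's paper"
r_next = x[:]                                # r_{t+1} = x_t

slope_on_x = ols_slope(r_next, x)                  # forecast r_{t+1} with x_t
slope_on_lag = ols_slope(r_next[1:], r_next[:-1])  # forecast r_{t+1} with r_t
print(round(slope_on_x, 3), round(slope_on_lag, 3))
```

The first slope is one, since $x_t$ is tomorrow's return; the second is only sampling noise around zero, since the returns themselves are i.i.d.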

To examine this issue, continue with the VAR representation built up from a slowly moving expected return and unforecastable dividends, (20.311)-(20.317). We want to find the univariate return process implied by this VAR: what would happen if you took infinite data from the system and ran a regression of returns on lagged returns? The answer, derived below, is of the form

$$r_t = \frac{1 - \gamma L}{1 - bL}\,\nu_t. \tag{20.329}$$

This is just the kind of process that can display slow mean-reversion or momentum. The moving average coefficients are

$$r_t = \nu_t - (\gamma - b)\,\nu_{t-1} - b(\gamma - b)\,\nu_{t-2} - b^2(\gamma - b)\,\nu_{t-3} - b^3(\gamma - b)\,\nu_{t-4} - \dots \tag{20.330}$$

Thus, if $\gamma > b$, a positive return shock sets off a long string of small negative returns, which cumulatively bring the value back towards where it started. If $\gamma < b$, a positive shock sets off a string of small positive returns, which add “momentum” to the original increase in value.

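The long-run statistics are

$$\sum_{j=0}^{\infty} \theta_j = \frac{1 - \gamma}{1 - b}, \qquad 1 + 2 \sum_{j=1}^{\infty} \rho_j = \left(\frac{1 - \gamma}{1 - b}\right)^2 \frac{\sigma^2(\nu_t)}{\sigma^2(r_t)}.$$

Thus, if $\gamma > b$, returns will have a variance ratio below one, and if $\gamma < b$, a variance ratio above one.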
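The moving average weights in (20.330) and the long-run statistics can be checked numerically. The sketch generates $\theta_j$ for the process (20.329), sums them, and evaluates the limiting variance ratio, using $\sigma^2(r_t)/\sigma^2(\nu_t) = \sum_j \theta_j^2 = 1 + (\gamma - b)^2/(1 - b^2)$ for this process. The $(\gamma, b)$ pairs are illustrative; $(0.96, 0.9)$ corresponds to Case 2 below.

```python
def arma11_theta(gamma, b, n):
    """First n MA weights of r_t = (1 - gamma L)/(1 - b L) nu_t:
    theta_0 = 1 and theta_j = -(gamma - b) * b**(j-1) for j >= 1."""
    return [1.0] + [-(gamma - b) * b ** (j - 1) for j in range(1, n)]


def long_run_response(gamma, b):
    """Sum of all MA weights, theta(1) = (1 - gamma)/(1 - b)."""
    return (1 - gamma) / (1 - b)


def limiting_variance_ratio(gamma, b):
    """1 + 2 * sum_j rho_j = theta(1)**2 * var(nu) / var(r)."""
    var_r_over_var_nu = 1 + (gamma - b) ** 2 / (1 - b ** 2)
    return long_run_response(gamma, b) ** 2 / var_r_over_var_nu


for gamma, b in [(0.96, 0.9), (0.9, 0.9), (0.85, 0.9)]:
    theta = arma11_theta(gamma, b, 2000)
    print(gamma, b, round(sum(theta), 3), round(limiting_variance_ratio(gamma, b), 3))
```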

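Now, what value of $\gamma$ does our VAR predict? Is there a sensible structure of the VAR that generates substantial predictability but little mean-reversion? The general formula, derived below, is that $\gamma$ solves

$$\frac{1 + \gamma^2}{\gamma} = \frac{(1 + b^2)\,\sigma^2(\varepsilon^d) + (1 + \rho^2)\,\sigma^2(\varepsilon^{dp}) - 2(\rho + b)\,\sigma(\varepsilon^d, \varepsilon^{dp})}{b\,\sigma^2(\varepsilon^d) + \rho\,\sigma^2(\varepsilon^{dp}) - (1 + \rho b)\,\sigma(\varepsilon^d, \varepsilon^{dp})} = 2q, \tag{20.331}$$

and hence,

$$\gamma = q - \sqrt{q^2 - 1}.$$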

Case 1: No predictability.

If returns are not predictable in this system, that is, if $\sigma(\delta) = 0$ so that $\sigma(\varepsilon^{dp}) = 0$, then (20.331) specializes to

$$\frac{1 + \gamma^2}{\gamma} = \frac{1 + b^2}{b}.$$

Then $\gamma = b$, so returns are not autocorrelated, sensibly enough.

Case 2: Constant dividend growth.

Next, suppose that dividend growth is constant, $\sigma(\varepsilon^d) = 0$, so that variation in expected returns is the only reason that returns vary at all. In this case, (20.331) specializes quickly to

$$\frac{1 + \gamma^2}{\gamma} = \frac{1 + \rho^2}{\rho},$$

and thus $\gamma = \rho$.

This is a substantial amount of mean reversion. $(\gamma - b)$ in (20.330) is then $0.96 - 0.90 = 0.06$, so that in each year $j$ after a shock, returns come back by $6 \times b^{j-1}$ percent of the original shock. The cumulative impact is that value ends up at $(1 - \gamma)/(1 - b) = (1 - 0.96)/(1 - 0.9) = 0.4$, or only 40% of the original shock.

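Case 3: Dividend growth uncorrelated with expected return shocks.

Pure variation in expected returns is of course not realistic. Dividends do vary. If we add dividend growth uncorrelated with expected return shocks, with $\sigma(\varepsilon^{dp}, \varepsilon^d) = 0$, (20.331) specializes to

$$\frac{1 + \gamma^2}{\gamma} = \frac{1 + b^2}{b}\,\frac{b\,\sigma^2(\varepsilon^d)}{b\,\sigma^2(\varepsilon^d) + \rho\,\sigma^2(\varepsilon^{dp})} + \frac{1 + \rho^2}{\rho}\,\frac{\rho\,\sigma^2(\varepsilon^{dp})}{b\,\sigma^2(\varepsilon^d) + \rho\,\sigma^2(\varepsilon^{dp})} = 2q. \tag{20.332}$$

In this case, $b < \gamma < \rho$, because the right-hand side is a weighted average of the Case 1 and Case 2 values and $(1 + \gamma^2)/\gamma$ is decreasing in $\gamma$ on $(0, 1)$. There will be some mean reversion in returns; this model cannot generate $\gamma \le b$. However, the mean reversion in returns will be lower than with no dividend growth, because dividend growth obscures the information in ex-post returns about time-varying expected returns. (See (20.333).) How much lower depends on the parameters.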

Using the parameters (20.321), I find that (20.332) implies

$$\gamma = q - \sqrt{q^2 - 1} = 0.928.$$
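The three cases can be reproduced with a small solver for (20.331). The sketch computes $q$ from the variance and covariance terms and returns the invertible root $\gamma = q - \sqrt{q^2 - 1}$. The dividend-shock variance $117 = 10.82^2$ and $\sigma(\varepsilon^{dp}) = 12.5$ follow (20.321)-(20.322). The covariance weight in the denominator is written as $(1 + \rho b)$, which is what the first autocovariance of $(1 - bL)\,r_{t+1}$ delivers; it plays no role in the three cases, where the covariance is zero.

```python
import math


def implied_gamma(b, rho, var_d, var_dp, cov_d_dp=0.0):
    """Solve (1 + g**2)/g = 2q from (20.331) for the root g with |g| < 1."""
    num = (1 + b ** 2) * var_d + (1 + rho ** 2) * var_dp - 2 * (rho + b) * cov_d_dp
    den = b * var_d + rho * var_dp - (1 + rho * b) * cov_d_dp
    q = num / (2 * den)
    return q - math.sqrt(max(q * q - 1.0, 0.0))


b, rho = 0.9, 0.96
var_d, var_dp = 117.0, 12.5 ** 2

gamma1 = implied_gamma(b, rho, var_d, 0.0)     # Case 1: no expected-return variation
gamma2 = implied_gamma(b, rho, 0.0, var_dp)    # Case 2: constant dividend growth
gamma3 = implied_gamma(b, rho, var_d, var_dp)  # Case 3: uncorrelated shocks
for label, g in [("case 1", gamma1), ("case 2", gamma2), ("case 3", gamma3)]:
    print(label, round(g, 3), "long-run response", round((1 - g) / (1 - b), 2))
```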

Our baseline VAR with no correlation between dividend growth and expected return shocks thus generates a univariate return process that is slightly on the mean-reversion side of uncorrelated. The long-run response to a shock is

$$\frac{1 - \gamma}{1 - b} = \frac{1 - 0.928}{1 - 0.9} = 0.72.$$

This is a lot less mean-reversion than the 0.4 of Case 2, but still somewhat more mean reversion than we see in Tables A1-A3.

This case is an important baseline worth stressing. If expected returns are positively autocorrelated, realized returns are negatively autocorrelated. If (unchanged) expected dividends are discounted at a higher rate, today's price falls. You can see this most easily by just looking at the return or its linearization (20.319),

$$r_{t+1} = \Delta d_{t+1} - \rho\,(d_{t+1} - p_{t+1}) + (d_t - p_t). \qquad (20.333)$$

The $d - p$ ratio is proportional to expected returns. A positive shock to expected returns, uncorrelated with dividend growth, lowers actual returns. A little more deeply, look at the


return innovation identity (20.308),

$$r_t - E_{t-1} r_t = (E_t - E_{t-1})\left[\sum_{j=0}^{\infty} \rho^j \Delta d_{t+j} - \sum_{j=1}^{\infty} \rho^j r_{t+j}\right]. \qquad (20.334)$$

If expected returns $(E_t - E_{t-1})\sum_{j=1}^{\infty} \rho^j r_{t+j}$ increase, with no concurrent news about current or future dividends, then $r_t - E_{t-1} r_t$ decreases.

This is the point to remark on a curious feature of the return-dividend/price VAR: the negative correlation between ex-post return shocks and dividend/price ratio shocks. All the estimates were around $-0.7$. At first glance such a strong correlation between VAR residuals seems strange. At second glance, it is expected. From (20.333) you can see that a positive innovation to the dividend-price ratio will correspond to a negative return innovation, unless a striking dividend correlation gets in the way. More deeply, you can see the point in (20.334).

Quantitatively, from (20.315), the return shock is related to the dividend growth shock and the expected return shock by

$$\varepsilon^r = \varepsilon^d - \frac{\rho}{1-\rho b}\,\delta = \varepsilon^d - \rho\,\varepsilon^{dp}.$$

Thus, a zero correlation between the underlying dividend growth and expected return shocks, $\rho(\varepsilon^d,\delta) = 0$, implies a negative covariance between return shocks and expected return shocks:

$$\sigma(\varepsilon^r,\delta) = -\frac{\rho}{1-\rho b}\,\sigma^2(\delta).$$

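The correlation is a perfect $-1$ if there are no dividend growth shocks. At the parameters (??), $\sigma(\varepsilon^{dp}) = 12.5$ and $\sigma(\varepsilon^r) = 15$, we obtain

$$\rho(\varepsilon^r,\delta) = \rho(\varepsilon^r,\varepsilon^{dp}) = -\frac{\rho}{1-\rho b}\,\frac{\sigma(\delta)}{\sigma(\varepsilon^r)} = -\rho\,\frac{\sigma(\varepsilon^{dp})}{\sigma(\varepsilon^r)} = -0.96 \times \frac{12.5}{15} = -0.8.$$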

The slight positive correlation of about 0.1 between dividend growth and expected return shocks results in (or, actually, results from) an estimated correlation of return and d/p shocks of $-0.7$, slightly smaller in magnitude than $-0.8$.
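The correlation arithmetic takes only a few lines to check. As in the text, this sketch assumes that $\varepsilon^d$ and $\varepsilon^{dp}$ are uncorrelated. It also uses the fact that $\delta$ is a positive multiple of $\varepsilon^{dp}$, so that $\rho(\varepsilon^r,\delta) = \rho(\varepsilon^r,\varepsilon^{dp})$. The function names are illustrative.

```python
import math


def corr_return_dp(rho: float, sd_dp: float, sd_r: float) -> float:
    """rho(eps_r, eps_dp) when eps_r = eps_d - rho * eps_dp and eps_d is uncorrelated with eps_dp.

    cov(eps_r, eps_dp) = -rho * var(eps_dp), hence corr = -rho * sd_dp / sd_r.
    """
    return -rho * sd_dp / sd_r


def corr_from_shock_sds(rho: float, sd_d: float, sd_dp: float) -> float:
    """Same correlation, building sd(eps_r) from the two underlying shock volatilities."""
    sd_r = math.sqrt(sd_d**2 + (rho * sd_dp) ** 2)
    return corr_return_dp(rho, sd_dp, sd_r)


print(corr_return_dp(0.96, 12.5, 15.0))       # the text's -0.8, up to rounding
print(corr_from_shock_sds(0.96, 0.0, 12.5))   # no dividend shocks: a perfect -1
```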

The strong negative correlation between return shocks and expected return shocks follows from a low correlation between dividend growth shocks and expected return shocks. It is crucial to the finding that returns are not particularly autocorrelated despite predictability. Consider what would happen if the correlation $\rho(\varepsilon^r,\varepsilon^{dp}) = \rho(\varepsilon^r,\delta)$ were zero. The expected return $x_t$ is slow moving. If it is high now, it has been high for a while, and there has likely been a series of good past returns. But it will also remain high for a while, leading to a period of high future returns. This is “momentum,” positive return autocorrelation, the opposite of mean reversion.

Case 4: Dividend growth shocks positively correlated with expected return shocks

As we have seen, the VAR with no correlation between expected return and dividend


growth shocks cannot deliver uncorrelated returns or positive “momentum” correlation patterns. At best, volatile dividend growth can obscure an underlying negative correlation pattern. However, looking at (20.333) or (20.334), you can see that adding dividend growth shocks positively correlated with expected return shocks could give us uncorrelated or positively correlated returns.

The estimate in Table RR implied a slight positive correlation of dividend growth and expected return shocks, $\rho(\varepsilon^d,\delta) = 0.14$ in (20.323). If we use that estimate in (20.331), we recover an estimate

$$\gamma = 0.923; \qquad \frac{1-\gamma}{1-b} = 0.77.$$

This $\gamma$ is quite close to $b = 0.9$, and the small mean reversion is more closely consistent with Tables A1-A3.

Recall that point estimates as in Table 1 actually showed that a high d/p ratio forecast higher dividends, which is the wrong sign. This point estimate means that shocks to the d/p ratio and expected returns are positively correlated with shocks to expected dividend growth. If you generalize the VAR to allow such shocks, along with a richer specification allowing additional lags and variables, you find that VARs give point estimates with slight mean reversion. (See Cochrane 1994 for a plot. The estimated univariate process has slight mean reversion, with an impulse-response ending up at about 0.8 of its starting value, no different from the direct estimate.)

Can we generate unforecastable returns in this system? To do so, we have to increase the correlation between expected return shocks and dividend growth. Equating (20.331) to $(1+b^2)/b$ and solving for $\rho(\varepsilon^d,\varepsilon^{dp})$, we obtain

$$\rho(\varepsilon^d,\varepsilon^{dp}) = \frac{(1-\rho b)(\rho-b)}{(1-b)^2(\rho+b)}\,\frac{\sigma(\varepsilon^{dp})}{\sigma(\varepsilon^d)} = 0.51.$$

This is possible, but not likely. Any positive correlation between dividend growth and

expected return shocks strikes me as suspect. If anything, I would expect that since expected

returns rise in “bad times” when risk or risk aversion increases, we should see a positive shock

to expected returns associated with a negative shock to current or future dividend growth.

Similarly, if we are going to allow dividend price ratios to forecast dividend growth, a high

dividend price ratio should forecast lower dividends.

Tying together all these thoughts, I think it's reasonable to impose zero dividend forecastability and zero correlation between dividend growth and expected return shocks. This specification means that returns are really less forecastable than they seem in some samples. As we have seen, $b = 0.9$ and no dividend forecastability means that the coefficient of return on D/P is really about 3.4 rather than 5 or 6. This specification means that expected returns really account for 100% rather than 130% of the price-dividend variance. However, it also means that univariate mean reversion is slightly stronger than it seems in our sample.


This section started with the possibility that the implied mean reversion from a multivariate system could be a lot larger than that revealed by direct estimates. Instead, we end up by reconciling strong predictability and slight mean reversion.

How to find the univariate return representation

To find the implied univariate representation, we have to find a representation

$$r_{t+1} = a(L)\,\nu_{t+1} \qquad (20.335)$$

in which $a(L)$ is invertible. The Wold decomposition theorem tells us that there is a unique invertible moving average representation in which the $\nu_t$ are the one-step-ahead forecast error shocks, i.e. the errors in the autoregression $a(L)^{-1} r_{t+1} = \nu_{t+1}$. Thus, if you find any invertible moving average representation, you know you have the right one. We can't do it by simply manipulating the systems starting with (20.311), because they are expressed in terms of multivariate shocks, errors in regressions that include $x$.

There are three fundamental representations of a time series: its Wold moving average representation, its autocorrelation function, and its spectral density. To find the univariate representation (20.335), you either calculate the autocorrelations $E(r_t r_{t-j})$ from (20.311) and then try to recognize what process has that autocorrelation pattern, or you calculate the spectral density and try to recognize what process has that spectral density.

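In our simple setup, we can write the return-d/p VAR (20.314)-(20.315) as

$$\begin{aligned} r_{t+1} &= (1-\rho b)(d_t - p_t) + (\varepsilon^d_{t+1} - \rho\,\varepsilon^{dp}_{t+1}) \\ d_{t+1} - p_{t+1} &= b\,(d_t - p_t) + \varepsilon^{dp}_{t+1}. \end{aligned}$$

Then, write returns as

$$\begin{aligned} r_{t+1} &= \frac{1-\rho b}{1-bL}\,\varepsilon^{dp}_t + (\varepsilon^d_{t+1} - \rho\,\varepsilon^{dp}_{t+1}) \\ (1-bL)\,r_{t+1} &= (1-\rho b)\,\varepsilon^{dp}_t + (\varepsilon^d_{t+1} - \rho\,\varepsilon^{dp}_{t+1}) - b\,(\varepsilon^d_t - \rho\,\varepsilon^{dp}_t) \\ (1-bL)\,r_{t+1} &= (\varepsilon^d_{t+1} - \rho\,\varepsilon^{dp}_{t+1}) + (\varepsilon^{dp}_t - b\,\varepsilon^d_t). \end{aligned} \qquad (20.336)$$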

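Here, you can see that $r_t$ must follow an ARMA(1,1) with one root equal to $b$ and the other root to be determined. Write $y_t = (1-bL)\,r_t$, and thus $y_t = (1-\gamma L)\,\nu_t$. Then the autocovariances of $y$ from (20.336) are

$$\begin{aligned} E(y_{t+1}^2) &= (1+b^2)\,\sigma^2(\varepsilon^d) + (1+\rho^2)\,\sigma^2(\varepsilon^{dp}) - 2(\rho+b)\,\sigma(\varepsilon^d,\varepsilon^{dp}) \\ E(y_{t+1}\,y_t) &= -b\,\sigma^2(\varepsilon^d) - \rho\,\sigma^2(\varepsilon^{dp}) + (\rho+b)\,\sigma(\varepsilon^d,\varepsilon^{dp}), \end{aligned}$$

while $y_t = (1-\gamma L)\,\nu_t$ implies

$$\begin{aligned} E(y_{t+1}^2) &= (1+\gamma^2)\,\sigma^2_\nu \\ E(y_{t+1}\,y_t) &= -\gamma\,\sigma^2_\nu. \end{aligned}$$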


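Hence, we can find $\gamma$ from the condition

$$\frac{1+\gamma^2}{\gamma} = \frac{(1+b^2)\,\sigma^2(\varepsilon^d) + (1+\rho^2)\,\sigma^2(\varepsilon^{dp}) - 2(\rho+b)\,\sigma(\varepsilon^d,\varepsilon^{dp})}{b\,\sigma^2(\varepsilon^d) + \rho\,\sigma^2(\varepsilon^{dp}) - (\rho+b)\,\sigma(\varepsilon^d,\varepsilon^{dp})} = 2q.$$

The solution (the root less than one) is

$$\gamma = q - \sqrt{q^2 - 1}.$$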
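As a sanity check on (20.336), one can simulate the simple system and confirm that $y_t = (1-bL)\,r_t$ behaves like an MA(1). Its first autocorrelation should be close to $-\gamma/(1+\gamma^2) = -1/(2q)$, and its higher autocorrelations should be close to zero. The sketch below makes three simplifying choices that are assumptions, not taken from the text. The shocks are independent normals, so the cross-covariance term drops out. The volatilities are hypothetical, $\sigma(\varepsilon^d) = 10$ and $\sigma(\varepsilon^{dp}) = 12.5$. The seed and sample size are arbitrary.

```python
import numpy as np


def simulate_returns(b, rho, sd_d, sd_dp, n, seed=0):
    """Simulate r_{t+1} = (1 - rho b)(d_t - p_t) + eps_d - rho eps_dp with AR(1) d - p.

    Shocks are independent normals (an assumption); variables are deviations from means.
    """
    rng = np.random.default_rng(seed)
    eps_d = rng.normal(0.0, sd_d, n)
    eps_dp = rng.normal(0.0, sd_dp, n)
    dp = np.zeros(n)
    r = np.zeros(n)
    for t in range(1, n):
        r[t] = (1 - rho * b) * dp[t - 1] + eps_d[t] - rho * eps_dp[t]
        dp[t] = b * dp[t - 1] + eps_dp[t]
    return r


def sample_autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))


b, rho, var_d, var_dp = 0.9, 0.96, 10.0**2, 12.5**2
r = simulate_returns(b, rho, np.sqrt(var_d), np.sqrt(var_dp), n=100_000)[1_000:]  # drop burn-in
y = r[1:] - b * r[:-1]                          # y_t = (1 - bL) r_t
q = ((1 + b**2) * var_d + (1 + rho**2) * var_dp) / (2 * (b * var_d + rho * var_dp))
print(sample_autocorr(y, 1), -1 / (2 * q))     # should be close to each other
print(sample_autocorr(y, 2))                   # should be near zero
```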

For more general processes, such as computations from an estimated VAR, it is better to approach the problem via the spectral density. This approach allows you to construct the univariate representation directly without relying on cleverness. If you write $y_t = \begin{bmatrix} r_t & x_t \end{bmatrix}'$, the VAR is $y_t = A(L)\,\varepsilon_t$. Then the spectral density of returns $S_r(z)$ is given by the top left element of $S_y(z) = A(z)\,E(\varepsilon\varepsilon')\,A(z^{-1})'$ with $z = e^{-i\omega}$. Like the autocorrelation, the spectral density is the same object whether it comes from the univariate or multivariate representation.

You can find the autocorrelations by (numerically) inverse-Fourier transforming the spectral density. The autocorrelations and spectral densities are directly revealing. A string of small negative autocorrelations, or a dip in the spectral density near frequency zero, corresponds to mean reversion. Positive autocorrelations, or a spectral density higher at frequency zero than elsewhere, correspond to momentum.

To find the univariate, invertible moving average representation from the spectral density, you have to factor the spectral density as $S_r(z) = a(z)\,a(z^{-1})$. Here $a(z)$ is a polynomial with roots outside the unit circle, $a(z) = (1-\gamma_1 z)(1-\gamma_2 z)\cdots$ with $|\gamma_i| < 1$. Then, since $a(L)$ is invertible, $r_t = a(L)\,\varepsilon_t$ with $\sigma^2_\varepsilon = 1$ is the univariate representation of the return process.
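In the simple system the spectral route can be checked directly. This sketch assumes uncorrelated shocks for brevity. In lag-operator form, $r_t = \varepsilon^d_t + \left[-\rho + (1-\rho b)L/(1-bL)\right]\varepsilon^{dp}_t$, so the multivariate spectral density is a sum of two squared transfer functions evaluated at $z = e^{-i\omega}$. The code compares it with the spectral density of the ARMA(1,1) whose $\gamma$ and $\sigma^2_\nu$ come from the autocovariances of $y_t$. The shock variances are hypothetical, and the common $1/(2\pi)$ factor is omitted from both sides.

```python
import numpy as np

b, rho = 0.9, 0.96
var_d, var_dp = 10.0**2, 12.5**2          # hypothetical shock variances (uncorrelated shocks)

omega = np.linspace(0.0, np.pi, 500)
z = np.exp(-1j * omega)

# Multivariate side: transfer function of r_t with respect to eps_dp (eps_d enters with weight 1).
h_dp = -rho + (1 - rho * b) * z / (1 - b * z)
S_multi = var_d + var_dp * np.abs(h_dp) ** 2

# Univariate side: ARMA(1,1) with AR root b and MA root gamma from the autocovariances of y.
c0 = (1 + b**2) * var_d + (1 + rho**2) * var_dp   # E(y_t^2)
c1 = -(b * var_d + rho * var_dp)                  # E(y_t y_{t-1})
q = -c0 / (2 * c1)
gamma = q - np.sqrt(q**2 - 1)
var_nu = c0 / (1 + gamma**2)                      # also satisfies var_nu * gamma = -c1
S_uni = var_nu * np.abs(1 - gamma * z) ** 2 / np.abs(1 - b * z) ** 2

print(gamma, np.max(np.abs(S_multi - S_uni) / S_uni))   # relative gap at rounding level
```

For an estimated VAR, $h_{dp}$ would be replaced by the relevant element of $A(z)$. Factoring the resulting $S_r$ then requires a numerical spectral factorization rather than this closed form.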

20.1.6 Multivariate mean-reversion

I calculate the responses to multivariate rather than univariate shocks. In a multivariate system you can isolate expected return shocks and dividend growth shocks. The price response to expected return shocks is entirely stationary.

We are left with a troubling set of facts: high price/dividend ratios strongly forecast low

returns, yet high past returns do not seem to forecast low subsequent returns. Surely, there

must be some sense in which “high prices” forecast lower subsequent returns?

The resolution must involve dividends (or earnings, book value, or a similar divisor for

prices). A price rise with no change in dividends results in lower subsequent returns. A price

rise that comes with a dividend rise does not result in lower subsequent returns. A high return

combines dividend news and price-dividend news, and so obscures the lower expected return

message. In more time-series language, instead of looking at the response to a univariate return shock (a return that was unanticipated based on lagged returns), let us look at the


responses to multivariate shocks (a return that was unanticipated based on lagged returns and dividends).

This is easy to do in our simple VAR. We can simulate (20.314)-(20.317) forward and trace the responses to a dividend growth shock and an expected return (d/p ratio) shock. Figures 40 and 41 present the results of this calculation. (Cochrane 1994 presents a corresponding calculation using an unrestricted VAR, and the results are very similar.)

Figure 40. Responses to a one standard deviation (1.7%) negative expected return shock

in the simple VAR.

Start with Figure 40. The negative expected return shock raises prices and the p-d ratio immediately. We can identify such a shock in the data as a return shock with no contemporaneous movement in dividends. The p-d ratio then reverts to its mean. Dividends are not forecastable, so they show no immediate or eventual response to the expected return shock. It could be the case that prices move in advance of future dividends; if this were the case we would see dividends rising to meet higher prices after a return shock. Instead, prices show a long and complete reversion back to the level of dividends. This shock looks a lot like a negative yield shock to bonds: such a shock raises prices now so that bonds end up at the same maturity value despite a smaller expected return.

The cumulative return “mean-reverts” even more than prices. For given prices, dividends

are now smaller (smaller d-p) so returns deviate from their mean by more than price growth.


Figure 41. Responses to a one standard deviation (14%) dividend growth shock in the

simple VAR.

The cumulative return ends up below its previously expected value. Compare this value response to the univariate value response, which, as we calculated above, ends up at about 0.8 of its time-1 response.

The dividend shock raises prices and cumulative returns immediately and proportionally to dividends, so the price-dividend ratio does not change. Expected returns or the discount rate, reflected in any slope of the value line, do not change. If the world were i.i.d., this is the only kind of shock we would see, and dividend-price ratios would always be constant.
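Both sets of responses can be reproduced in a few lines of Python. This sketch works in deviations from means and assumes the simple system: $\Delta d_{t+1} = \varepsilon^d_{t+1}$, an AR(1) for $d - p$ with coefficient $b$, and returns built from the linearization (20.333). It uses unit shocks rather than the one standard deviation shocks in the figures, and the function name is illustrative. With $\rho = 0.96$ and $b = 0.9$, a unit negative $d - p$ shock raises the price by one. The price then decays back to the unchanged dividend, and the cumulative return settles at $\rho - (1-\rho b)/(1-b) = -0.4$, below its previously expected value.

```python
import numpy as np


def impulse_responses(b, rho, shock_d, shock_dp, horizon=200):
    """Paths of d, p and the cumulative return after a one-time shock in period 1."""
    d = np.zeros(horizon + 1)
    dp = np.zeros(horizon + 1)      # d_t - p_t
    r = np.zeros(horizon + 1)
    for t in range(1, horizon + 1):
        e_d = shock_d if t == 1 else 0.0
        e_dp = shock_dp if t == 1 else 0.0
        d[t] = d[t - 1] + e_d                  # dividend growth equals its shock
        dp[t] = b * dp[t - 1] + e_dp           # AR(1) dividend-price ratio
        r[t] = e_d - rho * dp[t] + dp[t - 1]   # linearized return (20.333)
    p = d - dp
    return d, p, np.cumsum(r)


# Expected-return shock: lower d - p by one unit, dividends unchanged.
d, p, cum_r = impulse_responses(0.9, 0.96, shock_d=0.0, shock_dp=-1.0)
print(p[1], p[-1], cum_r[-1])   # price jumps, then returns to d; cumulative return ends near -0.4
# Dividend shock: d and p rise together and stay up; d - p never moves.
d, p, cum_r = impulse_responses(0.9, 0.96, shock_d=1.0, shock_dp=0.0)
print(d[-1], p[-1], cum_r[-1])
```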

Figures 40 and 41 plot the responses to “typical,” one standard deviation shocks. Thus you can see that actual returns are typically about half dividend shocks and half expected return shocks. That is why returns alone are a poor indicator of expected returns.

In sum, at last we can see some rather dramatic “mean-reversion.” Good past returns

by themselves are not a reliable signal of lower subsequent returns, because they contain

substantial dividend growth noise. Good returns that do not include good dividends isolate

an expected return shock. This does signal low subsequent returns. It sets off a completely

transitory variation in prices.


20.1.7 Cointegration and short vs. long-run volatility

If $d - p$, $\Delta p$ and $\Delta d$ are stationary, then the long-run variances of $\Delta d$ and $\Delta p$ must be the same, long-run movements in $d$ and $p$ must be perfectly correlated, and $d$ and $p$ must end up in the same place after any shock. Thus, the patterns of predictability, volatility, and univariate and multivariate mean reversion all stem from these facts: the persistence of $d - p$ and the near-unforecastability of $\Delta d$.

You might think that the facts about predictability depend on the exact structure of the VAR, including parameter estimates. In fact, most of what we have learned about predictability and mean reversion comes down to a few facts. The dividend-price ratio, returns, and dividend growth are all stationary; dividend growth is not (or at best weakly) forecastable; and dividend growth varies less than returns.

These facts imply that the dividend and price responses to each shock are eventually equal in Figures 40 and 41. If $d - p$, $\Delta p$ and $\Delta d$ are stationary, then $d$ and $p$ must end up in the same place following a shock. The responses of a stationary variable ($d - p$) must die out. If dividends are not forecastable, then it must be the case that prices do all the adjustment following a price shock that does not affect dividends.

Stationary $d - p$, $\Delta p$ and $\Delta d$ also imply that the variance of long-horizon $\Delta p$ must equal the variance of long-horizon $\Delta d$,

$$\lim_{k\to\infty} \frac{1}{k}\,\mathrm{var}(p_{t+k} - p_t) = \lim_{k\to\infty} \frac{1}{k}\,\mathrm{var}(d_{t+k} - d_t), \qquad (20.337)$$

and the correlation of long-run price and dividend growth must approach one. These facts follow from the fact that the variance ratio of a stationary variable must approach zero, and $d - p$ is stationary. Intuitively, long-run price growth cannot be more volatile than long-run dividend growth, or the long-run $p - d$ ratio would not be stationary.
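The limit (20.337) can be made concrete in the simple system. For this sketch only, assume three things. First, $d$ is a random walk with shock variance $\sigma^2(\varepsilon^d)$. Second, $x_t = d_t - p_t$ is an AR(1) with coefficient $b$ and shock variance $\sigma^2(\varepsilon^{dp})$. Third, the two shocks are uncorrelated. Then $p_{t+k} - p_t = (d_{t+k} - d_t) - (x_{t+k} - x_t)$ splits into two uncorrelated pieces. Because $\mathrm{var}(x_{t+k} - x_t) = 2\,\mathrm{var}(x)(1-b^k)$ stays bounded, the $k$-period price variance divided by $k$ converges to $\sigma^2(\varepsilon^d)$. The parameter values are hypothetical.

```python
def var_price_change(k: int, var_d: float, var_dp: float, b: float) -> float:
    """var(p_{t+k} - p_t) for random-walk d and AR(1) d - p with uncorrelated shocks."""
    var_x = var_dp / (1 - b**2)                 # unconditional variance of d - p
    return k * var_d + 2 * var_x * (1 - b**k)   # random-walk part + bounded stationary part


var_d, var_dp, b = 100.0, 156.25, 0.9          # hypothetical values
for k in (1, 10, 100, 10_000):
    print(k, var_price_change(k, var_d, var_dp, b) / k)   # approaches var_d = 100
```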

Now, if dividend growth is not forecastable, its long-run volatility is the same as its short-run volatility: its variance ratio is one. Short-run price growth is more volatile than short-run dividend growth, so we conclude that prices must be mean-reverting; their variance ratio must be below one.

Quantitatively, this observation supports the magnitude of univariate mean reversion that we have found so far. Dividend growth has a short-run, and thus long-run, standard deviation of about 10% per year, while returns and prices […]. Thus, prices must have a long-run variance ratio of about 2/3, or a long-run response to univariate shocks of $\sqrt{2/3} = 0.82$.

The change in prices is not the same thing as the return, especially at long horizons, since returns include the intervening dividends. One can address this question with a slightly different accounting: define $d$ as the dividend paid to a dollar investment. The resulting dividend series is still not predictable and has roughly the same volatility, so in this case we


get approximately the same result.

The work of Lettau and Ludvigson (2000) suggests that we may get much more dramatic implications by including consumption data. The ratio of stock market values to consumption should also be stationary; if wealth were to explode people would surely consume more, and vice versa. The ratio of dividends to aggregate consumption should also be stationary. Consumption growth seems independent at all horizons, and consumption growth is very stable, with roughly 1% annual standard deviation. For example, Lettau and Ludvigson (2000) find that none of the variables that forecast returns in Table LL (including $d - p$ and a consumption to wealth ratio) forecast consumption growth at any horizon.

These facts suggest that aggregate dividends are forecastable by the consumption/dividend ratio, and strongly so: the long-run volatility of aggregate dividend growth must be the 1% volatility of consumption growth, not the 10% short-run volatility of dividend growth. These facts also mean that almost all of the 15% or more variation in annual stock market wealth must be transitory: the long-run volatility of stock market value must be no more than the 1% consumption growth volatility!

Again, total market value is not the same thing as price, and price is not the same thing as cumulated return. Likewise, aggregate dividends are not the same thing as the dividend concept we have used so far (dividends paid to a dollar investment with dividends consumed), or as dividends paid to a dollar investment with dividends reinvested. Lettau and Ludvigson show that the consumption/wealth ratio does forecast returns, but no one has yet worked out the mean-reversion implications of this fact.

My statements about the implications of stationary $d - p$, $\Delta d$, $\Delta p$, $r$ are developed in detail in Cochrane 1994. They are special cases of the representation theorems for cointegrated variables developed by Engle and Granger (1987). A regression of a difference like $\Delta p$ on a ratio like $p - d$ is called the error-correction representation of a cointegrated system. Error-correction regressions have subtly and dramatically changed almost all empirical work in finance and macroeconomics. The vast majority of the successful return forecasting regressions in this section, both time-series and cross-section, are error-correction regressions of one sort or another. Corporate finance is being redone with regressions of growth rates on ratios, as is macroeconomic forecasting. For example, the consumption/GDP ratio is a powerful forecaster of GDP growth.

20.1.8 Bonds

The expectations model of the term structure works well on average and for horizons of 4

years or greater. At the one year horizon, however, a forward rate 1 percentage point higher

than the spot rate seems entirely to indicate a one percentage point higher expected excess

return rather than a one percentage point rise in future interest rates.


The venerable expectations model of the term structure specifies that long-term bond yields are equal to the average of expected future short-term bond yields. As with the CAPM and random walk, the expectations model was the workhorse of empirical finance for a generation. And as with those other views, a new round of research has significantly modified the traditional view.

Maturity $N$   Avg. return (%) $E(hpr^{(N)}_{t+1})$   Std. error   Std. dev. (%) $\sigma(hpr^{(N)}_{t+1})$
1              5.83                                  0.42         2.83
2              6.15                                  0.54         3.65
3              6.40                                  0.69         4.66
4              6.40                                  0.85         5.71
5              6.36                                  0.98         6.58

Table 4. Average continuously compounded (log) one-year holding period returns on zero-coupon bonds of varying maturity. Annual data from CRSP 1953-1997.

Table 4 calculates the average return on bonds of different maturities. The expectations

hypothesis seems to do pretty well. Average holding period returns do not seem very different

across bond maturities, despite the increasing standard deviation of bond returns as maturity

rises. The small increase in returns for long term bonds, equivalent to a slight average upward

slope in the yield curve, is usually excused as a small “liquidity premium.” In fact, the curious

pattern in Table 4 is that bonds do not share the high Sharpe ratios of stocks. Whatever factors

account for the volatility of bond returns, they seem to have very small risk prices.

Table 4 is again the tip of the iceberg of an illustrious career for the expectations hypothesis. Especially in times of great inflation and exchange rate instability, the expectations hypothesis does a very good first-order job.

However, one can ask a more subtle question. Perhaps there are times when long term

bonds can be forecast to do better, and other times when short term bonds are expected to

do better. If the times even out, the unconditional averages in Table 4 will show no pattern.

Equivalently, we might want to check whether a forward rate that is unusually high forecasts

an unusual increase in spot rates.


Left panel (change in yields): $y^{(1)}_{t+N} - y^{(1)}_t = a + b\,(f^{(N\to N+1)}_t - y^{(1)}_t) + \varepsilon_{t+N}$
Right panel (holding period returns): $hpr^{(N+1)}_{t+1} - y^{(1)}_t = a + b\,(f^{(N\to N+1)}_t - y^{(1)}_t) + \varepsilon_{t+1}$

       Change in yields                       Holding period returns
N      a       σ(a)   b       σ(b)   R²       a      σ(a)   b      σ(b)   R²
1      0.1     0.3    -0.10   0.36   -0.02    -0.1   0.3    1.10   0.36   0.16
2      -0.01   0.4    0.37    0.33   0.005    -0.5   0.5    1.46   0.44   0.19
3      -0.04   0.5    0.41    0.33   0.013    -0.4   0.8    1.30   0.54   0.10
4      -0.3    0.5    0.77    0.31   0.11     -0.5   1.0    1.31   0.63   0.07

Table 5. Forecasts based on forward-spot spread. OLS regressions, 1953-1997 annual data. Yields and returns in annual percentages. The left hand panel runs the change in the one year yield on the forward-spot spread. The right hand panel runs the one period excess return on the forward-spot spread.

Table 5 gets at these issues, updating Fama and Bliss' (1987) classic regression tests. (Campbell and Shiller 1991 and Campbell 1995 make the same point with regressions of yield changes on yield spreads.) The left hand panel presents a regression of the change in yields on the forward-spot spread. The expectations hypothesis predicts a coefficient of 1.0, since the forward rate should equal the expected future spot rate. At a one-year horizon we see instead a coefficient near zero and a negative adjusted $R^2$. Forward rates one year out seem to have no predictive power whatsoever for changes in the spot rate one year from now. On the other hand, by 4 years out, we see coefficients within one standard error of 1.0. Thus, the expectations hypothesis seems to do poorly at short (1 year) horizons, but much better at longer horizons and on average (Table 4).

If the yield expression of the expectations hypothesis does not work at one year horizons, then the expected return expression of the expectations hypothesis must not hold either: one must be able to forecast one year bond returns. To check this fact, the right hand panel of Table 5 runs regressions of the one year excess return on long-term bonds on the forward-spot spread. Here, the expectations hypothesis predicts a coefficient of zero: no signal (including the forward-spot spread) should be able to tell you that this is a particularly good time for long bonds vs. short bonds. As you can see, the coefficients in the right hand panel of Table 5 are all about 1.0. A high forward rate does not indicate that interest rates will be higher one year from now; it seems entirely to indicate that you will earn that much more holding long-term bonds. (The right hand panel is really not independent evidence, since the coefficients in the right and left hand panels of Table 5 are mechanically linked. For example, for $N = 1$, $1.10 + (-0.10) = 1.0$, and this holds as an accounting identity. Fama and Bliss call them “complementary regressions.”)
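The accounting identity behind the complementary regressions is easy to verify for $N = 1$. Since $hpr^{(2)}_{t+1} - y^{(1)}_t = (f^{(1\to 2)}_t - y^{(1)}_t) - (y^{(1)}_{t+1} - y^{(1)}_t)$, OLS slopes of the two left-hand variables on the same forward-spot spread must sum to one in any sample. The sketch below builds log prices from simulated, purely hypothetical yields and forward spreads and checks this.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
# Hypothetical one-year yields (persistent AR(1), percent) and forward rates.
y1 = np.empty(T)
y1[0] = 5.0
for t in range(1, T):
    y1[t] = 5.0 + 0.9 * (y1[t - 1] - 5.0) + rng.normal(0.0, 1.0)
f12 = y1 + 0.5 + rng.normal(0.0, 0.5, T)      # f^{(1->2)}_t, hypothetical

p1 = -y1                                       # log price of the one-year bond
p2 = p1 - f12                                  # from f^{(1->2)} = p^{(1)} - p^{(2)}
hpr2 = p1[1:] - p2[:-1]                        # hpr^{(2)}_{t+1} = p^{(1)}_{t+1} - p^{(2)}_t


def ols_slope(lhs, rhs):
    """Slope from an OLS regression of lhs on a constant and rhs."""
    X = np.column_stack([np.ones_like(rhs), rhs])
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    return coef[1]


spread = (f12 - y1)[:-1]
b_left = ols_slope(y1[1:] - y1[:-1], spread)   # change in yields
b_right = ols_slope(hpr2 - y1[:-1], spread)    # excess holding period return
print(b_left, b_right, b_left + b_right)       # the slopes sum to one
```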

Figures 42 and 43 provide a pictorial version of the results in Table 5. Suppose that the

yield curve is upward sloping as in the left panel. What does this mean? A naive investor

might think this pattern indicates that long-term bonds give a higher return than short term

bonds. The expectations hypothesis denies this conclusion. If the expectations hypothesis

were true, the forward rates plotted against maturity in the left hand panel would translate


one-for-one to the forecast of future spot rates in the right hand panel, as plotted in the line marked “Expectations model.” Rises in future short rates should lower bond prices, cutting off the one-period advantage of long-term bonds. The rising short rates would directly raise the multi-year advantage of short-term bonds.

We can calculate the actual forecast of future spot rates from the estimates in the left hand panel of Table 5, and these are given by the line marked “Estimates” in Figure 43. The essence of the phenomenon is sluggish adjustment of the short rates. The short rates do eventually rise to meet the forward rate forecasts, but not as quickly as the forward rates predict that they should.

Figure 42. If the current yield curve is as plotted here....

Just as dividend growth should be forecastable if returns are to be unforecastable, short-term yields should be forecastable if bond returns are to be unforecastable. In fact, yield changes are almost unforecastable at a one year horizon, so, mechanically, bond returns are forecastable. We see this directly in the first row of the left hand panel of Table 5 for the one-period yield. It is an implication of the right hand panel as well. If

$$hpr^{(N+1)}_{t+1} - y^{(1)}_t = 0 + 1\left(f^{(N\to N+1)}_t - y^{(1)}_t\right) + \varepsilon_{t+1} \qquad (20.338)$$


Figure 43. ...this is the forecast of future one year interest rates. The dashed line gives the forecast from the expectations hypothesis. The solid line is constructed from the estimates in Table 5.

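then, writing out the definition of holding period return and forward rate,

$$\begin{aligned} p^{(N)}_{t+1} - p^{(N+1)}_t + p^{(1)}_t &= 0 + 1\left(p^{(N)}_t - p^{(N+1)}_t + p^{(1)}_t\right) + \varepsilon_{t+1} \\ p^{(N)}_{t+1} &= 0 + 1\,p^{(N)}_t + \varepsilon_{t+1} \\ y^{(N)}_{t+1} &= 0 + 1\,y^{(N)}_t - \varepsilon_{t+1}/N. \end{aligned} \qquad (20.339)$$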

A coefficient of 1.0 in (20.338) is equivalent to yields or bond prices that follow random walks, that is, yield changes that are completely unpredictable.

Of course yields are stationary and not totally unpredictable. However, they move slowly. Thus, yield changes are very unpredictable at short horizons but much more predictable at long horizons. That is why the coefficients in the left hand panel of Table 5 build with horizon. If we did holding period return regressions at longer horizons, they would gradually approach the expectations hypothesis result.

The roughly 1.0 coefficients in the right hand panel of Table 5 mean that a one percentage point increase in the forward rate translates into a one percentage point increase in expected return. It seems that the old fallacy of confusing bond yields with their expected returns also contains a grain of truth, at least for the first year. However, the one-for-one variation of expected returns with forward rates does not imply a one-for-one variation of expected returns


with yield spreads. Forward rates are related to the slope of the yield curve,

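$$\begin{aligned} f^{(N\to N+1)}_t - y^{(1)}_t &= p^{(N)}_t - p^{(N+1)}_t - y^{(1)}_t \\ &= -N\,y^{(N)}_t + (N+1)\,y^{(N+1)}_t - y^{(1)}_t \\ &= N\left(y^{(N+1)}_t - y^{(N)}_t\right) + \left(y^{(N+1)}_t - y^{(1)}_t\right). \end{aligned}$$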

Thus, the forward-spot spread varies more than the yield spread, so regressions of holding period returns on yield spreads give coefficients greater than one. Expected returns move more than one-for-one with yield spreads. Campbell (1995) reports coefficients of excess returns on yield spreads that rise from one at a 2 month horizon to 5 at a 5 year horizon.
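The forward-spot decomposition is an identity, so it can be checked on any yield curve. The sketch uses a hypothetical upward-sloping curve of zero-coupon yields (percent per year). It builds log prices $p^{(n)} = -n\,y^{(n)}$ and forward rates $f^{(N\to N+1)} = p^{(N)} - p^{(N+1)}$ and compares both sides. It also shows the forward-spot spread exceeding the yield spread $y^{(N+1)} - y^{(1)}$ when the curve slopes up.

```python
# Hypothetical zero-coupon yields y^{(n)}, percent per year, n = 1..5.
yields = {1: 5.0, 2: 5.4, 3: 5.7, 4: 5.9, 5: 6.0}
logp = {n: -n * y for n, y in yields.items()}          # p^{(n)} = -n y^{(n)}

rows = []
for N in range(1, 5):
    fwd_spot = logp[N] - logp[N + 1] - yields[1]       # f^{(N->N+1)} - y^{(1)}
    decomposition = N * (yields[N + 1] - yields[N]) + (yields[N + 1] - yields[1])
    yield_spread = yields[N + 1] - yields[1]
    rows.append((N, fwd_spot, decomposition, yield_spread))
    print(N, round(fwd_spot, 6), round(decomposition, 6), round(yield_spread, 6))
```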

The facts are analogous to the dividend/price regression. There, dividends were essentially unforecastable. This meant that a one percentage point change in dividend yield implied a 5 percentage point change in expected excess returns.

Of course, there is risk: the $R^2$ are all about 0.1-0.2, about the same values as the $R^2$ from the dividend/price regression at a one year horizon, so this strategy will often go wrong. Still, 0.1-0.2 is not zero, so the strategy does pay off more often than not, in violation of the expectations hypothesis. Furthermore, the forward-spot spread is a slow moving variable, typically reversing sign once per business cycle. Thus, the $R^2$ build with horizon as with the D/P regression, peaking in the 30% range (Fama and French 1989).

The fact that the regressions in Table 5 run the change in yield on the forward-spot spread and the excess return on the forward-spot spread is very important. The overall level of interest rates moves up and down a great deal but slowly over time. Thus, if you run $y^{(1)}_{t+N} = a + b\,f^{(N\to N+1)}_t + \varepsilon_{t+N}$, you will get a coefficient $b$ almost exactly equal to 1.0 and a stupendous $R^2$, seemingly a stunning validation of the expectations hypothesis. If you run a regression of tomorrow's temperature in Chicago on today's temperature, the regression coefficient will be near 1.0 with a huge $R^2$ as well, since the temperature varies a lot over the year. But today's temperature is not a useful temperature forecast. To measure a temperature forecast we want to know if the forecast can predict the change in temperature. Is (forecast - today's temperature) a good measure of (tomorrow's temperature - today's temperature)? Table 5 runs this regression.
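The temperature analogy can be illustrated with a small simulation. The sketch assumes a hypothetical slow-moving series, an AR(1) with coefficient 0.99 standing in for the level of interest rates or the temperature. The "forecast" is just today's level plus noise, so by construction it carries no information about the change. Regressing next period's level on the forecast produces a slope near one and a very high $R^2$. Regressing the change on (forecast - today's level) shows that the forecast is useless.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
level = np.empty(T)
level[0] = 0.0
for t in range(1, T):
    level[t] = 0.99 * level[t - 1] + rng.normal()     # slow-moving level
forecast = level + rng.normal(0.0, 0.5, T)           # today's level plus noise


def ols(lhs, rhs):
    """Slope and R^2 of an OLS regression of lhs on a constant and rhs."""
    X = np.column_stack([np.ones_like(rhs), rhs])
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ coef
    return coef[1], 1.0 - resid.var() / lhs.var()


b_lev, r2_lev = ols(level[1:], forecast[:-1])                            # levels
b_chg, r2_chg = ols(level[1:] - level[:-1], forecast[:-1] - level[:-1])  # changes
print(f"levels:  b = {b_lev:.3f}, R2 = {r2_lev:.3f}")
print(f"changes: b = {b_chg:.3f}, R2 = {r2_chg:.3f}")
```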

The decomposition in (20.339) warns us of one of several econometric traps in this kind of regression. Notice that two of the three right-hand variables also appear on the left-hand side. Thus any measurement error in $p^{(N+1)}_t$ and $p^{(1)}_t$ will induce a spurious common movement in left and right hand variables. In addition, since the variables are a triple difference, the difference may eliminate a common signal and isolate measurement error or noise. There are pure measurement errors in the bond data, and we seldom observe pure discount bonds of exactly the desired maturity. In addition, various liquidity and microstructure effects can influence the yields of particular bonds in ways that are not exploitable for typical investors.

As an example of what this sort of “measurement error” can do, suppose all bond yields


are 5%, but there is one “error” in the two period bond price at time 1: rather than being $-10$ it is $-15$ (log prices in percent). The table below tracks the effects of this error. It implies a blip of the one year forward rate in year one, and then a blip in the return from holding this bond from year one to year two. The price and forward rate “error” automatically turns into a subsequent return when the “error” is corrected. If the price is real, of course, this is just the kind of event we want the regression to tell us about. The forward rate did not correspond to a change in the future spot rate, so there was a large return; it was a price that was “out of line,” and if you could trade on it, you should. But the regression will also pounce on measurement error in prices and indicate spurious returns.

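$$
\begin{array}{lrrrr}
t & 0 & 1 & 2 & 3 \\
\hline
p_t^{(1)} & -5 & -5 & -5 & -5 \\
p_t^{(2)} & -10 & -15 & -10 & -10 \\
p_t^{(3)} & -15 & -15 & -15 & -15 \\
y_t^{(i)},\ i \neq 2 & 5 & 5 & 5 & 5 \\
y_t^{(2)} & 5 & 7.5 & 5 & 5 \\
f_t^{(1\to2)} & 5 & 10 & 5 & 5 \\
f_t^{(1\to2)} - y_t^{(1)} & 0 & 5 & 0 & 0 \\
hpr_t^{(2\to1)} - y_t^{(1)} & 0 & 0 & 5 & 0
\end{array}
$$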

Numerical example of the effect of measurement error in yields on yield regressions.
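The entries in the table follow from a few identities, assuming (as the table does) that prices are log prices in percent: $y_t^{(N)} = -p_t^{(N)}/N$, $f_t^{(1\to2)} = p_t^{(1)} - p_t^{(2)}$, and the excess return on the two-period bond bought at $t-1$ and sold at $t$ is $p_t^{(1)} - p_{t-1}^{(2)} - y_{t-1}^{(1)}$. The sketch below recomputes the rows; the function name is illustrative, not from the text.

```python
def measurement_error_rows(p1, p2):
    """Recompute yields, forward rates and excess returns from log prices (percent).

    p1[t], p2[t]: log prices of one- and two-period discount bonds at date t.
    """
    y1 = [-p for p in p1]                       # one-period yield
    y2 = [-p / 2 for p in p2]                   # two-period yield
    fwd = [a - b for a, b in zip(p1, p2)]       # forward rate for t+1 -> t+2
    spread = [f - y for f, y in zip(fwd, y1)]   # forward-spot spread
    # Excess return from buying the two-period bond at t-1 and selling it at t;
    # defined from t = 1 on, since t = 0 has no prior purchase in this sample.
    excess = [p1[t] - p2[t - 1] - y1[t - 1] for t in range(1, len(p1))]
    return y2, fwd, spread, excess


p1 = [-5, -5, -5, -5]
p2 = [-10, -15, -10, -10]  # the single "error" at t = 1
y2, fwd, spread, excess = measurement_error_rows(p1, p2)
print(y2)      # [5.0, 7.5, 5.0, 5.0]
print(fwd)     # [5, 10, 5, 5]
print(spread)  # [0, 5, 0, 0]
print(excess)  # [0, 5, 0]  (dates t = 1, 2, 3)
```

The single price error shows up twice: as a positive forward-spot spread at $t=1$ and then as a positive excess return at $t=2$, which is exactly the pattern a return-forecasting regression rewards.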

20.1.9 Foreign exchange

The expectations model works well on average. However, a foreign interest rate one percentage point higher than its usual differential with the US rate (equivalently, a one percentage point higher forward-spot spread) seems to indicate an expected excess return of even more than one percentage point, that is, a further appreciation of the foreign currency.

Suppose interest rates are higher in Germany than in the U.S. Does this mean that one can

earn more money by investing in German bonds? There are several reasons that the answer

might be no. First, of course, is default risk. While not a big problem for German government bonds, Russia and other governments have defaulted on bonds in the past and may do so again. Second, and more important, is the risk of devaluation. If German interest rates are 10%, US interest rates are 5%, but the euro falls 5% relative to the dollar during the year,

you make no more money holding the German bonds despite their attractive interest rate.
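The devaluation arithmetic can be sketched in a few lines. In continuously compounded (log) terms, the dollar return on the foreign bond is the foreign rate plus the log change in the value of the foreign currency, so 10% - 5% exactly matches the 5% US rate; with simple returns, $1.10 \times 0.95 = 1.045$, so the match is only approximate. The function names are hypothetical.

```python
def dollar_log_return(foreign_log_rate, log_currency_change):
    """Log dollar return on a one-period foreign bond: foreign rate + log change in the currency."""
    return foreign_log_rate + log_currency_change


def dollar_simple_return(foreign_rate, currency_change):
    """Net simple dollar return: (1 + foreign rate) * (1 + currency change) - 1."""
    return (1 + foreign_rate) * (1 + currency_change) - 1


print(dollar_log_return(0.10, -0.05))     # log terms: matches the 5% US rate
print(dollar_simple_return(0.10, -0.05))  # simple terms: about 4.5%
```

The small gap between the two versions is the same log-versus-level issue that separates the expectations hypothesis from pure risk neutrality.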

Since lots of investors are making this calculation, it is natural to conclude that an interest

rate differential across countries on bonds of similar credit risk should reveal an expectation

of currency devaluation. The logic is exactly the same as the “expectations hypothesis” in


the term structure. Initially attractive yield or interest rate differentials should be met by an

offsetting event so that you make no more money on average in one country or another, or in

one currency versus another. As with bonds, the expectations hypothesis is slightly different

from pure risk neutrality since the expectation of the log is not the log of the expectation.

Again, the size of the phenomena we study usually swamps this distinction.
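To gauge how big the log-versus-level distinction is, one can assume, purely for illustration, that the log change in the exchange rate over a year is normally distributed with mean $\mu$ and standard deviation $\sigma$. Then $\ln E[S_{t+1}/S_t] - E[\ln(S_{t+1}/S_t)] = \sigma^2/2$, which for a hypothetical $\sigma = 10\%$ is only half a percentage point. The Monte Carlo sketch below checks this; the parameter values are assumptions, not estimates.

```python
import numpy as np


def jensen_gap(mu=0.0, sigma=0.10, n=1_000_000, seed=0):
    """Simulated log E[S_{t+1}/S_t] - E[log(S_{t+1}/S_t)].

    Assumes (for illustration) the log exchange-rate change is N(mu, sigma^2).
    """
    rng = np.random.default_rng(seed)
    ds = rng.normal(mu, sigma, n)             # simulated log changes
    log_of_mean = np.log(np.exp(ds).mean())   # log of the expected gross change
    mean_of_log = ds.mean()                   # expected log change
    return log_of_mean - mean_of_log


analytic = 0.10**2 / 2  # sigma^2 / 2 under the lognormal assumption
print(jensen_gap(), analytic)
```

A gap of this size is small next to the interest differentials of several percentage points discussed in this section.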

As with the expectations hypothesis in the term structure, the expected depreciation view ruled for many years, and still constitutes an important first-order understanding of interest rate differentials and exchange rates. For example, interest rates on East Asian currencies were very high on the eve of the currency collapses of 1997, and many banks were making tidy sums borrowing at 5% in dollars to lend at 20% in local currencies. This situation should lead one to suspect that traders expect a 15% devaluation, most likely reflecting a small chance of a larger devaluation. That is, in this case, exactly what happened. Many observers and policy analysts who ought to know better often attribute high nominal interest rates in troubled countries to “tight monetary policy” that is “strangling the economy” to “defend the currency.” In fact, one's first-order guess should be that such high nominal rates reflect a large probability of inflation and devaluation (loose monetary and fiscal policy) and that they correspond to much lower real rates.
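The back-of-the-envelope inference in the East Asian example can be written out. Under the expectations view, the 20% - 5% = 15% differential is the expected devaluation; if that expectation comes from a devaluation of size $D$ with probability $\pi$ and no devaluation otherwise, then $\pi D \approx 15\%$. Likewise, by the Fisher relation, real rate $\approx$ nominal rate $-$ expected inflation. The devaluation sizes and the inflation figure below are hypothetical, chosen only to illustrate the arithmetic.

```python
def implied_devaluation_probability(local_rate, dollar_rate, devaluation_size):
    """Probability pi such that pi * devaluation_size equals the interest differential.

    Assumes a two-state outcome: a devaluation of the given size, or none at all.
    """
    expected_devaluation = local_rate - dollar_rate
    return expected_devaluation / devaluation_size


def approx_real_rate(nominal_rate, expected_inflation):
    """Fisher approximation: real rate ~= nominal rate - expected inflation."""
    return nominal_rate - expected_inflation


for size in (0.15, 0.30, 0.50):  # hypothetical devaluation sizes
    p = implied_devaluation_probability(0.20, 0.05, size)
    print(f"devaluation of {size:.0%}: implied probability {p:.2f}")

# Hypothetical: a 20% nominal rate with 15% expected inflation is roughly a 5% real rate.
print(approx_real_rate(0.20, 0.15))
```

The larger the devaluation traders fear, the smaller the probability needed to justify the same 15-point differential.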