

As you can see, the limits on the right hand sides of (20.309) and (20.310) are zero if the
price-dividend ratio is stationary, or even bounded. For these terms not to be zero, the
price-dividend ratio must be expected to grow explosively, and faster than $R$ or $\rho^{-1}$.
Especially in the linearized form (20.310) you can see that stationary $r$ and $\Delta d$ imply
stationary $p - d$ if the last term is zero, and $p - d$ is not stationary if the last term is
not zero. Thus, you might want to rule out these terms just based on the view that
price-dividend ratios do not and are not expected to explode in this way. You can also invoke
economic theory to rule them out. The last terms must be zero in an equilibrium of infinitely
lived agents or altruistically linked generations. If wealth explodes, optimizing long-lived
agents will consume more. Technically, this limiting condition is a first order condition for
optimality just like the period to period first order condition. The presence of the last term
also presents an arbitrage opportunity in complete markets, as you can short a security whose
price contains the last term, buy the dividends separately and eat the difference right away.
On the other hand, there are economic theories that permit the limiting terms (overlapping
generations models), and they capture the interesting possibility of “rational bubbles”
that many observers think they see in markets, and that have sparked a huge literature and a
lot of controversy.
An investor holds a security with a rational bubble not for any dividends, but on the
expectation that someone else will pay even more for that security in the future. This does
seem to capture the psychology of investors from the tulip bubble of 17th century Holland
to the dot-com bubble of the millennial United States: why else would anyone buy Cisco
Systems at a price-earnings ratio of 217 and market capitalization 10 times that of General
Motors in early 2000?
A “rational bubble” imposes a little discipline on this centuries-old psychological
description, however, by insisting that the person who is expected to buy the security in the
future also makes the same calculation. He must expect the price to rise even further.
Continuing recursively, the price of a rational bubble must be expected to rise forever. A
Ponzi scheme, in which everyone knows the game will end at some time, cannot rationally get
off the ground. The expectation that prices will grow at more than a required rate of return
forever does not mean that sample paths do so. For example, consider the bubble process
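$$
P_{t+1} =
\begin{cases}
\gamma R P_t & \text{prob} = \dfrac{P_t R - 1}{\gamma R P_t - 1} \\[1ex]
1 & \text{prob} = \dfrac{P_t R (\gamma - 1)}{\gamma R P_t - 1}
\end{cases}
$$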

Figure 38 plots a realization of this process with $\gamma = 1.2$. This process yields an
expected return $R$, and the dashed line graphs this expectation as of the first date. Its price
is positive though it never pays dividends. It repeatedly grows with a high return $\gamma R$
for a while and then bursts back to one. The expected price always grows, though almost all
sample paths do not do so.
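The bubble process above can be simulated in a few lines. The following is only a sketch under stated assumptions: the gross return `R = 1.05`, the starting price of one, the seed and the function names are illustrative choices that do not come from the text; only $\gamma = 1.2$ does. The helper `expected_next` evaluates the conditional mean analytically, so you can confirm that it equals $R P_t$ even though each sample path eventually bursts.

```python
import random

def grow_probability(p, gamma, R):
    """Probability that the bubble grows by gamma*R rather than bursting to 1."""
    return (p * R - 1.0) / (gamma * R * p - 1.0)

def expected_next(p, gamma, R):
    """Analytic conditional mean E_t(P_{t+1}) implied by the two branches."""
    prob = grow_probability(p, gamma, R)
    return prob * gamma * R * p + (1.0 - prob) * 1.0

def simulate_bubble(T, gamma=1.2, R=1.05, p0=1.0, seed=0):
    """Simulate T steps of the bubble process (R, p0 and seed are assumed values)."""
    rng = random.Random(seed)
    path = [p0]
    for _ in range(T):
        p = path[-1]
        if rng.random() < grow_probability(p, gamma, R):
            path.append(gamma * R * p)   # bubble keeps growing
        else:
            path.append(1.0)             # bubble bursts back to one
    return path

path = simulate_bubble(100)
```

Plotting `path` against `[1.05 ** t for t in range(101)]` gives a picture in the spirit of Figure 38, though the particular realization depends on the seed.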

Figure 38. Sample path from a simple bubble process. The solid line gives the bubble. The
dashed line gives the expected value of the bubble as of time zero, i.e. $P_0 R^t$.

Infinity is a long time. It's really hard to believe that prices will rise forever. The solar
system will end at some point; any look at the geological and evolutionary history of the earth
suggests that our species will be around a lot less than that. Thus, the infinity in the bubble
must really be a parable for “a really long time.” But then the “rational” part of the bubble
pops: it must hinge on the expectation that someone will be around to hold the bag, to buy
a security without the expectation of dividends or further price increases. (The forever part
of usual present value formulas is not similarly worrying because 99.99% of the value comes
from the first few hundred years of dividends.)


Empirically, bubbles do not appear to be the reason for historical price-dividend ratio
variation. First, price-dividend ratios do seem stationary. (Craine 1993 runs a unit root test
with this conclusion.) Even if statistical tests are not decisive, as is expected for a slow
moving series, or a series such as that plotted in Figure 38, it is hard to believe that price-
dividend ratios can explode rather than revert back to their four-century average level of
about 20 to 25. Second, Table 2 shows that return and dividend forecastability terms add up
to 100% of the variance of price-dividend ratios. In a bubble, we would expect price variation
not matched by any variation in expected returns or dividends, as is the case in Figure 38.
I close with a warning: The word “bubble” is widely used to mean very different things.
Some people seem to mean any large movement in prices. Others mean large movements in
prices that do correspond to low or perhaps negative expected excess returns (I think this is
what Shiller 2000 has in mind), i.e. any price movement not explained by a present value
model with constant expected returns.

20.1.3 A simple model for digesting predictability

To unite the various predictability and return observations, I construct a simple VAR
representation for returns, price growth, dividend growth, and the dividend-price ratio. I
start only with a slow moving expected return and unforecastable dividends.
This specification implies that d/p ratios reveal expected returns.
This specification implies return forecastability. To believe in a lower predictability of
returns, you must either believe that dividend growth really is predictable, or that the d/p
ratio is really much more persistent than it appears to be.
This specification shows that small but persistent changes in expected returns add up to
large price changes.

We have isolated two important features of the long-horizon forecast phenomenon:
dividend/price ratios are highly persistent, and dividend growth is essentially unforecastable.
Starting with these two facts, a simple VAR representation can tie together many of the
predictability and volatility phenomena.
Start by specifying a slow-moving state variable $x_t$ that drives expected returns, and
unforecastable dividend growth,

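$$
\begin{aligned}
x_t &= b x_{t-1} + \delta_t && \text{(20.311)} \\
r_{t+1} &= x_t + \varepsilon_{r,t+1} && \text{(20.312)} \\
\Delta d_{t+1} &= \varepsilon_{d,t+1} && \text{(20.313)}
\end{aligned}
$$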

All variables are demeaned logs. (The term structure models of Chapter 19 were of this


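From this specification, using the linearized present value identity and return, we can
derive a VAR representation for prices, returns, dividends, and the dividend price ratio,

$$
\begin{aligned}
(d_{t+1} - p_{t+1}) &= b(d_t - p_t) + \frac{\delta_{t+1}}{1 - \rho b} && \text{(20.314)} \\
r_{t+1} &= (1 - \rho b)(d_t - p_t) + \left(\varepsilon_{d,t+1} - \frac{\rho}{1 - \rho b}\,\delta_{t+1}\right) && \text{(20.315)} \\
\Delta p_{t+1} &= (1 - b)(d_t - p_t) + \left(\varepsilon_{d,t+1} - \frac{1}{1 - \rho b}\,\delta_{t+1}\right) && \text{(20.316)} \\
\Delta d_{t+1} &= \varepsilon_{d,t+1} && \text{(20.317)}
\end{aligned}
$$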

Dividend-price ratio: Using the approximate present value identity, we can find the
dividend price ratio

$$
d_t - p_t = E_t \sum_{j=1}^{\infty} \rho^{j-1}\left(r_{t+j} - \Delta d_{t+j}\right) = \frac{x_t}{1 - \rho b}. \qquad \text{(20.318)}
$$

This equation makes precise my comments that the dividend price ratio reveals expected
returns. Obviously, the feature that the dividend price ratio is exactly proportional to the
expected return does not generalize. If dividend growth is also forecastable, then the
dividend-price ratio is a combination of dividend growth and return forecasts. Actual return
forecasting exercises can often benefit from cleaning up the dividend price ratio to focus on
the implied return forecast.
Returns: Since we know where the dividend/price ratio and dividends are going, we can
figure out where returns are going. Use the return linearization (this is equivalent to (20.304))

$$
\begin{aligned}
R_{t+1} &= \left(1 + \frac{P_{t+1}}{D_{t+1}}\right)\frac{D_{t+1}}{D_t} \Big/ \frac{P_t}{D_t} \\
r_{t+1} &= \rho(p_{t+1} - d_{t+1}) + (d_{t+1} - d_t) - (p_t - d_t). && \text{(20.319)}
\end{aligned}
$$

Now, plug in from (20.314) and (20.313) to get (20.315).
Prices: Write

$$
p_{t+1} - p_t = -(d_{t+1} - p_{t+1}) + (d_t - p_t) + (d_{t+1} - d_t).
$$

Then, plugging in from (20.314) and (20.313), we get (20.316).
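As a numerical sanity check on the derivation above, the sketch below simulates the state variable, builds the dividend-price ratio from $d_t - p_t = x_t/(1-\rho b)$, and compares the return and price growth obtained from the two identities with the VAR forms (20.315) and (20.316). The shock standard deviations, the seed and all function names are illustrative assumptions; the comparison itself does not depend on them.

```python
import random

def var_consistency_gaps(T=500, b=0.9, rho=0.96, sd_delta=1.7, sd_d=10.8, seed=1):
    """Return the largest gaps between identity-based and VAR-based r and Delta p."""
    rng = random.Random(seed)
    x = 0.0
    dp = x / (1.0 - rho * b)            # d_t - p_t = x_t / (1 - rho*b)
    max_gap_r = 0.0
    max_gap_p = 0.0
    for _ in range(T):
        delta = rng.gauss(0.0, sd_delta)   # expected-return shock
        eps_d = rng.gauss(0.0, sd_d)       # dividend growth shock
        x_next = b * x + delta
        dp_next = x_next / (1.0 - rho * b)

        # Identities: linearized return and the price growth decomposition.
        r_identity = -rho * dp_next + eps_d + dp
        dprice_identity = -dp_next + dp + eps_d

        # VAR forms (20.315) and (20.316).
        r_var = (1.0 - rho * b) * dp + eps_d - rho / (1.0 - rho * b) * delta
        dprice_var = (1.0 - b) * dp + eps_d - delta / (1.0 - rho * b)

        max_gap_r = max(max_gap_r, abs(r_identity - r_var))
        max_gap_p = max(max_gap_p, abs(dprice_identity - dprice_var))
        x, dp = x_next, dp_next
    return max_gap_r, max_gap_p

gap_r, gap_p = var_consistency_gaps()
```

Both gaps should be zero up to floating point rounding, which is just the statement that the VAR is an algebraic rearrangement of the identities.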
We can back out parameters from the reduced form return - d/p VAR. (Any two equations
carry all the information of this system.) Table RR presents some estimates.


Sample   a      a, D/P   b      σ(ε_r)   σ(ε_dp)   ρ(ε_r, ε_dp)
27-98    0.16   4.7      0.92   19.2     15.2      -0.72
48-98    0.14   4.0      0.97   15.0     12.6      -0.71
27-92    0.28   6.7      0.82   19.0     15.0      -0.69
48-92    0.27   6.2      0.87   14.5     12.4      -0.67

Table RR. Estimates of log excess return and log dividend-price ratio regressions,
using annual CRSP data. $r$ is the difference between the log value weighted return
and the log treasury bill rate. The estimates are of the system

$$
\begin{aligned}
r_{t+1} &= a(d_t - p_t) + \varepsilon_{r,t+1} \\
d_{t+1} - p_{t+1} &= b(d_t - p_t) + \varepsilon_{dp,t+1}
\end{aligned}
$$

and

$$
r_{t+1} = (a, D/P)\,\frac{D_t}{P_t} + \varepsilon_{t+1}.
$$

I report both the more intuitive coefficients on the actual d/p ratio and the coefficients on
the log d/p ratio, which is a more useful specification for our transformations. The two line
up; a coefficient of 5 on $D_t/P_t$ implies a coefficient of $5 \times D/P \approx 0.25$ on
$(D_t/P_t)/(D/P)$.
You can see that the parameters depend substantially on the sample. In particular, the
dramatic returns of the late 1990s, despite low dividend yields, cut the postwar return forecast
coefficients in half and the overall sample estimate by about one third. That dramatic decline
in the d/p ratio also induces a very high apparent persistence in the d/p ratio, rising to a 0.97
estimate in the 48-98 sample. (Faced with an apparent trend in the data, an autoregression
estimates a root near unity.)
With these estimates in mind, given the considerations outlined below, I will make
calculations using reduced form parameters

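$$
\begin{aligned}
b &= 0.9 \\
\rho &= 0.96 \\
\sigma(\varepsilon_r) &= 15 \\
\sigma(\varepsilon_{dp}) &= 12.5 \\
\rho(\varepsilon_r, \varepsilon_{dp}) &= -0.7
\end{aligned}
\qquad \text{(20.321)}
$$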

From these parameters, we can find the underlying parameters of (20.311)-(20.313). I
comment on each one below as it becomes useful.

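$$
\begin{aligned}
\sigma(\delta) &= \sigma(\varepsilon_{dp})(1 - \rho b) = 1.7 \\
\sigma(\varepsilon_d) &= \sigma(\varepsilon_r + \rho\,\varepsilon_{dp})
  = \sqrt{\sigma^2(\varepsilon_r) + \rho^2\sigma^2(\varepsilon_{dp}) + 2\rho\,\sigma(\varepsilon_r, \varepsilon_{dp})} = 10.82 \\
\sigma(\varepsilon_d, \varepsilon_{dp}) &= \sigma(\varepsilon_r, \varepsilon_{dp}) + \rho\,\sigma^2(\varepsilon_{dp}) \\
\rho(\varepsilon_d, \varepsilon_{dp}) &= \frac{\rho(\varepsilon_r, \varepsilon_{dp})\,\sigma(\varepsilon_r) + \rho\,\sigma(\varepsilon_{dp})}{\sigma(\varepsilon_d)} = 0.139
\end{aligned}
$$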
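The arithmetic behind these derived parameters is easy to reproduce. The short sketch below uses only the reduced form values quoted above; the variable names are my own, and the relation $\varepsilon_d = \varepsilon_r + \rho\,\varepsilon_{dp}$ is the one used in the formulas.

```python
import math

# Reduced form parameters from the text.
b, rho = 0.9, 0.96
sd_r, sd_dp, corr_r_dp = 15.0, 12.5, -0.7

# Covariance of return and d/p shocks implied by the correlation.
cov_r_dp = corr_r_dp * sd_r * sd_dp

# Underlying parameters, using eps_d = eps_r + rho * eps_dp.
sd_delta = sd_dp * (1.0 - rho * b)
sd_d = math.sqrt(sd_r**2 + rho**2 * sd_dp**2 + 2.0 * rho * cov_r_dp)
cov_d_dp = cov_r_dp + rho * sd_dp**2
corr_d_dp = cov_d_dp / (sd_d * sd_dp)

print(round(sd_delta, 2), round(sd_d, 2), round(corr_d_dp, 3))
```

The printed values should line up with the 1.7, 10.82 and 0.139 reported in the text.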

The size of the return forecasting coefficient.
Does the magnitude of the estimated predictability make sense? Given the statistical
uncertainties, do other facts guide us to higher or lower predictability?
The coefficient of the one year excess return on the dividend price ratio in Table 1 is about
5, and the estimates in Table RR vary from 4 to 6 depending on the sample. These values are
surprisingly large. For example, a naive investor might think that dividend yields move
one-for-one with returns; if they pay more dividends, you get more money. Before predictability,
we would have explained that high dividend yield means that prices are low in anticipation
of lower future dividends, leaving the expected return unchanged. Now we recognize the
possibility of time-varying expected returns, but does it make sense that expected returns
move even more than dividend yields?
Return forecastability follows from the fact that dividends are not forecastable, and that
the dividend/price ratio is highly but not completely persistent. We see this in the calculated
coefficients of prices and returns on the dividend price ratio in (20.315) and (20.316):

$$
\begin{aligned}
r_{t+1} &= (1 - \rho b)(d_t - p_t) + \varepsilon_{r,t+1} \\
\Delta p_{t+1} &= (1 - b)(d_t - p_t) + \varepsilon_{p,t+1}
\end{aligned}
$$

Since dividends are not forecastable, it is no surprise that the formulas for price growth and
return are so similar. The return formula basically just adjusts for the fact that a higher
dividend yield directly contributes to return by paying more dividends. To transform units to
regressions on D/P, multiply by 25, e.g.

$$
r_{t+1} = \frac{1 - \rho b}{D/P}\,\frac{D_t}{P_t} + \varepsilon_{r,t+1}.
$$

Suppose the d/p ratio were not persistent at all, $b = 0$. Then both return and price growth
coefficients should be 1 in logs or about 25 in levels! If the d/p ratio is one percentage point
above its average, we must forecast enough of a rise in prices to restore the d/p ratio to its
average in one year. The average d/p ratio is about 4%, though, so prices and hence returns must
rise by 25% to change the d/p ratio by one percentage point: $d(D/P) = -(D/P)\,dP/P$.
Suppose instead that the d/p ratio were completely persistent, i.e. a random walk with
$b = 1$. Then the return coefficient is $1 - \rho = 0.04$, and about 1.0 in levels, while the
price coefficient is 0. If the d/p ratio is one percent above average and expected to stay there,
and dividends are not forecastable, then prices must not be forecast to change either. The return
is one percentage point higher, because you get the higher dividends as well. Thus, the naive
investor who expects dividend yield to move one for one with returns not only implicitly
assumes that dividends are not forecastable (which turns out to be true) but also that the d/p
ratio will stay put forever.
A persistence parameter $b = 0.90$ implies price and return regression coefficients of

$$
\begin{aligned}
1 - b &= 0.10 \\
1 - \rho b &= 1 - 0.96 \times 0.90 = 0.14
\end{aligned}
$$

or about 2.5 and 3.4 in levels. If the dividend yield is one percentage point high, and is
expected to be 0.9 percentage points high in one year, then prices must increase by
$P/D \times 0.1$ percentage points in the next year. The return gets the additional dividend.
This, fundamentally, is how zero forecastability of dividends implies that returns move more
than one for one with the dividend yield.
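The three cases just discussed ($b = 0$, $b = 1$ and $b = 0.9$) follow from the same two formulas, so they are easy to tabulate. This is a minimal sketch: the function name and the dictionary keys are hypothetical, and the average dividend yield of 4% is the round number used in the text to convert log coefficients to levels.

```python
def forecast_coefficients(b, rho=0.96, mean_dp=0.04):
    """Return and price-growth coefficients on d/p, in logs and in levels."""
    return_log = 1.0 - rho * b      # coefficient of r on (d - p)
    price_log = 1.0 - b             # coefficient of Delta p on (d - p)
    return {
        "return_log": return_log,
        "price_log": price_log,
        "return_level": return_log / mean_dp,   # coefficient on D/P
        "price_level": price_log / mean_dp,
    }

for b in (0.0, 0.8, 0.9, 0.96, 1.0):
    c = forecast_coefficients(b)
    print(b, round(c["return_log"], 3), round(c["return_level"], 2),
          round(c["price_log"], 3), round(c["price_level"], 2))
```

Reading down the output shows the return coefficient in levels falling from 25 toward 1 as persistence rises, which is the whole argument in one table.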
This is a little below the sample estimates in Table 1 and Table RR of 4-6. That is because
in the sample, a high price seems to forecast even lower dividend growth: the wrong sign,
which is hard to believe. To continue with a calibration that consistently captures the facts
with no dividend forecastability, we either have to lower the persistence coefficient or lower
the return forecasting coefficient from the values reported in Table RR. A persistence $b = 0.8$
implies a return coefficient $(1 - \rho b) = (1 - 0.96 \times 0.8) = 0.23$, or in levels
$0.23 \times 25 = 5.75$. However, given the uncertainties of dividend/price forecastability, it
seems more sensible to continue calculations with $b = 0.9$ and corresponding return coefficient
of 0.14, equal to the estimate in the 48-98 sample.
Going in the other direction, statistical uncertainty, the recent runup in stocks despite low
dividend yields, and the dramatic portfolio implications of time-varying returns for investors
whose risks or risk aversion do not change over time all lead one to consider lower pre-
dictability. As we see from these calculations though, there are only two ways to make sense
of lower predictability. You could follow the “new economy” advocates, and believe that this
time, prices really are rising on advance news of dividend growth, even though prices have
not forecast dividend growth in the past. If not, you have to believe that dividend price ratios
are substantially more persistent than they have seemed in the postwar data.
Much more persistent d/p is a tough road to follow, since D/P ratios already move
incredibly slowly. Now, they basically change sign once a generation; high in the 50's, low in
the 60's, high in the mid-70's, and decreasing ever since (see Figure 37). As a quantitative
example, suppose the D/P ratio had an AR(1) coefficient of 0.96 in annual data. This means a
half life of $\ln 0.5 / \ln 0.96 = 17$ years. In this case, the price coefficient would be

$$
\frac{1 - b}{D/P} = \frac{1 - 0.96}{0.04} = 1
$$

and the return coefficient would be

$$
\frac{1 - \rho b}{D/P} = \frac{1 - 0.96^2}{0.04} \approx 2.
$$

A one percentage point higher d/p ratio means that prices must rise 1 percentage point next
year, so returns must be about 2 percentage points higher. A two for one movement of
expected returns with the dividend yield thus seems about the lower bound for return
predictability, so long as dividend growth remains unforecastable.
Persistence, price volatility and expected returns
From the dividend-price ratio equation (20.314) we can find the volatility of the dividend
price ratio and relate it to the volatility and persistence of expected returns.

$$
\sigma(d_t - p_t) = \frac{1}{1 - \rho b}\,\sigma(x_t)
$$

With $b = 0.9$, $1/(1 - \rho b) = 1/(1 - 0.96 \times 0.9) = 7.4$. Thus, the high persistence of
expected returns means that a small expected return variation translates into a potentially
very large price variation; or equivalently that very large price variations, unaccounted for
by forecasts of dividend variation, can be explained by small variation in expected returns.
Translating to levels, a one percentage point change in expected returns with persistence
$b = 0.9$ corresponds to a 7.4% change in price.
The Gordon growth model is a classic and even simpler way to see this point. With
constant dividend growth $g$ and return $r$, the present value identity becomes

$$
\frac{P}{D} = \frac{1}{r - g}.
$$

A price-dividend ratio of 25 means $r - g = 0.04$. Then, a one percentage point permanent
change in expected return translates into a 25 percentage point change in price! This is an
overstatement, since expected returns are not this persistent, but it allows you to clearly see
the point.
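A couple of lines of arithmetic make the comparison concrete. In this sketch (function names are illustrative), `price_multiplier` is the factor $1/(1-\rho b)$ that maps expected return movements into log price movements, and `gordon_pd` is the price-dividend ratio $1/(r-g)$. The 25 percentage point figure is the first-order (derivative) effect, $-1/(r-g)$; the exact move for a full one point rise in $r - g$ from 0.04 to 0.05 is somewhat smaller.

```python
def price_multiplier(b, rho=0.96):
    """Log price response per unit of expected return, 1 / (1 - rho*b)."""
    return 1.0 / (1.0 - rho * b)

def gordon_pd(r_minus_g):
    """Price-dividend ratio in the Gordon growth model."""
    return 1.0 / r_minus_g

multiplier = price_multiplier(0.9)            # about 7.4
pd_base = gordon_pd(0.04)                     # 25
pd_shocked = gordon_pd(0.05)                  # 20
first_order_change = -1.0 / 0.04 * 0.01       # -0.25, the derivative approximation
exact_change = pd_shocked / pd_base - 1.0     # -0.20, the exact proportional change
```

Setting `b = 1` in `price_multiplier` recovers the Gordon number, since $1/(1-\rho) = 25$ when $\rho = 0.96$.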
This point also shows that small market imperfections in expected returns can translate
into substantial market imperfections in prices, if those expected return changes are
persistent. We know markets cannot be perfectly efficient (Grossman and Stiglitz 1980). If they
were perfectly efficient, there would be no traders around to make them efficient. Especially
in situations where short sales or arbitrage are constrained by market frictions, prices of
similar assets can be substantially different, while the expected returns of those assets are
almost the same. For example the “closed end fund” puzzle (Thompson 1978) noted that baskets
of securities sold for substantial price discounts relative to the sum of the individual
securities. However, these price differentials persist for a long time. You can't short the
closed end funds to buy the securities and keep that short position on for years.

20.1.4 Mean-reversion

I introduce long-horizon return regressions and variance ratios. I show that they are
related: each one picks up a string of small negative return autocorrelations. I show though that
the direct evidence for mean reversion and Sharpe ratios that rise with horizon is weak.

Long run regressions and variance ratios
The first evidence of long-run forecastability in the stock market did not come from d/p
regressions, but rather from clever ways of looking at the long-run univariate properties of
returns. Fama and French (1988a) ran regressions of long-horizon returns on past
long-horizon returns,

$$
r_{t \to t+k} = a + b_k\, r_{t-k \to t} + \varepsilon_{t+k}, \qquad \text{(20.324)}
$$

basically updating classic autocorrelation tests from the 60s to long horizon data. They found
negative and significant $b$ coefficients: a string of good past returns forecasts bad future
returns.
Poterba and Summers (1988) considered a related “variance ratio” statistic. If stock
returns are i.i.d., then the variance of long horizon returns should grow with the horizon

$$
\mathrm{var}(r_{t \to t+k}) = \mathrm{var}(r_{t+1} + r_{t+2} + \dots + r_{t+k}) = k\,\mathrm{var}(r_{t+1}).
$$

They computed the variance ratio statistic

$$
v_k = \frac{1}{k}\,\frac{\mathrm{var}(r_{t \to t+k})}{\mathrm{var}(r_{t+1})}.
$$

They found variance ratios below one. Stocks, it would seem, really are safer for “long-run
investors” who can “afford to wait out the ups and downs of the market,” common Wall Street
advice, long maligned by academics.
These two statistics are closely related, and reveal the same basic fact: stock returns have a
string of small negative autocorrelations. To see this relation, write the variance ratio statistic

$$
v_k = \frac{1}{k}\,\frac{\mathrm{var}\left(\sum_{j=1}^{k} r_{t+j}\right)}{\mathrm{var}(r_{t+1})}
    = \sum_{j=-k}^{k} \frac{k - |j|}{k}\,\rho_j
    = 1 + 2\sum_{j=1}^{k} \frac{k - j}{k}\,\rho_j, \qquad \text{(20.326)}
$$


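and the regression coefficient in (20.324)

$$
\begin{aligned}
b_k &= \frac{1}{\mathrm{var}(r_{t \to t+k})}\,\mathrm{cov}\left(\sum_{j=1}^{k} r_{t+j},\ \sum_{j=1}^{k} r_{t-j+1}\right) \\
    &= \frac{k\,\mathrm{var}(r_{t+1})}{\mathrm{var}(r_{t \to t+k})} \sum_{j=-k}^{k} \frac{k - |j|}{k}\,\rho_{k+j}
     = \frac{1}{v_k} \sum_{j=-k}^{k} \frac{k - |j|}{k}\,\rho_{k+j}.
\end{aligned}
$$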

Both statistics are based on tent-shaped sums of autocorrelations, as illustrated by Figure
39. If there are many small negative autocorrelations which bring returns back slowly after
a shock, these autocorrelations might be individually insignificant. Their sum might be
economically and statistically significant, however, and these two statistics will reveal that
fact by focusing on the sum of autocorrelations. The long-horizon regression weights
emphasize the middle of the autocorrelation function, so a k year horizon long-horizon
regression is comparable to a somewhat longer variance ratio.
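The tent-shaped weights can be checked against a brute-force calculation. In the sketch below, `rho` is any function returning the autocorrelation at a given lag; the AR(1) autocorrelation with coefficient -0.1 is purely an illustrative assumption, and the function names are my own. `brute_force` sums the autocorrelation matrix directly, with future returns at dates $k, \dots, 2k-1$ and past returns at dates $0, \dots, k-1$.

```python
def variance_ratio(rho, k):
    """v_k = 1 + 2 * sum_{j=1}^{k-1} (k - j)/k * rho(j)."""
    return 1.0 + 2.0 * sum((k - j) / k * rho(j) for j in range(1, k))

def long_horizon_beta(rho, k):
    """b_k = (1/v_k) * sum_j (k - |j|)/k * rho(k + j), a tent centered at lag k."""
    tent = sum((k - abs(j)) / k * rho(k + j) for j in range(-(k - 1), k))
    return tent / variance_ratio(rho, k)

def brute_force(rho, k):
    """Same statistics from direct sums over pairs of dates (unit return variance)."""
    var_sum = sum(rho(abs(i - j)) for i in range(k) for j in range(k))
    cov = sum(rho(abs(k + i - j)) for i in range(k) for j in range(k))
    return var_sum / k, cov / var_sum

def ar1(lag, phi=-0.1):
    """Illustrative autocorrelation function: AR(1) with coefficient phi."""
    return phi ** abs(lag)

def iid(lag):
    """Autocorrelation function of i.i.d. returns."""
    return 1.0 if lag == 0 else 0.0

v5, b5 = variance_ratio(ar1, 5), long_horizon_beta(ar1, 5)
v5_direct, b5_direct = brute_force(ar1, 5)
```

The weight at the endpoints $j = \pm k$ is zero, so the code sums only over the lags that actually enter.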

Figure 39. Long horizon regression and variance ratio weights on autocorrelations. (Plot
labels: variance ratio weights; long horizon regression weights; return autocorrelations.)

Moving average representation and mean reversion
The “mean-reversion” description of these statistics comes from their implications for
where values go at long horizons following a shock. We can show that the square root of
the variance ratio measures the long-horizon impact of a shock relative to its instantaneous
impact, that is, the extent to which values revert back towards their mean following a shock.
You can always write returns as a moving average of their own shocks. From a regression
of returns on past returns

$$
a(L)\, r_t = \varepsilon_t \qquad \text{(20.327)}
$$

you can find the $\theta_j$ in

$$
r_t = \sum_{j=0}^{\infty} \theta_j\, \varepsilon_{t-j} = \theta(L)\,\varepsilon_t = a(L)^{-1} \varepsilon_t.
$$

(Most simply, just simulate (20.327) forward.) The $\theta_j$ are the moving average
representation or impulse-response function: they tell you the path of expected returns
following a shock. Let $v_t$ represent the cumulative returns, or the log value of a dollar
invested, $\Delta v_t = r_t$. Then, the partial sum $\sum_{j=0}^{k} \theta_j$ tells you the effect
on invested wealth $v_{t+k}$ of a univariate return shock $\varepsilon_t$.
Relating variance ratios, long-horizon regressions and moving averages for finite $k$ is
possible but not pretty. However, we can nicely relate the limiting response (where
$\lim_{k \to \infty} E_t v_{t+k}$ ends up after a shock) to the autocorrelations, and thus to the
limit of the variance ratio statistic, very simply as

$$
1 + 2\sum_{j=1}^{\infty} \rho_j = \left(\sum_{j=0}^{\infty} \theta_j\right)^2 \Big/ \sigma^2_{\varepsilon}. \qquad \text{(20.328)}
$$

If returns are i.i.d., the variance ratio is one at all horizons; all autocorrelations are zero,
and all $\theta$ past the first are zero so the long-run price moves one for one with the shock.
A long string of small negative autocorrelations means a variance ratio less than one, and
means $\sum_{j=0}^{\infty} \theta_j < 1$ so the long-run effect on price is lower than the impact
effect: this is mean reversion.
The left hand side of (20.328) is the limit of the variance ratio, which follows by just taking
$k \to \infty$ in (20.326). For the equality itself, you can recognize in both expressions the
spectral density of $r$ at frequency zero. (Cochrane 1986 discusses these and other properties
of variance ratios.)
Table A1 presents an estimate of the variance of long-horizon returns and long-horizon
return regressions. The long-horizon regressions do show some interesting mean reversion,
especially in the 3-5 year range. However, that turns around at year 7 and disappears by year
10. The variance ratios do show some long-horizon stabilization. At year 10, the variance
ratio is $(16.3/19.8)^2 = 0.68$, and the long-run price impact of a shock is $16.3/19.8 = 0.82$.
The mean log return grows linearly with horizon whether returns are autocorrelated or not:
$E(r_1 + r_2) = 2E(r)$. If the variance also grows linearly with the horizon, as it does for
non-autocorrelated returns, then the Sharpe ratio grows with the square root of horizon. If the
variance grows more slowly than horizon, then the Sharpe ratio grows faster than the square
root of the horizon. This is the fundamental question for whether stocks are (unconditionally)
“safer for the long run.” Table A1 includes the long-horizon Sharpe ratios, and you can see
that they do increase.


Logs, 1926-1996
Horizon k (years)     1      2      3      5      7     10
σ(r_k)/√k          19.8   20.6   19.7   18.2   16.5   16.3
β_k                0.08  -0.15  -0.22  -0.04   0.24   0.08
Sharpe/√k          0.31   0.30   0.30   0.31   0.36   0.39

Table A1. Mean reversion using logs, 1926-1996. $r$ denotes the difference between the log
value weighted NYSE return and the log treasury bill return. $\sigma(r_k) = \sigma(r_{t \to t+k})$
is the standard deviation of long-horizon returns. $\beta_k$ is the regression coefficient
in $r_{t \to t+k} = \alpha + \beta_k\, r_{t-k \to t} + \varepsilon_{t+k}$. The Sharpe ratio is
$E(r_{t \to t+k})/\sigma(r_{t \to t+k})$.

You would not be to blame if you thought that the evidence of Table A1 was rather weak,
especially compared with the dramatic dividend/price regressions. It is, and it is for this
reason that most current evidence for predictability focuses on other variables such as the d/p
ratio.
In addition, Table A2 shows that the change from log returns to levels of returns, while
having a small effect on long-horizon regressions, destroys any evidence for higher Sharpe
ratios at long horizons. Table A3 shows the same results in the postwar period. Some of the
long-horizon regression coefficients are negative and significant, but there are just
as large positive coefficients, and no clear pattern. The variance ratios are flat or even rising
with horizons, and the Sharpe ratios are flat or even declining with horizon.
Levels, 1926-1996
Horizon k (years)     1      2      3      5      7     10
σ(r_k)/√k          20.6   22.3   22.5   24.9   28.9   39.5
β_k                0.02  -0.21  -0.22  -0.03   0.22  -0.63
Sharpe/√k          0.41   0.41   0.41   0.40   0.40   0.38

Table A2. $r$ denotes the difference between the gross (not log) long-horizon value-weighted
NYSE return and the gross treasury bill return.

Logs, 1947-1996
Horizon k (years)     1      2      3      5      7     10
σ(r_k)/√k          15.6   14.9   13.0   13.9   15.0   15.6
β_k               -0.10  -0.29*  0.30*  0.30   0.17  -0.18
Sharpe/√k          0.44   0.46   0.51   0.46   0.41   0.36

Levels, 1947-1996
Horizon k (years)     1      2      3      5      7     10
σ(r_k)/√k          17.1   17.9   16.8   21.9   29.3   39.8
β_k               -0.13  -0.33*  0.30   0.25   0.13  -0.25
Sharpe/√k          0.50   0.51   0.55   0.48   0.41   0.37

Table A3. Mean-reversion in postwar data.

In sum, the direct evidence for mean-reversion in index returns seems quite weak. I
consider next whether indirect evidence, values of these statistics implied by other estimation
techniques, still indicates mean-reversion. (The mean-reversion of individual stock returns
as examined by Fama and French (1988a) is somewhat stronger, and results in the stronger
cross-sectional “reversal” effect described in section 2.5 below.)
Keep in mind also that the unconditional Sharpe ratio does not, in the end, drive investment
decisions. Investment decisions are driven by the conditional moments of asset returns at any
moment in time, using every information variable that there is.

20.1.5 Mean-reversion and forecastability

I reconcile large forecastability from d/p ratios with a small mean reversion. I calculate
the univariate return process implied by the simple VAR, and find that it displays little mean
reversion.
I show that if dividend shocks are uncorrelated with expected return shocks, there must
be some mean reversion. If one rules out the small positive correlation in our samples, one
gets a slightly higher estimate of univariate mean-reversion.
I tie the strong negative correlation between return and d/p shocks to an essentially zero
correlation between expected return and dividend growth shocks.

How is it possible that variables such as the dividend price ratio forecast returns strongly,
but there seems to be little evidence for mean reversion in stock returns? To answer this
question, we have to connect the d/p regressions and the mean-reversion statistics.
Forecastability from variables such as the dividend-price ratios is related to, but does not
necessarily imply, mean-reversion. (Campbell 1991 emphasizes this point.) Mean-reversion
is about the univariate properties of the return series, forecasts of $r_{t+j}$ based on
$\{r_t, r_{t-1}, r_{t-2}, \dots\}$. Predictability is about the multivariate properties, forecasts
of $r_{t+j}$ based on $\{x_t, x_{t-1}, x_{t-2}, \dots\}$ as well as $\{r_t, r_{t-1}, r_{t-2}, \dots\}$.
Variables $x_t$ can forecast $r_{t+1}$, while $\{r_{t-j}\}$ fail to forecast $r_{t+1}$. As a simple
example, suppose that returns are i.i.d., but you get to see tomorrow's newspaper. You forecast
returns with a variable $x_t = r_{t+1}$,

$$
\begin{aligned}
r_{t+1} &= x_t \\
x_{t+1} &= \delta_{t+1}.
\end{aligned}
$$

In this example, $x_t$ forecasts returns very well, but lagged returns do not forecast returns at
all.
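The newspaper example is simple enough to simulate. The sketch below draws i.i.d. normal returns (the distribution, sample size, seed and the helper name `correlation` are all illustrative assumptions), sets $x_t = r_{t+1}$, and compares how well $x_t$ and the lagged return $r_t$ line up with $r_{t+1}$.

```python
import math
import random

def correlation(a, b):
    """Sample correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
returns = [rng.gauss(0.0, 1.0) for _ in range(5001)]   # i.i.d. returns

r_next = returns[1:]     # r_{t+1}
x = returns[1:]          # the "newspaper" variable x_t = r_{t+1}
r_lag = returns[:-1]     # r_t, the univariate forecaster

corr_newspaper = correlation(x, r_next)   # perfect forecast
corr_lagged = correlation(r_lag, r_next)  # close to zero in a large sample
```

The first correlation is one by construction, while the second is only sampling noise around zero: predictability without any univariate mean reversion.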
To examine this issue, continue with the VAR representation built up from a slowly moving
expected return and unforecastable dividends, (20.311)-(20.317). We want to find the
univariate return process implied by this VAR: what would happen if you took infinite data
from the system and ran a regression of returns on lagged returns? The answer, derived below,
is of the form

$$
r_t = \frac{1 - \gamma L}{1 - bL}\,\nu_t. \qquad \text{(20.329)}
$$

This is just the kind of process that can display slow mean-reversion or momentum. The
moving average coefficients are

$$
r_t = \nu_t - (\gamma - b)\nu_{t-1} - b(\gamma - b)\nu_{t-2} - b^2(\gamma - b)\nu_{t-3} - b^3(\gamma - b)\nu_{t-4} - \dots \qquad \text{(20.330)}
$$

Thus, if $\gamma > b$, a positive return shock sets off a long string of small negative returns,
which cumulatively bring the value back towards where it started. If $\gamma < b$, a positive
shock sets off a string of small positive returns, which add “momentum” to the original increase
in value.
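The long-run statistics are

$$
1 + 2\sum_{j=1}^{\infty} \rho_j = \left(\sum_{j=0}^{\infty} \theta_j\right)^2 \Big/ \sigma^2(\nu_t) = \left(\frac{1 - \gamma}{1 - b}\right)^2.
$$

Thus, if $\gamma > b$, returns will have a variance ratio below one, and if $\gamma < b$ a
variance ratio above one.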
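Now, what value of $\gamma$ does our VAR predict? Is there a sensible structure of the VAR that
generates substantial predictability but little mean-reversion? The general formula, derived
below, is that $\gamma$ solves

$$
\frac{1 + \gamma^2}{\gamma}
= \frac{(1 + b^2)\,\sigma^2(\varepsilon_d) + (1 + \rho^2)\,\sigma^2(\varepsilon_{dp}) - 2(\rho + b)\,\sigma(\varepsilon_d, \varepsilon_{dp})}
       {b\,\sigma^2(\varepsilon_d) + \rho\,\sigma^2(\varepsilon_{dp}) - (1 + \rho b)\,\sigma(\varepsilon_d, \varepsilon_{dp})}
= 2q, \qquad \text{(20.331)}
$$

and hence,

$$
\gamma = q - \sqrt{q^2 - 1}.
$$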

Case 1: No predictability.
If returns are not predictable in this system, i.e. if $\sigma(\delta) = 0$ so
$\sigma(\varepsilon_{dp}) = 0$, then (20.331) specializes to

$$
\frac{1 + \gamma^2}{\gamma} = \frac{1 + b^2}{b}.
$$

Thus $\gamma = b$, so returns are not autocorrelated. Sensibly enough.
Case 2: Constant dividend growth.
Next, suppose that dividend growth is constant, $\sigma(\varepsilon_d) = 0$, and variation in
expected returns is the only reason that returns vary at all. In this case, (20.331) specializes
quickly to

$$
\frac{1 + \gamma^2}{\gamma} = \frac{1 + \rho^2}{\rho},
$$

and thus $\gamma = \rho$.
This is a substantial amount of mean reversion. $(\gamma - b)$ in (20.330) is then
$0.96 - 0.90 = 0.06$, so that each year $j$ after a shock returns come back by
$6 \times b^{j-1}$ percent of the original shock. The cumulative impact is that value ends up at
$(1 - \gamma)/(1 - b) = (1 - 0.96)/(1 - 0.9) = 0.4$ or only 40% of the original shock.
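The 40% figure can be recovered by pushing a unit shock through the univariate process. The sketch below builds the impulse response from the recursion $r_t = b\,r_{t-1} + \nu_t - \gamma\,\nu_{t-1}$ implied by (20.329); the function name and the truncation at 500 lags are my own choices.

```python
def impulse_response(gamma, b, n):
    """First n moving average coefficients of (1 - gamma L)/(1 - b L)."""
    theta = [1.0, b - gamma]           # impact, then the first year's reversal
    while len(theta) < n:
        theta.append(b * theta[-1])    # later responses decay at rate b
    return theta[:n]

gamma, b = 0.96, 0.90                  # Case 2: gamma = rho
theta = impulse_response(gamma, b, 500)

first_reversal = theta[1]              # -(gamma - b) = -0.06
long_run_value = sum(theta)            # approaches (1 - gamma)/(1 - b) = 0.4
```

Because each coefficient after the impact is $-(\gamma - b)\,b^{j-1}$, the cumulative reversal is a geometric sum, $0.06/(1 - 0.9) = 0.6$, which leaves 0.4 of the original shock.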
Case 3: Dividend growth uncorrelated with expected return shocks.
Pure variation in expected returns is of course not realistic. Dividends do vary. If we add
dividend growth uncorrelated with expected return shocks, i.e. with
$\sigma(\varepsilon_{dp}, \varepsilon_d) = 0$, (20.331) specializes to

$$
\frac{1 + \gamma^2}{\gamma}
= \frac{1 + b^2}{b}\,\frac{b\,\sigma^2(\varepsilon_d)}{b\,\sigma^2(\varepsilon_d) + \rho\,\sigma^2(\varepsilon_{dp})}
+ \frac{1 + \rho^2}{\rho}\,\frac{\rho\,\sigma^2(\varepsilon_{dp})}{b\,\sigma^2(\varepsilon_d) + \rho\,\sigma^2(\varepsilon_{dp})}
= 2q. \qquad \text{(20.332)}
$$

In this case, $b < \gamma < \rho$. There will be some mean reversion in returns: this model
cannot generate $\gamma \le b$. However, the mean reversion in returns will be lower than with no
dividend growth, because dividend growth obscures the information in ex-post returns about
time-varying expected returns. (See (20.333).) How much lower depends on the parameters.
Using the parameters (20.321), I find that (20.332) implies

$$
\gamma = q - \sqrt{q^2 - 1} = 0.928.
$$

Our baseline VAR with no correlation between dividend growth and expected return
shocks thus generates a univariate return process that is slightly on the mean-reversion edge
of uncorrelated. The long-run response to a shock is

$$
\frac{1 - \gamma}{1 - b} = \frac{1 - 0.928}{1 - 0.9} = 0.72.
$$

This is a lot less mean-reversion than 0.4, but still somewhat more mean reversion than we
see in Tables A1-A3.
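The value $\gamma = 0.928$ can be reproduced from the zero-covariance formula. The sketch below covers only that case (uncorrelated dividend growth and d/p shocks); the function name is hypothetical, and the inputs $\sigma(\varepsilon_d) = 10.82$ and $\sigma(\varepsilon_{dp}) = 12.5$ are the values quoted earlier in the section.

```python
import math

def implied_gamma_uncorrelated(b, rho, var_d, var_dp):
    """Solve (1 + gamma^2)/gamma = 2q for the root below one, zero-covariance case."""
    two_q = ((1.0 + b**2) * var_d + (1.0 + rho**2) * var_dp) / (b * var_d + rho * var_dp)
    q = two_q / 2.0
    return q - math.sqrt(q * q - 1.0)

b, rho = 0.9, 0.96
gamma = implied_gamma_uncorrelated(b, rho, var_d=10.82**2, var_dp=12.5**2)
long_run_response = (1.0 - gamma) / (1.0 - b)

# Limiting cases from the text: no d/p shocks gives gamma = b,
# no dividend shocks gives gamma = rho.
gamma_case1 = implied_gamma_uncorrelated(b, rho, var_d=1.0, var_dp=0.0)
gamma_case2 = implied_gamma_uncorrelated(b, rho, var_d=0.0, var_dp=1.0)
```

The two limiting calls are a useful check that the weighted-average structure of the formula pins $\gamma$ between $b$ and $\rho$.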
This case is an important baseline worth stressing. If expected returns are positively
correlated, realized returns are negatively autocorrelated. If (unchanged) expected dividends
are discounted at a higher rate, today's price falls. You can see this most easily by just looking
at the return or its linearization, (20.319)

$$
r_{t+1} = \Delta d_{t+1} - \rho(d_{t+1} - p_{t+1}) + (d_t - p_t). \qquad \text{(20.333)}
$$

The $d - p$ ratio is proportional to expected returns. A positive shock to expected returns,
uncorrelated with dividend growth, lowers actual returns. A little more deeply, look at the
return innovation identity (20.308),

$$
r_t - E_{t-1} r_t = (E_t - E_{t-1})\left[\sum_{j=0}^{\infty} \rho^j \Delta d_{t+j} - \sum_{j=1}^{\infty} \rho^j r_{t+j}\right]. \qquad \text{(20.334)}
$$

If expected returns $(E_t - E_{t-1}) \sum_{j=1}^{\infty} \rho^j r_{t+j}$ increase, with no concurrent
news about current or future dividends, then $r_t - E_{t-1} r_t$ decreases.
This is the point to remark on a curious feature of the return - dividend/price VAR: the
negative correlation between ex-post return shocks and dividend/price ratio shocks. All the
estimates were around -0.7. At first glance such a strong correlation between VAR residuals
seems strange. At second glance, it is expected. From (20.333) you can see that a positive
innovation to the dividend price ratio will correspond to a negative return innovation, unless a
striking dividend correlation gets in the way. More deeply, you can see the point in (20.334).
Quantitatively, from (20.315), the return shock is related to the dividend growth shock and
the expected return shock by
$$\varepsilon_r = \varepsilon_d - \frac{\rho}{1-\rho b}\,\delta = \varepsilon_d - \rho\,\varepsilon_{dp}.$$
Thus, a zero correlation between the underlying dividend growth and expected return shocks,
$\rho(\varepsilon_d, \delta) = 0$, implies a negative covariance between return shocks and expected return shocks,
$$\sigma(\varepsilon_r, \delta) = -\frac{\rho}{1-\rho b}\,\sigma^2(\delta).$$
The correlation is a perfect $-1$ if there are no dividend growth shocks. At the parameters (??),
$\sigma(\varepsilon_{dp}) = 12.5$, $\sigma(\varepsilon_r) = 15$, we obtain
$$\rho(\varepsilon_r, \delta) = \rho(\varepsilon_r, \varepsilon_{dp}) = -\frac{\rho}{1-\rho b}\,\frac{\sigma(\delta)}{\sigma(\varepsilon_r)} = -\rho\,\frac{\sigma(\varepsilon_{dp})}{\sigma(\varepsilon_r)} = -0.96 \times \frac{12.5}{15} = -0.8.$$
The slight 0.1 positive correlation between dividend growth and expected return shocks results
in (or, actually, results from) a slightly lower $-0.7$ specification for the correlation of return
and d/p shocks.
The strong negative correlation between return shocks and expected return shocks, expected
from a low correlation between dividend growth shocks and expected return shocks,
is crucial to the finding that returns are not particularly autocorrelated despite predictability.
Consider what would happen if the correlation $\rho(\varepsilon_r, \varepsilon_{dp}) = \rho(\varepsilon_r, \delta)$ were zero. The expected
return $x_t$ is slow moving. If it is high now, it has been high for a while, and there has likely
been a series of good past returns. But it also will remain high for a while, leading to a period
of high future returns. This is “momentum,” positive return autocorrelation, the opposite
of mean-reversion.
Case 4: Dividend growth shocks positively correlated with expected return shocks
As we have seen, the VAR with no correlation between expected return and dividend
growth shocks cannot deliver uncorrelated returns or positive “momentum” correlation patterns.
At best, volatile dividend growth can obscure an underlying negative correlation pattern.
However, looking at (20.333) or (20.334), you can see that adding dividend growth
shocks positively correlated with expected return shocks could give us uncorrelated or
positively correlated returns.
The estimate in Table RR implied a slight positive correlation of dividend growth and
expected return shocks, $\rho(\varepsilon_d, \delta) = 0.14$ in (20.323). If we use that estimate in (20.331), we
recover an estimate
$$\gamma = 0.923; \qquad \frac{1-\gamma}{1-b} = 0.77.$$

This γ is quite close to b = 0.9, and the small mean reversion is more closely consistent with
Tables A1-A3.
Recall that point estimates as in Table 1 actually showed that a high d/p ratio forecast
higher dividends, which is the wrong sign. This point estimate means that shocks to the d/p ratio
and expected returns are positively correlated with shocks to expected dividend growth. If
you generalize the VAR to allow such shocks, along with a richer specification allowing
additional lags and variables, you find that VARs give point estimates with slight but very
small mean reversion. (See Cochrane 1994 for a plot. The estimated univariate process has
slight mean-reversion, with an impulse-response ending up at about 0.8 of its starting value,
and no different from the direct estimate.)
Can we generate unforecastable returns in this system? To do so, we have to increase
the correlation between expected return shocks and dividend growth. Equating (20.331) to
$(1 + b^2)/b$ and solving for $\rho(\varepsilon_d, \varepsilon_{dp})$, we obtain

$$\rho(\varepsilon_d, \varepsilon_{dp}) = \frac{(1-\rho b)(\rho - b)}{(1-b)^2(\rho + b)}\,\frac{\sigma(\varepsilon_{dp})}{\sigma(\varepsilon_d)} = 0.51.$$

This is possible, but not likely. Any positive correlation between dividend growth and
expected return shocks strikes me as suspect. If anything, I would expect that since expected
returns rise in “bad times” when risk or risk aversion increases, we should see a positive shock
to expected returns associated with a negative shock to current or future dividend growth.
Similarly, if we are going to allow dividend price ratios to forecast dividend growth, a high
dividend price ratio should forecast lower dividends.
Tying together all these thoughts, I think it's reasonable to impose zero dividend
forecastability and zero correlation between dividend growth and expected return shocks. This
specification means that returns are really less forecastable than they seem in some samples.
As we have seen, $b = 0.9$ and no dividend forecastability means that the coefficient of return
on D/P is really about 3.4 rather than 5 or 6. This specification means that expected returns
really account for 100% rather than 130% of the price-dividend variance. However, it also
means that univariate mean reversion is slightly stronger than it seems in our sample.


This section started with the possibility that the implied mean reversion from a multivari-
ate system could be a lot larger than that revealed by direct estimates. Instead, we end up by
reconciling strong predictability and slight mean-reversion.
How to find the univariate return representation
To find the implied univariate representation, we have to find a representation
$$r_{t+1} = a(L)\nu_t \tag{20.335}$$

in which the $a(L)$ is invertible. The Wold decomposition theorem tells us that there is a
unique invertible moving average representation in which the $\nu_t$ are the one-step
ahead forecast error shocks, i.e. the errors in a regression model $a(L) r_{t+1} = \nu_{t+1}$. Thus,
if you find any invertible moving average representation, you know you have the right one.
We can't do it by simply manipulating the systems starting with (20.311), because they are
expressed in terms of multivariate shocks, errors in regressions that include $x$.
There are three fundamental representations of a time series: its Wold moving average
representation, its autocorrelation function, and its spectral density. To find the univariate
representation (20.335), you either calculate the autocorrelations $E(r_t r_{t-j})$ from (20.311)
and then try to recognize what process has that autocorrelation pattern, or you calculate the
spectral density and try to recognize what process has that spectral density.
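In our simple setup, we can write the return-d/p VAR (20.314)-(20.315) as

$$\begin{aligned}
r_{t+1} &= (1-\rho b)(d_t - p_t) + (\varepsilon_{d,t+1} - \rho\,\varepsilon_{dp,t+1}) \\
(d_{t+1} - p_{t+1}) &= b\,(d_t - p_t) + \varepsilon_{dp,t+1}.
\end{aligned}$$
Then, write returns as
$$\begin{aligned}
r_{t+1} &= \frac{1-\rho b}{1-bL}\,\varepsilon_{dp,t} + (\varepsilon_{d,t+1} - \rho\,\varepsilon_{dp,t+1}) \\
(1-bL)\,r_{t+1} &= (1-\rho b)\,\varepsilon_{dp,t} + (\varepsilon_{d,t+1} - \rho\,\varepsilon_{dp,t+1}) - b\,(\varepsilon_{d,t} - \rho\,\varepsilon_{dp,t}) \\
(1-bL)\,r_{t+1} &= (\varepsilon_{d,t+1} - \rho\,\varepsilon_{dp,t+1}) + (\varepsilon_{dp,t} - b\,\varepsilon_{d,t}).
\end{aligned} \tag{20.336}$$

Here, you can see that $r_t$ must follow an ARMA(1,1) with one root equal to $b$ and the other
root to be determined. Write $y_t = (1-bL) r_t$, and thus $y_t = (1-\gamma L)\nu_t$. Then the
autocovariances of $y$ from (20.336) are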
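$$E(y_{t+1}^2) = (1+b^2)\,\sigma^2(\varepsilon_d) + (1+\rho^2)\,\sigma^2(\varepsilon_{dp}) - 2(\rho+b)\,\sigma(\varepsilon_d, \varepsilon_{dp})$$

$$E(y_{t+1} y_t) = -b\,\sigma^2(\varepsilon_d) - \rho\,\sigma^2(\varepsilon_{dp}) + (\rho+b)\,\sigma(\varepsilon_d, \varepsilon_{dp})$$
while $y_t = (1-\gamma L)\nu_t$ implies
$$E(y_{t+1}^2) = (1+\gamma^2)\,\sigma_\nu^2$$
$$E(y_{t+1} y_t) = -\gamma\,\sigma_\nu^2.$$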


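Hence, we can find $\gamma$ from the condition
$$\frac{1+\gamma^2}{\gamma} = \frac{(1+b^2)\,\sigma^2(\varepsilon_d) + (1+\rho^2)\,\sigma^2(\varepsilon_{dp}) - 2(\rho+b)\,\sigma(\varepsilon_d, \varepsilon_{dp})}{b\,\sigma^2(\varepsilon_d) + \rho\,\sigma^2(\varepsilon_{dp}) - (\rho+b)\,\sigma(\varepsilon_d, \varepsilon_{dp})} = 2q.$$

The solution (the root less than one) is
$$\gamma = q - \sqrt{q^2 - 1}.$$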
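As a rough numerical check of this calculation, here is a minimal Python sketch for the baseline case with no correlation between dividend growth and d/p shocks. The values b = 0.9, ρ = 0.96 and a d/p shock standard deviation of 12.5 appear in the text; the dividend growth shock standard deviation of 10 is an assumption made only for illustration, so the result need not reproduce the 0.928 quoted earlier. The function name is hypothetical.

```python
import math

def univariate_ma_root(b, rho, sd_d, sd_dp):
    """Return (q, gamma) assuming zero covariance between the two shocks.

    y[t+1] = (eps_d[t+1] - rho*eps_dp[t+1]) + (eps_dp[t] - b*eps_d[t]),
    so its variance and first autocovariance can be read off directly.
    """
    var_y = (1 + b**2) * sd_d**2 + (1 + rho**2) * sd_dp**2
    neg_cov1 = b * sd_d**2 + rho * sd_dp**2     # equals -E(y[t+1] * y[t])
    q = 0.5 * var_y / neg_cov1                  # from (1 + gamma^2)/gamma = 2q
    gamma = q - math.sqrt(q**2 - 1)             # the root less than one
    return q, gamma

b, rho = 0.9, 0.96
sd_d, sd_dp = 10.0, 12.5        # sd_d = 10 is an assumed, illustrative value
q, gamma = univariate_ma_root(b, rho, sd_d, sd_dp)
long_run = (1 - gamma) / (1 - b)   # long-run response to a univariate shock
print(f"q = {q:.5f}, gamma = {gamma:.4f}, long-run response = {long_run:.3f}")
```

Under these assumptions the moving average root lands between b and ρ, so the long-run response is below one, which is the mild mean reversion described above.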

For more general processes, such as computations from an estimated VAR, it is better to
approach the problem via the spectral density. This approach allows you to construct the
univariate representation directly without relying on cleverness. If you write $y_t = [\, r_t \;\; x_t \,]'$,
the VAR is $y_t = A(L)\eta_t$. Then the spectral density of returns $S_r(z)$ is given by the top left
element of $S_y(z) = A(z)E(\eta\eta')A(z^{-1})'$ with $z = e^{-i\omega}$. Like the autocorrelation, the spectral
density is the same object whether it comes from the univariate or multivariate representation.
You can find the autocorrelations by (numerically) inverse-Fourier transforming the spectral
density. The autocorrelations and spectral densities are directly revealing: a string of small
negative autocorrelations or a dip in the spectral density near frequency zero corresponds to
mean-reversion; positive autocorrelations or a spectral density higher at frequency zero than
elsewhere corresponds to momentum.
To find the univariate, invertible moving average representation from the spectral density,
you have to factor the spectral density $S_r(z) = a(z)a(z^{-1})$, where $a(z)$ is a polynomial with
roots outside the unit circle, $a(z) = (1-\gamma_1 z)(1-\gamma_2 z)\cdots$, $|\gamma_i| < 1$. Then, since $a(L)$ is
invertible, $r_t = a(L)\varepsilon_t$, $\sigma^2_\varepsilon = 1$, is the univariate representation of the return process.
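The spectral density route can be sketched numerically. The Python example below writes the simple return-d/p system as a first-order VAR, evaluates the top left element of $A(z)E(\eta\eta')A(z^{-1})'$ on a frequency grid, and inverse-Fourier transforms it to get return autocovariances. The matrix names, the grid size, and the dividend growth shock standard deviation of 10 are assumptions for illustration; the shocks are taken to be uncorrelated.

```python
import numpy as np

b, rho = 0.9, 0.96
sd_d, sd_dp = 10.0, 12.5       # sd_d = 10 is an assumed, illustrative value

# Y[t] = [r[t], d[t]-p[t]]',  Y[t+1] = Phi Y[t] + C e[t+1],  e = [eps_d, eps_dp]'
Phi = np.array([[0.0, 1.0 - rho * b],
                [0.0, b]])
C = np.array([[1.0, -rho],
              [0.0, 1.0]])
Sigma = np.diag([sd_d**2, sd_dp**2])   # uncorrelated shocks (assumption)

n = 4096
omegas = 2.0 * np.pi * np.arange(n) / n
S_r = np.empty(n)
for k, w in enumerate(omegas):
    z = np.exp(-1j * w)
    A = np.linalg.solve(np.eye(2) - Phi * z, C)   # A(z) = (I - Phi z)^{-1} C
    S = A @ Sigma @ A.conj().T                    # spectral density matrix at w
    S_r[k] = S[0, 0].real                         # top left element: returns

# Inverse FFT of the spectral density recovers the autocovariances of returns.
autocov = np.fft.ifft(S_r).real
print("variance:", autocov[0], " first autocovariance:", autocov[1])
print("spectral density at frequency zero:", S_r[0])
```

With these inputs the first autocovariance is negative and the spectral density at frequency zero lies below the variance, the dip near frequency zero that signals mean-reversion.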

20.1.6 Multivariate mean-reversion

I calculate the responses to multivariate rather than univariate shocks. In a multivari-
ate system you can isolate expected return shocks and dividend growth shocks. The price
response to expected return shocks is entirely stationary.

We are left with a troubling set of facts: high price/dividend ratios strongly forecast low
returns, yet high past returns do not seem to forecast low subsequent returns. Surely, there
must be some sense in which “high prices” forecast lower subsequent returns?
The resolution must involve dividends (or earnings, book value, or a similar divisor for
prices). A price rise with no change in dividends results in lower subsequent returns. A price
rise that comes with a dividend rise does not result in lower subsequent returns. A high return
combines dividend news and price-dividend news, and so obscures the lower expected return
message. In a more time-series language, instead of looking at the response to a univariate
return shock (a return that was unanticipated based on lagged returns), let us look at the
responses to multivariate shocks (a return that was unanticipated based on lagged returns
and dividends).
This is easy to do in our simple VAR. We can simulate (20.314) -(20.317) forward and
trace the responses to a dividend growth shock and an expected return (d/p ratio) shock.
Figures 40 and 41 present the results of this calculation. (Cochrane 1994 presents a corre-
sponding calculation using an unrestricted VAR, and the results are very similar.)
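A minimal sketch of this kind of simulation follows, in Python. It assumes the simple system in which d - p follows an AR(1) with coefficient b, dividend growth is unforecastable, and the return is given by the linearized identity. The shock sizes (12.5 for the d/p shock, which corresponds to a 1.7 expected return shock, and 14 for dividend growth) follow the figure captions; the function name, horizon and dictionary keys are hypothetical.

```python
import numpy as np

def impulse_response(eps_d0, eps_dp0, b=0.9, rho=0.96, horizon=200):
    """Responses to one-time shocks at t = 0 in the simple VAR.

    dp is the log dividend-price ratio d - p. Dividend growth moves only on
    impact because it is assumed to be unforecastable.
    """
    dp = eps_dp0 * b ** np.arange(horizon)        # d - p decays geometrically
    dd = np.zeros(horizon)
    dd[0] = eps_d0                                # dividend growth
    dp_lag = np.concatenate(([0.0], dp[:-1]))
    r = dd - rho * dp + dp_lag                    # linearized return identity
    dprice = dd - (dp - dp_lag)                   # delta p = delta d - delta (d - p)
    return {"d": np.cumsum(dd), "p": np.cumsum(dprice),
            "dp": dp, "cum_r": np.cumsum(r)}

# Negative expected return shock: d - p falls by 12.5, no dividend news.
er_shock = impulse_response(0.0, -12.5)
# Dividend growth shock with no change in d - p.
div_shock = impulse_response(14.0, 0.0)

print("expected return shock: price on impact", er_shock["p"][0],
      "price in the long run", round(er_shock["p"][-1], 6),
      "cumulative return in the long run", round(er_shock["cum_r"][-1], 6))
print("dividend shock: price and dividend in the long run",
      div_shock["p"][-1], div_shock["d"][-1])
```

In this sketch the price response to the expected return shock dies out completely, while the dividend shock moves prices and dividends together and permanently.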

Figure 40. Responses to a one standard deviation (1.7%) negative expected return shock
in the simple VAR.

Start with Figure 40. The negative expected return shock raises prices and the p-d ratio
immediately. We can identify such a shock in the data as a return shock with no contempo-
raneous movement in dividends. The p-d ratio then reverts to its mean. Dividends are not
forecastable, so they show no immediate or eventual response to the expected return shock.
It could be the case that prices move in advance of future dividends; if this were the case we
would see dividends rising to meet higher prices after a return shock. Instead, prices show a
long and complete reversion back to the level of dividends. This shock looks a lot like a neg-
ative yield shock to bonds: such a shock raises prices now so that bonds end up at the same
maturity value despite a smaller expected return.
The cumulative return “mean-reverts” even more than prices. For given prices, dividends
are now smaller (smaller d-p) so returns deviate from their mean by more than price growth.


Figure 41. Responses to a one standard deviation (14%) dividend growth shock in the
simple VAR.

The cumulative return ends up below its previously expected value. Compare this value
response to the univariate value response, which, as we calculated above, ends up at about 0.8 of
its time-1 response.
The dividend shock raises prices and cumulative returns immediately and proportionally
to dividends, so the price-dividend ratio does not change. Expected returns or the discount
rate, reflected in any slope of the value line, do not change. If the world were i.i.d., this is the
only kind of shock we would see, and dividend-price ratios would always be constant.
Figures 40 and 41 plot the responses to “typical,” one standard deviation shocks. Thus
you can see that actual returns are typically about half dividend shocks and half expected
return shocks. That is why returns alone are a poor indicator of expected returns.
In sum, at last we can see some rather dramatic “mean-reversion.” Good past returns
by themselves are not a reliable signal of lower subsequent returns, because they contain
substantial dividend growth noise. Good returns that do not include good dividends isolate
an expected return shock. This does signal low subsequent returns. It sets off a completely
transitory variation in prices.


20.1.7 Cointegration and short vs. long-run volatility

If $d - p$, $\Delta p$ and $\Delta d$ are stationary, then the long-run variance of $\Delta d$ and $\Delta p$ must be the
same, long-run movements in $d$ and $p$ must be perfectly correlated, and $d$ and $p$ must end up in the
same place after any shock. Thus, the patterns of predictability, volatility, univariate and
multivariate mean-reversion really all just stem from these facts, the persistence of $d - p$ and
the near-unforecastability of $\Delta d$.

You might think that the facts about predictability depend on the exact structure of the
VAR, including parameter estimates. In fact, most of what we have learned about predictabil-
ity and mean reversion comes down to a few facts: the dividend-price ratio, returns, and div-
idend growth are all stationary; dividend growth is not (or at best weakly) forecastable, and
dividend growth varies less than returns.
These facts imply that the dividend and price responses to each shock are eventually equal
in Figures 40 and 41. If $d - p$, $\Delta p$ and $\Delta d$ are stationary, then $d$ and $p$ must end up in the
same place following a shock. The responses of a stationary variable ($d - p$) must die out.
If dividends are not forecastable, then it must be the case that prices do all the adjustment
following a price shock that does not affect dividends.
Stationary $d - p$, $\Delta p$ and $\Delta d$ also imply that the variance of long-horizon $\Delta p$ must
equal the variance of long-horizon $\Delta d$,
$$\lim_{k\to\infty} \frac{1}{k}\,\mathrm{var}(p_{t+k} - p_t) = \lim_{k\to\infty} \frac{1}{k}\,\mathrm{var}(d_{t+k} - d_t),$$

and the correlation of long-run price and dividend growth must approach one. These facts
follow from the fact that the variance ratio of a stationary variable must approach zero, and
$d - p$ is stationary. Intuitively, long run price growth cannot be more volatile than long run
dividend growth, or the long-run $p - d$ ratio would not be stationary.
Now, if dividend growth is not forecastable, its long run volatility is the same as its short
run volatility: its variance ratio is one. Short run price growth is more volatile than short
run dividend growth, so we conclude that prices must be mean-reverting; their variance ratio
must be below one.
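The argument can be illustrated with a small Python sketch. It assumes i.i.d. dividend growth, an AR(1) dividend-price ratio with coefficient b, and uncorrelated dividend growth and d/p shocks, so that the variance of k-period price growth has a closed form. The function name and the parameter values are illustrative assumptions, and the resulting ratios need not match any specific number quoted in the text.

```python
def price_variance_ratio(k, b=0.9, sd_d=10.0, sd_dp=12.5):
    """var(p[t+k] - p[t]) / (k * var(p[t+1] - p[t])) in the simple system.

    Uses p = d - (d - p): k-period price growth is k-period dividend growth
    minus the k-period change in the stationary d - p ratio.
    """
    var_dp = sd_dp**2 / (1 - b**2)               # unconditional var of d - p

    def var_k(h):
        # i.i.d. dividend growth adds h * sd_d^2; the AR(1) change in d - p
        # adds 2 * var_dp * (1 - b^h), which stays bounded as h grows.
        return h * sd_d**2 + 2 * var_dp * (1 - b**h)

    return var_k(k) / (k * var_k(1))

for k in (1, 10, 100, 1000):
    print(k, round(price_variance_ratio(k), 3))
```

The ratio starts at one and falls with horizon toward the ratio of dividend growth variance to one-period price growth variance, because the contribution of the stationary d - p ratio is bounded while the dividend contribution grows with k.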
Quantitatively, this observation supports the magnitude of univariate mean reversion that
we have found so far. Dividend growth has a short run, and thus long-run, standard deviation
of about 10% per year, while returns and prices Thus, prices must have a long-run variance
ratio of about 2/3, or a long-run response to univariate shocks of $\sqrt{2/3} = 0.82$.
The change in prices is not the same thing as the return, especially at long horizons,
since returns include the intervening dividends. One can address this question with a slightly
different accounting: define $d$ as the dividend paid to a dollar investment. The resulting
dividend series is still not predictable and has roughly the same volatility, so in this case we
get approximately the same result.
The work of Lettau and Ludvigson (2000) suggests that we may get much more dramatic
implications by including consumption data. The ratio of stock market values to consumption
should also be stationary; if wealth were to explode people would surely consume more
and vice versa. The ratio of dividends to aggregate consumption should also be stationary.
Consumption growth seems independent at all horizons, and consumption growth is very
stable, with roughly 1% annual standard deviation. For example, Lettau and Ludvigson 2000
find that none of the variables that forecast returns in Table LL (including $d - p$ and a
consumption to wealth ratio) forecast consumption growth at any horizon.
These facts suggest that aggregate dividends are forecastable by the consumption/dividend
ratio, and strongly so: the long-run volatility of aggregate dividend growth must be the 1%
volatility of consumption growth, not the 10% short run volatility of dividend growth.
These facts also mean that almost all of the 15% or more variation in annual stock market
wealth must be transitory: the long run volatility of stock market value must be no more
than the 1% consumption growth volatility!
Again, total market value is not the same thing as price, price is not the same thing as
cumulated return, and aggregate dividends are not the same thing as the dividend concept
we have used so far (dividends paid to a dollar investment with dividends consumed), or
dividends paid to a dollar investment with dividends reinvested. Lettau and Ludvigson show
that the consumption/wealth ratio does forecast returns, but no one has yet worked out the
mean-reversion implications of this fact.
My statements about the implications of stationary $d - p$, $\Delta d$, $\Delta p$, $r$ are developed in
detail in Cochrane 1994. They are special cases of the representation theorems for cointegrated
variables developed by Engle and Granger (1987). A regression of a difference like $\Delta p$ on
a ratio like $p - d$ is called the error-correction representation of a cointegrated system.
Error correction regressions have subtly and dramatically changed almost all empirical work
in finance and macroeconomics. The vast majority of the successful return forecasting
regressions in this section, both time-series and cross-section, are error-correction regressions
of one sort or another. Corporate finance is being redone with regressions of growth rates
on ratios, as is macroeconomic forecasting. For example, the consumption/GDP ratio is a
powerful forecaster of GDP growth.

20.1.8 Bonds

The expectations model of the term structure works well on average and for horizons of 4
years or greater. At the one year horizon, however, a forward rate 1 percentage point higher
than the spot rate seems entirely to indicate a one percentage point higher expected excess
return rather than a one percentage point rise in future interest rates.


The venerable expectations model of the term structure specifies that long term bond
yields are equal to the average of expected future short term bond yields. As with the CAPM
and random walk, the expectations model was the workhorse of empirical finance for a
generation. And as with those other views, a new round of research has significantly modified
the traditional view.

Maturity   Avg. return          Std. error   Std. dev.
N          E(hpr^(N)_{t+1})                  σ(hpr^(N)_{t+1})
1          5.83                 0.42         2.83
2          6.15                 0.54         3.65
3          6.40                 0.69         4.66
4          6.40                 0.85         5.71
5          6.36                 0.98         6.58

Table 4. Average continuously compounded (log) one-year holding period returns
on zero-coupon bonds of varying maturity. Annual data from CRSP 1953-1997.

Table 4 calculates the average return on bonds of different maturities. The expectations
hypothesis seems to do pretty well. Average holding period returns do not seem very different
across bond maturities, despite the increasing standard deviation of bond returns as maturity
rises. The small increase in returns for long term bonds, equivalent to a slight average upward
slope in the yield curve, is usually excused as a small “liquidity premium.” In fact, the curious
pattern in Table 4 is that bonds do not share the high Sharpe ratios of stocks. Whatever factors
account for the volatility of bond returns, they seem to have very small risk prices.
Table 4 is again a tip of an iceberg of an illustrious career for the expectations hypothesis.
Especially in times of great inflation and exchange rate instability, the expectations hypothesis
does a very good first-order job.
However, one can ask a more subtle question. Perhaps there are times when long term
bonds can be forecast to do better, and other times when short term bonds are expected to
do better. If the times even out, the unconditional averages in Table 4 will show no pattern.
Equivalently, we might want to check whether a forward rate that is unusually high forecasts
an unusual increase in spot rates.


Change in yields:        $y^{(1)}_{t+N} - y^{(1)}_t = a + b\,(f^{(N\to N+1)}_t - y^{(1)}_t) + \varepsilon_{t+N}$
Holding period returns:  $hpr^{(N+1)}_{t+1} - y^{(1)}_t = a + b\,(f^{(N\to N+1)}_t - y^{(1)}_t) + \varepsilon_{t+1}$

        Change in yields                        Holding period returns
N       a      σ(a)    b      σ(b)   R^2        a      σ(a)    b      σ(b)   R^2
1       0.1    0.3    -0.10   0.36  -0.02      -0.1    0.3     1.10   0.36   0.16
2      -0.01   0.4     0.37   0.33   0.005     -0.5    0.5     1.46   0.44   0.19
3      -0.04   0.5     0.41   0.33   0.013     -0.4    0.8     1.30   0.54   0.10
4      -0.3    0.5     0.77   0.31   0.11      -0.5    1.0     1.31   0.63   0.07

Table 5. Forecasts based on forward-spot spread. OLS regressions 1953-1997 an-
nual data. Yields and returns in annual percentages. The left hand panel runs the
change in the one year yield on the forward-spot spread. The right hand panel runs
the one period excess return on the forward-spot spread.

Table 5 gets at these issues, updating Fama and Bliss' (1986) classic regression tests.
(Campbell and Shiller 1991 and Campbell 1995 make the same point with regressions of
yield changes on yield spreads.) The left hand panel presents a regression of the change in
yields on the forward-spot spread. The expectations hypothesis predicts a coefficient of 1.0,
since the forward rate should equal the expected future spot rate. At a one-year horizon we
see instead coefficients near zero and a negative adjusted $R^2$. Forward rates one year out
seem to have no predictive power whatsoever for changes in the spot rate one year from now.
On the other hand, by 4 years out, we see coefficients within one standard error of 1.0. Thus,
the expectations hypothesis seems to do poorly at short (1 year) horizons, but much better at
longer horizons and on average (Table 4).
If the yield expression of the expectations hypothesis does not work at one year horizons,
then the expected return expression of the expectations hypothesis must not hold either: one
must be able to forecast one year bond returns. To check this fact, the right hand panel of
Table 5 runs regressions of the one year excess return on long-term bonds on the forward-spot
spread. Here, the expectations hypothesis predicts a coefficient of zero: no signal (including
the forward-spot spread) should be able to tell you that this is a particularly good time for
long bonds vs. short bonds. As you can see, the coefficients in the right hand panel of Table
5 are all about 1.0. A high forward rate does not indicate that interest rates will be higher one
year from now; it seems entirely to indicate that you will earn that much more holding long
term bonds. (The right hand panel is really not independent evidence, since the coefficients in
the right and left hand panels of Table 5 are mechanically linked. For example, in the first row,
1.10 + (-0.10) = 1.0, and this holds as an accounting identity. Fama and Bliss call them
“complementary regressions.”)
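The mechanical link is easy to see for N = 1, where the excess return on the two-year bond equals the forward-spot spread minus the change in the one-year yield. The Python sketch below uses made-up random data (not the CRSP data behind Table 5) purely to show that the two OLS slopes then sum to one whatever the data look like; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 45
# Made-up one-year yields and one-year-ahead forward rates, in percent.
y1 = 5.0 + np.cumsum(rng.normal(0.0, 1.0, T + 1))
f12 = y1[:-1] + rng.normal(0.5, 1.0, T)

spread = f12 - y1[:-1]            # forward-spot spread at time t
dy = y1[1:] - y1[:-1]             # change in the one-year yield, t to t+1
# Excess return on the two-year bond, using hpr = p1[t+1] - p2[t],
# f = p1[t] - p2[t] and y1 = -p1, so that hpr - y1[t] = spread - dy.
excess = spread - dy

b_yield = np.polyfit(spread, dy, 1)[0]       # slope of the yield-change regression
b_return = np.polyfit(spread, excess, 1)[0]  # slope of the excess-return regression
print(b_yield, b_return, b_yield + b_return)
```

Because the left hand variables of the two regressions add up to the right hand variable, the slopes add up to one by construction, which is why the right hand panel is not independent evidence at this horizon.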
Figures 42 and 43 provide a pictorial version of the results in Table 5. Suppose that the
yield curve is upward sloping as in the left panel. What does this mean? A naive investor
might think this pattern indicates that long-term bonds give a higher return than short term
bonds. The expectations hypothesis denies this conclusion. If the expectations hypothesis
were true, the forward rates plotted against maturity in the left hand panel would translate
one-for-one to the forecast of future spot rates in the right hand panel, as plotted in the line
marked “Expectations model.” Rises in future short rates should lower bond prices, cutting
off the one-period advantage of long-term bonds. The rising short rates would directly raise
the multi-year advantage of short term bonds.
We can calculate the actual forecast of future spot rates from the estimates in the left hand
panel of Table 5, and these are given by the line marked “Estimates” in Figure 43. The essence
of the phenomenon is sluggish adjustment of the short rates. The short rates do eventually
rise to meet the forward rate forecasts, but not as quickly as the forward rates predict that
they should.

Figure 42. If the current yield curve is as plotted here....

As dividend growth should be forecastable so that returns are not forecastable, short-term
yields should be forecastable so that returns are not forecastable. In fact, yield changes are
almost unforecastable at a one year horizon, so, mechanically, bond returns are forecastable. We see this
directly in the first row of the left hand panel of Table 5 for the one-period yield. It is an
implication of the right hand panel as well. If

$$hpr^{(N+1)}_{t+1} - y^{(1)}_t = 0 + 1\,(f^{(N\to N+1)}_t - y^{(1)}_t) + \varepsilon_{t+1} \tag{20.338}$$

then, writing out the definition of holding period return and forward rate,

$$\begin{aligned}
p^{(N)}_{t+1} - p^{(N+1)}_t + p^{(1)}_t &= 0 + 1\,(p^{(N)}_t - p^{(N+1)}_t + p^{(1)}_t) + \varepsilon_{t+1} \\
p^{(N)}_{t+1} &= 0 + 1\,(p^{(N)}_t) + \varepsilon_{t+1} \\
y^{(N)}_{t+1} &= 0 + 1\,(y^{(N)}_t) - \varepsilon_{t+1}/N.
\end{aligned}$$

A coefficient of 1.0 in (20.338) is equivalent to yields or bond prices that follow random
walks; yield changes that are completely unpredictable.

Figure 43. ...this is the forecast of future one year interest rates. The dashed line gives the
forecast from the expectations hypothesis. The solid line is constructed from the estimates in
Table 5.

Of course yields are stationary and not totally unpredictable. However, they move slowly.
Thus, yield changes are very unpredictable at short horizons but much more predictable at
long horizons. That is why the coefficients in the left hand panel of Table 5 build with
horizon. If we did holding period return regressions at longer horizons, they would gradually
approach the expectations hypothesis result.
The roughly 1.0 coefficients in the right hand panel of Table 5 mean that a one percentage
point increase in forward rate translates into a one percentage point increase in expected
return. It seems that the old fallacy of confusing bond yields with their expected returns also
contains a grain of truth, at least for the first year. However, the one-for-one variation of
expected returns with forward rates does not imply a one-for-one variation of expected returns
with yield spreads. Forward rates are related to the slope of the yield curve,
$$\begin{aligned}
f^{(N\to N+1)}_t - y^{(1)}_t &= p^{(N)}_t - p^{(N+1)}_t - y^{(1)}_t \\
&= -N y^{(N)}_t + (N+1)\,y^{(N+1)}_t - y^{(1)}_t \\
&= N\left(y^{(N+1)}_t - y^{(N)}_t\right) + \left(y^{(N+1)}_t - y^{(1)}_t\right).
\end{aligned} \tag{20.339}$$

Thus, the forward-spot spread varies more than the yield spread, so regressions
of holding period returns on yield spreads give coefficients greater than one. Expected returns
move more than one-for-one with yield spreads. Campbell (1995) reports coefficients of
excess returns on yield spreads that rise from one at a 2 month horizon to 5 at a 5 year horizon.
The facts are analogous to the dividend/price regression. There, dividends were essen-
tially unforecastable. This implied that a one percentage point change in dividend yield
implied a 5 percentage point change in expected excess returns.
Of course, there is risk: the $R^2$ are all about 0.1-0.2, about the same values as the $R^2$
from the dividend/price regression at a one year horizon, so this strategy will often go wrong.
Still, 0.1-0.2 is not zero, so the strategy does pay off more often than not, in violation of the
expectations hypothesis. Furthermore, the forward-spot spread is a slow moving variable,
typically reversing sign once per business cycle. Thus, the $R^2$ build with horizon as with the
D/P regression, peaking in the 30% range (Fama and French 1989).
The fact that the regressions in Table 5 run the change in yield on the forward-spot spread
and the excess return on the forward-spot spread is very important. The overall level of
interest rates moves up and down a great deal but slowly over time. Thus, if you run
$y^{(1)}_{t+N} = a + b f^{(N\to N+1)}_t + \varepsilon_{t+N}$, you will get a coefficient $b$ almost exactly equal to 1.0 and a stupendous
$R^2$, seemingly a stunning validation of the expectations hypothesis. If you run a regression
of tomorrow's temperature in Chicago on today's temperature, the regression coefficient will
be near 1.0 with a huge $R^2$ as well, since the temperature varies a lot over the year. But
today's temperature is not a useful temperature forecast. To measure a temperature forecast
we want to know if the forecast can predict the change in temperature. Is (forecast - today's
temperature) a good measure of (tomorrow's temperature - today's temperature)? Table 5
runs this regression.
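A small simulation makes the point concrete. In the Python sketch below, a slow-moving series stands in for the level of interest rates (or the temperature), and the "forecast" is just today's level plus pure noise, so it contains no information about the change. The AR(1) coefficient of 0.99, the sample size, and all names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
level = np.zeros(T)
for t in range(1, T):
    level[t] = 0.99 * level[t - 1] + rng.normal()   # persistent, slow-moving level

today = level[:-1]
future = level[1:]
forecast = today + rng.normal(size=T - 1)   # today's level plus noise: no real signal

def slope_r2(x, y):
    """OLS slope and R^2 of y on x (with a constant)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return slope, 1.0 - resid.var() / y.var()

b_lvl, r2_lvl = slope_r2(forecast, future)                  # regression in levels
b_chg, r2_chg = slope_r2(forecast - today, future - today)  # regression in changes
print("levels :", round(b_lvl, 3), round(r2_lvl, 3))
print("changes:", round(b_chg, 3), round(r2_chg, 4))
```

The levels regression reports a slope near one and a very high $R^2$ even though the forecast is useless, while the regression of the change on (forecast - today) correctly shows essentially no explanatory power.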
The decomposition in (20.339) warns us of one of several econometric traps in this kind
of regression. Notice that two of the three right hand variables are the same. Thus any
measurement error in $p^{(N+1)}_t$ and $p^{(1)}_t$ will induce a spurious common movement in left and right
hand variables. In addition, since the variables are a triple difference, the difference may
eliminate a common signal and isolate measurement error or noise. There are pure
measurement errors in the bond data, and we seldom observe pure discount bonds of the exactly
desired maturity. In addition, various liquidity and microstructure effects can influence the
yields of particular bonds in ways that are not exploitable for typical investors.
As an example of what this sort of “measurement error” can do, suppose all bond yields
are 5%, but there is one “error” in the two period bond price at time 1: rather than being
-10 it is -15. The table below tracks the effects of this error. It implies a blip of the one year
forward rate in year one, and then a blip in the return from holding this bond from year one
to year two. The price and forward rate “error” automatically turns into a subsequent return
when the “error” is corrected. If the price is real, of course, this is just the kind of event we
want the regression to tell us about: the forward rate did not correspond to a change in future
spot rate, so there was a large return; it was a price that was “out of line” and if you could
trade on it, you should. But the regression will also pounce on measurement error in prices
and indicate spurious returns.

t                                   0      1      2      3
p^(1)_t                            -5     -5     -5     -5
p^(2)_t                           -10    -15    -10    -10
p^(3)_t                           -15    -15    -15    -15
y^(i)_t, i ≠ 2                      5      5      5      5
y^(2)_t                             5     7.5     5      5
f^(1→2)_t                           5     10      5      5
f^(1→2)_t - y^(1)_t                 0      5      0      0
hpr^(2→1)_t - y^(1)_{t-1}           0      0      5      0

Numerical example of the effect of measurement error in yields on yield regressions.
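The entries of this example can be recomputed from the log prices alone. The Python sketch below does so, assuming log prices in percent with a flat 5% yield curve and a single erroneous two-period price of -15 at time 1, as described in the text; the variable names are hypothetical, and the time-0 excess return is set to zero on the assumption that there was no earlier error.

```python
# Log prices (percent) of 1-, 2- and 3-period discount bonds at t = 0, 1, 2, 3.
p = {
    1: [-5.0, -5.0, -5.0, -5.0],
    2: [-10.0, -15.0, -10.0, -10.0],   # the -15 at t = 1 is the "error"
    3: [-15.0, -15.0, -15.0, -15.0],
}
T = 4

y1 = [-p[1][t] for t in range(T)]                  # one-period yield
y2 = [-p[2][t] / 2 for t in range(T)]              # two-period yield
fwd = [p[1][t] - p[2][t] for t in range(T)]        # forward rate f(1->2)
fwd_spread = [fwd[t] - y1[t] for t in range(T)]    # forward-spot spread
# Excess return from holding the two-period bond from t-1 to t.
# The t = 0 entry is set to 0, assuming no error before the sample starts.
excess = [0.0] + [p[1][t] - p[2][t - 1] - y1[t - 1] for t in range(1, T)]

for name, row in [("y(2)", y2), ("f(1->2)", fwd),
                  ("f - y(1)", fwd_spread), ("hpr - y(1)", excess)]:
    print(f"{name:12s}", row)
```

The forward-spot spread jumps at time 1 and the excess return jumps at time 2, so a regression of returns on the lagged spread would pick up the price error as if it were forecastable return.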

20.1.9 Foreign exchange

The expectations model works well on average. However, a foreign interest rate one
percentage point higher than its usual differential with the US rate (equivalently, a one per-
centage point higher forward-spot spread) seems to indicate even more than one percentage
point expected excess return; a further appreciation of the foreign currency.

Suppose interest rates are higher in Germany than in the U.S. Does this mean that one can
earn more money by investing in German bonds? There are several reasons that the answer
might be no. First, of course is default risk. While not a big problem for German government
bonds, Russia and other governments have defaulted on bonds in the past and may do so
again. Second, and more important, is the risk of devaluation. If German interest rates are
10%, US interest rates are 5%, but the Euro falls 5% relative to the dollar during the year,
you make no more money holding the German bonds despite their attractive interest rate.
Since lots of investors are making this calculation, it is natural to conclude that an interest
rate differential across countries on bonds of similar credit risk should reveal an expectation
of currency devaluation. The logic is exactly the same as the “expectations hypothesis” in
the term structure. Initially attractive yield or interest rate differentials should be met by an
offsetting event so that you make no more money on average in one country or another, or in
one currency versus another. As with bonds, the expectations hypothesis is slightly different
from pure risk neutrality since the expectation of the log is not the log of the expectation.
Again, the size of the phenomena we study usually swamps this distinction.
As with the expectations hypothesis in the term structure, the expected depreciation view
ruled for many years, and still constitutes an important first-order understanding of interest
rate differentials and exchange rates. For example, interest rates in east Asian currencies
were very high on the eve of the currency collapses of 1997, and many banks were making
tidy sums borrowing at 5% in dollars to lend at 20% in local currencies. This situation
should lead one to suspect that traders expect a 15% devaluation, and most likely a small
chance of a larger devaluation. That is, in this case, exactly what happened. Many observers
and policy analysts who ought to know better often attribute high nominal interest rates in
troubled countries to “tight monetary policy” that is “strangling the economy” to “defend the
currency.” In fact, one's first order guess should be that such high nominal rates reflect a
large probability of inflation and devaluation (loose monetary and fiscal policy), and that
they correspond to much lower real rates.

