

Since this must hold for every r, we will need

A(0) = 0; B(0) = 0.

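Given the guess (19.272), the derivatives that appear in (19.271) are

$$\frac{1}{P}\frac{\partial P}{\partial r} = -B(N), \qquad \frac{1}{P}\frac{\partial^2 P}{\partial r^2} = B(N)^2, \qquad \frac{1}{P}\frac{\partial P}{\partial N} = A'(N) - B'(N)\,r.$$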
Substituting these derivatives in (19.271),

$$-B(N)\phi(\bar r - r) + \tfrac{1}{2}B(N)^2\sigma_r^2 - A'(N) + B'(N)\,r - r = -B(N)\sigma_r\sigma_\Lambda.$$
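This equation has to hold for every r, so the terms multiplying r and the constant terms must separately be zero:

$$\begin{aligned}
A'(N) &= \tfrac{1}{2}B(N)^2\sigma_r^2 - \left(\phi\bar r - \sigma_r\sigma_\Lambda\right)B(N)\\
B'(N) &= 1 - \phi B(N).
\end{aligned}\tag{19.275}$$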

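We can solve this pair of ordinary differential equations by simple integration. The second one is

$$\frac{dB}{dN} = 1 - \phi B \quad\Rightarrow\quad \frac{dB}{1 - \phi B} = dN \quad\Rightarrow\quad -\frac{1}{\phi}\ln(1 - \phi B) = N,$$

and hence

$$B(N) = \frac{1}{\phi}\left(1 - e^{-\phi N}\right). \tag{19.276}$$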

Note B(0) = 0 so we did not need a constant in the integration.
We solve the first equation in (19.275) by simply integrating it, and choosing the constant to set A(0) = 0. Here we go.

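$$\begin{aligned}
A'(N) &= \tfrac{1}{2}B(N)^2\sigma_r^2 - \left(\phi\bar r - \sigma_r\sigma_\Lambda\right)B(N)\\
A(N) &= \frac{\sigma_r^2}{2}\int B(N)^2\,dN - \left(\phi\bar r - \sigma_r\sigma_\Lambda\right)\int B(N)\,dN + C\\
A(N) &= \frac{\sigma_r^2}{2\phi^2}\int\left(1 - 2e^{-\phi N} + e^{-2\phi N}\right)dN - \left(\bar r - \frac{\sigma_r\sigma_\Lambda}{\phi}\right)\int\left(1 - e^{-\phi N}\right)dN + C\\
A(N) &= \frac{\sigma_r^2}{2\phi^2}\left(N + \frac{2e^{-\phi N}}{\phi} - \frac{e^{-2\phi N}}{2\phi}\right) - \left(\bar r - \frac{\sigma_r\sigma_\Lambda}{\phi}\right)\left(N + \frac{e^{-\phi N}}{\phi}\right) + C.
\end{aligned}$$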

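We pick the constant of integration to give A(0) = 0. You can do this explicitly, or figure out directly that the result is achieved by subtracting one from each of the exponential terms,

$$A(N) = \frac{\sigma_r^2}{2\phi^2}\left(N + \frac{2\left(e^{-\phi N} - 1\right)}{\phi} - \frac{e^{-2\phi N} - 1}{2\phi}\right) - \left(\bar r - \frac{\sigma_r\sigma_\Lambda}{\phi}\right)\left(N + \frac{e^{-\phi N} - 1}{\phi}\right).$$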

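Now, we just have to make it pretty. I'm aiming for the form given in (19.274). Note

$$B(N)^2 = \frac{1}{\phi^2}\left(1 - 2e^{-\phi N} + e^{-2\phi N}\right), \qquad \phi B(N)^2 = \frac{2\left(1 - e^{-\phi N}\right)}{\phi} + \frac{e^{-2\phi N} - 1}{\phi},$$

so

$$\phi B(N)^2 - 2B(N) = \frac{e^{-2\phi N} - 1}{\phi}.$$

Using this and $\left(e^{-\phi N} - 1\right)/\phi = -B(N)$ in the expression for $A(N)$,

$$A(N) = \frac{\sigma_r^2}{2\phi^2}\left(N - 2B(N) - \frac{\phi}{2}B(N)^2 + B(N)\right) - \left(\bar r - \frac{\sigma_r\sigma_\Lambda}{\phi}\right)\left(N - B(N)\right)$$

$$A(N) = -\frac{\sigma_r^2}{4\phi}B(N)^2 - \left(\bar r - \frac{\sigma_r\sigma_\Lambda}{\phi} - \frac{\sigma_r^2}{2\phi^2}\right)\left(N - B(N)\right).$$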
4φ φ 2φ

We're done.
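As a numerical check on the algebra, here is a minimal Python sketch (NumPy assumed; the helper names and parameter values are illustrative, not estimates) that integrates the ODE pair (19.275) forward from A(0) = B(0) = 0 with a hand-written fourth-order Runge-Kutta step and compares the result with (19.276) and the final expression for A(N).

```python
import numpy as np


def vasicek_closed_form(N, phi, rbar, sig_r, sig_L):
    """B(N) from (19.276) and A(N) in the final 'pretty' form derived above."""
    B = (1.0 - np.exp(-phi * N)) / phi
    A = (-sig_r**2 / (4.0 * phi) * B**2
         - (rbar - sig_r * sig_L / phi - sig_r**2 / (2.0 * phi**2)) * (N - B))
    return A, B


def vasicek_odes(state, phi, rbar, sig_r, sig_L):
    """Right-hand side of (19.275), returned as (A'(N), B'(N))."""
    _, B = state
    dA = 0.5 * B**2 * sig_r**2 - (phi * rbar - sig_r * sig_L) * B
    dB = 1.0 - phi * B
    return np.array([dA, dB])


def rk4(f, y0, N, steps, *args):
    """Classical fourth-order Runge-Kutta for the autonomous system y' = f(y)."""
    y = np.array(y0, dtype=float)
    h = N / steps
    for _ in range(steps):
        k1 = f(y, *args)
        k2 = f(y + 0.5 * h * k1, *args)
        k3 = f(y + 0.5 * h * k2, *args)
        k4 = f(y + h * k3, *args)
        y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y


# Illustrative parameter values (not estimates).
phi, rbar, sig_r, sig_L = 0.3, 0.05, 0.01, -0.5
N = 10.0
A_num, B_num = rk4(vasicek_odes, [0.0, 0.0], N, 1000, phi, rbar, sig_r, sig_L)
A_cf, B_cf = vasicek_closed_form(N, phi, rbar, sig_r, sig_L)
print(A_num - A_cf, B_num - B_cf)  # both should be close to zero
```

The ODE route is the one that generalizes: it is exactly what one would do for the multifactor affine models below, where closed forms are not always available.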

19.5.2 Vasicek model by expectation

What if we solve the discount rate forward and take an expectation instead? The Vasicek
model is simple enough that we can follow this approach as well, and get the same analytic
solution. The same methods work for the other models, but the algebra gets steadily worse.
The model is

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda\,dz \tag{19.277}$$

$$dr = \phi(\bar r - r)\,dt + \sigma_r\,dz. \tag{19.278}$$

The bond price is

$$P_0 = E_0\left(\frac{\Lambda_N}{\Lambda_0}\right). \tag{19.279}$$

I use 0 and N rather than t and t + N to save a little bit on notation.
To find the expectation in (19.279), we have to solve the system (19.277)-(19.278) forward. The steps are simple, though the algebra is a bit daunting. First, we solve r forward. Then, we solve Λ forward. $\ln\Lambda_t$ turns out to be conditionally normal, so the expectation in (19.279) is the expectation of a lognormal. Collecting the terms in the resulting expectation that depend on $r_0$ as the B(N) term, and the constant terms as the A(N) term, we find the same solution as (19.273)-(19.274).
The interest rate is just an AR(1). By analogy with a discrete time AR(1) you can guess that its solution is

$$r_t = \int_0^t e^{-\phi(t-s)}\sigma_r\,dz_s + e^{-\phi t}r_0 + \left(1 - e^{-\phi t}\right)\bar r. \tag{19.280}$$

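To derive this solution, define $\tilde r$ by

$$\tilde r_t = e^{\phi t}(r_t - \bar r).$$

Then, by Ito's lemma (the transformation is linear in r, so there is no second-order term),

$$\begin{aligned}
d\tilde r_t &= \phi\tilde r_t\,dt + e^{\phi t}\,dr_t\\
&= \phi\tilde r_t\,dt + e^{\phi t}\phi(\bar r - r_t)\,dt + e^{\phi t}\sigma_r\,dz_t\\
&= \phi\tilde r_t\,dt - e^{\phi t}\phi e^{-\phi t}\tilde r_t\,dt + e^{\phi t}\sigma_r\,dz_t\\
&= e^{\phi t}\sigma_r\,dz_t.
\end{aligned}$$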

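This equation is easy to solve,

$$\begin{aligned}
\tilde r_t - \tilde r_0 &= \sigma_r\int_0^t e^{\phi s}\,dz_s\\
e^{\phi t}(r_t - \bar r) - (r_0 - \bar r) &= \sigma_r\int_0^t e^{\phi s}\,dz_s\\
r_t - \bar r &= e^{-\phi t}(r_0 - \bar r) + \sigma_r\int_0^t e^{-\phi(t-s)}\,dz_s.
\end{aligned}$$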

And we have (19.280).
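Solution (19.280) also gives an exact way to simulate the short rate over a discrete step Δ: $r_{t+\Delta} - \bar r = e^{-\phi\Delta}(r_t - \bar r)$ plus a normal shock with variance $\sigma_r^2(1 - e^{-2\phi\Delta})/(2\phi)$, the variance of $\sigma_r\int e^{-\phi(t+\Delta-s)}dz_s$ over the step. A minimal sketch, assuming NumPy; `simulate_vasicek` is a hypothetical helper and the parameter values are illustrative. It checks the simulated mean and variance at horizon T against the moments implied by (19.280).

```python
import numpy as np


def simulate_vasicek(r0, phi, rbar, sig_r, T, steps, n_paths, seed=0):
    """Exact discretization of dr = phi (rbar - r) dt + sig_r dz, based on (19.280)."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    decay = np.exp(-phi * dt)
    shock_sd = sig_r * np.sqrt((1.0 - np.exp(-2.0 * phi * dt)) / (2.0 * phi))
    r = np.full(n_paths, r0, dtype=float)
    for _ in range(steps):
        r = rbar + decay * (r - rbar) + shock_sd * rng.standard_normal(n_paths)
    return r


# Illustrative parameter values.
r0, phi, rbar, sig_r, T = 0.02, 0.3, 0.05, 0.01, 5.0
r_T = simulate_vasicek(r0, phi, rbar, sig_r, T, steps=50, n_paths=200_000)

# Conditional moments of r_T implied by (19.280).
mean_T = rbar + np.exp(-phi * T) * (r0 - rbar)
var_T = sig_r**2 * (1.0 - np.exp(-2.0 * phi * T)) / (2.0 * phi)
print(r_T.mean(), mean_T)
print(r_T.var(), var_T)
```

Because each step uses the exact conditional distribution, the number of steps only controls how often the path is recorded, not the accuracy of the distribution at T.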
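Now, we solve the discount factor process forward. It isn't pretty, but it is straightforward. By Ito's lemma,

$$d\ln\Lambda_t = \frac{d\Lambda}{\Lambda} - \frac{1}{2}\frac{d\Lambda^2}{\Lambda^2} = -\left(r_t + \tfrac{1}{2}\sigma_\Lambda^2\right)dt - \sigma_\Lambda\,dz_t$$

$$\ln\Lambda_t - \ln\Lambda_0 = -\int_{s=0}^t\left(r_s + \tfrac{1}{2}\sigma_\Lambda^2\right)ds - \int_{s=0}^t\sigma_\Lambda\,dz_s.$$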

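Plugging in the interest rate solution (19.280),

$$\ln\Lambda_t - \ln\Lambda_0 = -\int_{s=0}^t\left[\left(\int_{u=0}^s e^{-\phi(s-u)}\sigma_r\,dz_u\right) + e^{-\phi s}(r_0 - \bar r) + \bar r + \tfrac{1}{2}\sigma_\Lambda^2\right]ds - \sigma_\Lambda\int_{s=0}^t dz_s.$$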

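Interchanging the order of the first integral, evaluating the easy ds integrals and rearranging,

$$\begin{aligned}
\ln\Lambda_t - \ln\Lambda_0 &= -\sigma_\Lambda\int_{s=0}^t dz_s - \sigma_r\int_{u=0}^t\left[\int_{s=u}^t e^{-\phi(s-u)}\,ds\right]dz_u - \left(\bar r + \tfrac{1}{2}\sigma_\Lambda^2\right)t - (r_0 - \bar r)\int_{s=0}^t e^{-\phi s}\,ds\\
&= -\int_{u=0}^t\left[\sigma_\Lambda + \frac{\sigma_r}{\phi}\left(1 - e^{-\phi(t-u)}\right)\right]dz_u - \left(\bar r + \tfrac{1}{2}\sigma_\Lambda^2\right)t - (r_0 - \bar r)\frac{1 - e^{-\phi t}}{\phi}.
\end{aligned}\tag{19.281}$$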

The first integral has an integrand f(u) that is a deterministic function of time u. This gives rise to a normally distributed random variable, since it's just a weighted sum of independent normals $dz_u$:

$$\int_{u=0}^t f(u)\,dz_u \sim N\left(0, \int_{u=0}^t f^2(u)\,du\right).$$


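Thus, $\ln\Lambda_t - \ln\Lambda_0$ is normally distributed with mean given by the second set of terms in (19.281) and variance

$$\begin{aligned}
\operatorname{var}_0(\ln\Lambda_t - \ln\Lambda_0) &= \int_{u=0}^t\left[\sigma_\Lambda + \frac{\sigma_r}{\phi}\left(1 - e^{-\phi(t-u)}\right)\right]^2du\\
&= \int_{u=0}^t\left[\left(\sigma_\Lambda + \frac{\sigma_r}{\phi}\right)^2 - 2\frac{\sigma_r}{\phi}\left(\sigma_\Lambda + \frac{\sigma_r}{\phi}\right)e^{-\phi(t-u)} + \frac{\sigma_r^2}{\phi^2}e^{-2\phi(t-u)}\right]du\\
&= \left(\sigma_\Lambda + \frac{\sigma_r}{\phi}\right)^2t - 2\frac{\sigma_r}{\phi^2}\left(\sigma_\Lambda + \frac{\sigma_r}{\phi}\right)\left(1 - e^{-\phi t}\right) + \frac{\sigma_r^2}{2\phi^3}\left(1 - e^{-2\phi t}\right).
\end{aligned}\tag{19.282}$$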
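The closed form (19.282) is easy to get wrong, so a quick check is to integrate $f(u)^2$ numerically, with $f(u) = \sigma_\Lambda + (\sigma_r/\phi)(1 - e^{-\phi(t-u)})$ the integrand from (19.281). A minimal sketch, assuming NumPy and using a composite Simpson rule written out by hand; the function names and parameter values are illustrative.

```python
import numpy as np


def f(u, t, phi, sig_r, sig_L):
    """Integrand multiplying dz_u in (19.281), without the leading minus sign."""
    return sig_L + sig_r / phi * (1.0 - np.exp(-phi * (t - u)))


def var_closed_form(t, phi, sig_r, sig_L):
    """Equation (19.282)."""
    c = sig_L + sig_r / phi
    return (c**2 * t
            - 2.0 * sig_r / phi**2 * c * (1.0 - np.exp(-phi * t))
            + sig_r**2 / (2.0 * phi**3) * (1.0 - np.exp(-2.0 * phi * t)))


def simpson(values, h):
    """Composite Simpson rule on equally spaced points; len(values) must be odd."""
    return h / 3.0 * (values[0] + values[-1]
                      + 4.0 * values[1:-1:2].sum() + 2.0 * values[2:-1:2].sum())


# Illustrative parameter values.
t, phi, sig_r, sig_L = 7.0, 0.3, 0.01, 0.4
u = np.linspace(0.0, t, 2001)  # 2000 intervals, an even number as Simpson requires
numeric = simpson(f(u, t, phi, sig_r, sig_L) ** 2, u[1] - u[0])
print(numeric, var_closed_form(t, phi, sig_r, sig_L))
```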

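Since we have the distribution of $\Lambda_N$, we are ready to take the expectation. Because $\ln\Lambda_N - \ln\Lambda_0$ is normal, we can use $E(e^x) = e^{E(x) + \frac{1}{2}\sigma^2(x)}$:

$$\ln P(N,0) = \ln E_0\left(e^{\ln\Lambda_N - \ln\Lambda_0}\right) = E_0(\ln\Lambda_N - \ln\Lambda_0) + \tfrac{1}{2}\sigma^2(\ln\Lambda_N - \ln\Lambda_0).$$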

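Plugging in the mean from (19.281) and the variance from (19.282),

$$\begin{aligned}
\ln P_0 = {}&-\left[\left(\bar r + \tfrac{1}{2}\sigma_\Lambda^2\right)N + (r_0 - \bar r)\frac{1 - e^{-\phi N}}{\phi}\right]\\
&+ \frac{1}{2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)^2N - \frac{\sigma_r}{\phi^2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)\left(1 - e^{-\phi N}\right) + \frac{\sigma_r^2}{4\phi^3}\left(1 - e^{-2\phi N}\right).
\end{aligned}\tag{19.284}$$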

All that remains is to make it pretty. To compare it with our previous result, we want to express it in the form $\ln P(N, r_0) = A(N) - B(N)r_0$. The coefficient on $r_0$ (19.283) is

$$B(N) = \frac{1 - e^{-\phi N}}{\phi}, \tag{19.285}$$

the same expression we derived from the partial differential equation. To simplify the constant term, recall that (19.285) implies

$$\frac{1 - e^{-2\phi N}}{\phi} = -\phi B(N)^2 + 2B(N).$$

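Thus, the constant term (the terms that do not multiply $r_0$) in (19.283) is

$$\begin{aligned}
A(N) = {}&-\left[\left(\bar r + \tfrac{1}{2}\sigma_\Lambda^2\right)N - \bar r\,\frac{1 - e^{-\phi N}}{\phi}\right] + \frac{1}{2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)^2N\\
&- \frac{\sigma_r}{\phi^2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)\left(1 - e^{-\phi N}\right) + \frac{\sigma_r^2}{4\phi^3}\left(1 - e^{-2\phi N}\right)\\
A(N) = {}&-\left[\left(\bar r + \tfrac{1}{2}\sigma_\Lambda^2\right)N - \bar r B(N)\right] + \frac{1}{2}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)^2N\\
&- \frac{\sigma_r}{\phi}\left(\frac{\sigma_r}{\phi} + \sigma_\Lambda\right)B(N) - \frac{\sigma_r^2}{4\phi^2}\left(\phi B(N)^2 - 2B(N)\right)\\
A(N) = {}&\left(\frac{1}{2}\frac{\sigma_r^2}{\phi^2} + \frac{\sigma_r\sigma_\Lambda}{\phi} - \bar r\right)\left(N - B(N)\right) - \frac{\sigma_r^2}{4\phi}B(N)^2.
\end{aligned}$$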
Again, this is the same expression we derived from the partial differential equation.
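This integration is usually expressed under the risk-neutral measure. If we write the risk-neutral process

$$\frac{d\Lambda}{\Lambda} = -r\,dt, \qquad dr = \left[\phi(\bar r - r) - \sigma_r\sigma_\Lambda\right]dt + \sigma_r\,dz,$$

then the bond price is

$$P_0 = E_0\left(e^{-\int_0^N r_s\,ds}\right),$$

where the expectation is taken using the risk-neutral process for r. The result is the same, of course.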
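The two routes can be compared numerically. A minimal Python sketch (NumPy assumed; function names and parameter values are illustrative) computes $\ln P_0$ as the mean plus one half the variance of $\ln\Lambda_N - \ln\Lambda_0$, using (19.281) and (19.282) with t = N, and compares it with $A(N) - B(N)r_0$ from the differential-equation solution.

```python
import numpy as np


def log_price_by_expectation(N, r0, phi, rbar, sig_r, sig_L):
    """ln P0 = mean + 0.5 * variance of ln(Lambda_N / Lambda_0), from (19.281)-(19.282)."""
    mean = -(rbar + 0.5 * sig_L**2) * N - (r0 - rbar) * (1.0 - np.exp(-phi * N)) / phi
    c = sig_L + sig_r / phi
    var = (c**2 * N
           - 2.0 * sig_r / phi**2 * c * (1.0 - np.exp(-phi * N))
           + sig_r**2 / (2.0 * phi**3) * (1.0 - np.exp(-2.0 * phi * N)))
    return mean + 0.5 * var


def log_price_by_pde(N, r0, phi, rbar, sig_r, sig_L):
    """ln P = A(N) - B(N) r0, with A and B from the differential-equation solution."""
    B = (1.0 - np.exp(-phi * N)) / phi
    A = ((0.5 * sig_r**2 / phi**2 + sig_r * sig_L / phi - rbar) * (N - B)
         - sig_r**2 / (4.0 * phi) * B**2)
    return A - B * r0


# Illustrative values, in order: r0, phi, rbar, sig_r, sig_L.
args = (0.03, 0.3, 0.05, 0.01, -0.5)
for N in (1.0, 5.0, 30.0):
    print(N, log_price_by_expectation(N, *args), log_price_by_pde(N, *args))
```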

19.5.3 Cox Ingersoll Ross Model

For the Cox-Ingersoll-Ross (1985) model

$$\frac{d\Lambda}{\Lambda} = -r\,dt - \sigma_\Lambda\sqrt{r}\,dz, \qquad dr = \phi(\bar r - r)\,dt + \sigma_r\sqrt{r}\,dz,$$

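our differential equation (19.269) becomes

$$\frac{1}{2}\frac{\partial^2P}{\partial r^2}\sigma_r^2r + \frac{\partial P}{\partial r}\phi(\bar r - r) - \frac{\partial P}{\partial N} - rP = \frac{\partial P}{\partial r}\sigma_r\sigma_\Lambda r. \tag{19.286}$$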
Guess again that log prices are a linear function of the short rate,

$$P(N, r) = e^{A(N) - B(N)r}. \tag{19.287}$$


Substituting the derivatives of (19.287) into (19.286),

$$-B(N)\phi(\bar r - r) + \tfrac{1}{2}B(N)^2\sigma_r^2r - A'(N) + B'(N)r - r = -B(N)\sigma_r\sigma_\Lambda r.$$

Again, the coefficients on the constant and on the terms in r must separately be zero,

$$\begin{aligned}
B'(N) &= 1 - \tfrac{1}{2}\sigma_r^2B(N)^2 - \left(\sigma_r\sigma_\Lambda + \phi\right)B(N)\\
A'(N) &= -B(N)\phi\bar r.
\end{aligned}\tag{19.288}$$

The ordinary differential equations (19.288) are quite similar to the Vasicek case, (19.275).
However, now the variance terms multiply an r, so the B(N ) differential equation has the
extra B(N)2 term. We can still solve both differential equations, though the algebra is a little
bit more complicated. The result is
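$$\begin{aligned}
B(N) &= \frac{2\left(e^{\gamma N} - 1\right)}{\left(\gamma + \phi + \sigma_r\sigma_\Lambda\right)\left(e^{\gamma N} - 1\right) + 2\gamma}\\
A(N) &= \frac{2\phi\bar r}{\sigma_r^2}\left[\ln\left(\frac{2\gamma}{\psi\left(e^{\gamma N} - 1\right) + 2\gamma}\right) + \frac{\psi N}{2}\right]\\
\gamma &= \sqrt{\left(\phi + \sigma_r\sigma_\Lambda\right)^2 + 2\sigma_r^2}\\
\psi &= \phi + \sigma_\Lambda\sigma_r + \gamma.
\end{aligned}$$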
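Because these formulas are easy to mistype, a minimal Python sketch (NumPy assumed; helper names and parameter values are illustrative) integrates the ODEs (19.288) by fourth-order Runge-Kutta from A(0) = B(0) = 0 and compares the result with the closed forms above. The shorthand `kappa` = φ + σ_rσ_Λ is introduced here only for readability.

```python
import numpy as np


def cir_closed_form(N, phi, rbar, sig_r, sig_L):
    """Closed-form CIR A(N), B(N) with gamma and psi as defined above."""
    kappa = phi + sig_r * sig_L           # shorthand for phi + sigma_r sigma_Lambda
    gamma = np.sqrt(kappa**2 + 2.0 * sig_r**2)
    psi = kappa + gamma
    e = np.exp(gamma * N) - 1.0
    B = 2.0 * e / ((gamma + kappa) * e + 2.0 * gamma)
    A = 2.0 * phi * rbar / sig_r**2 * (np.log(2.0 * gamma / (psi * e + 2.0 * gamma))
                                       + 0.5 * psi * N)
    return A, B


def cir_odes(state, phi, rbar, sig_r, sig_L):
    """Right-hand side of (19.288), returned as (A'(N), B'(N))."""
    _, B = state
    dB = 1.0 - 0.5 * sig_r**2 * B**2 - (sig_r * sig_L + phi) * B
    dA = -B * phi * rbar
    return np.array([dA, dB])


def rk4(f, y0, N, steps, *args):
    """Classical fourth-order Runge-Kutta for the autonomous system y' = f(y)."""
    y = np.array(y0, dtype=float)
    h = N / steps
    for _ in range(steps):
        k1 = f(y, *args)
        k2 = f(y + 0.5 * h * k1, *args)
        k3 = f(y + 0.5 * h * k2, *args)
        k4 = f(y + h * k3, *args)
        y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y


# Illustrative parameter values.
phi, rbar, sig_r, sig_L = 0.25, 0.05, 0.08, -0.4
N = 10.0
A_num, B_num = rk4(cir_odes, [0.0, 0.0], N, 2000, phi, rbar, sig_r, sig_L)
A_cf, B_cf = cir_closed_form(N, phi, rbar, sig_r, sig_L)
print(A_num - A_cf, B_num - B_cf)  # both should be close to zero
```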

The CIR model can also be solved by expectation. In fact, this is how Cox, Ingersoll and Ross (1985) actually solve it: their marginal value of wealth $J_W$ is the same thing as the discount factor. However, where the interest rate in the Vasicek model was a simple conditional normal, the interest rate now has a non-central $\chi^2$ distribution, so taking the integral is a little messier.

19.5.4 Multifactor affine models

The Vasicek and CIR models are special cases of the affine class of term structure models (Duffie and Kan 1996, Dai and Singleton 1999). These models allow multiple factors, meaning that bond yields are not all functions of the short rate alone. Affine models maintain the convenient form that log bond prices are linear functions of the state variables. This means that we can take K bond yields themselves as the state variables, and the yields will reveal anything of interest in the hidden state variables. The short rate and its volatility will be forecast not only by lagged short rates but also by lagged long rates or interest rate spreads. My presentation and notation are similar to Dai and Singleton's, but as usual I add the discount factor.


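Here is the affine model setup:

$$dy = \phi(\bar y - y)\,dt + \Sigma\,dw \tag{19.289}$$

$$r = \delta_0 + \delta'y \tag{19.290}$$

$$\frac{d\Lambda}{\Lambda} = -r\,dt - b_\Lambda'\,dw \tag{19.291}$$

$$dw_i = \sqrt{\alpha_i + \beta_i'y}\;dz_i;\qquad E(dz_i\,dz_j) = 0 \text{ for } i \neq j. \tag{19.292}$$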

Equation (19.289) describes the evolution of the state variables. In the end, yields will be linear functions of the state variables, so we can take the state variables to be yields; thus we use the letter y. y denotes a K-dimensional vector of state variables. φ is now a K × K matrix, ȳ is a K-dimensional vector, and Σ is a K × K matrix. Equation (19.290) describes the mean of the discount factor or short rate as a linear function of the state variables. Equation (19.291) is the discount factor. $b_\Lambda$ is a K-dimensional vector that describes how the discount factor responds to the K shocks. The more Λ responds to a shock, the higher the market price of risk of that shock. Equation (19.292) describes the shocks dw. The functional form nests the CIR square-root type models if $\alpha_i = 0$ and the Vasicek type Gaussian process if $\beta_i = 0$. You can't pick $\alpha_i$ and $\beta_i$ arbitrarily, as you have to make sure that $\alpha_i + \beta_i'y > 0$ for all values of y that the process can attain. Dai and Singleton characterize this “admissibility” criterion.
We find bond prices in the affine setup following exactly the same steps as for the Vasicek and CIR models. Again, we guess that log prices are linear functions of the state variables y,

$$P(N, y) = e^{A(N) - B(N)'y}.$$

We apply Ito's lemma to this guess, and substitute in the basic bond pricing equation (19.268). We obtain ordinary differential equations that A(N) and B(N) must satisfy,
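$$\frac{\partial B(N)}{\partial N} = -\phi'B(N) - \sum_i\left([\Sigma'B(N)]_i\,b_{\Lambda i} + \tfrac{1}{2}[\Sigma'B(N)]_i^2\right)\beta_i + \delta \tag{19.293}$$

$$\frac{\partial A(N)}{\partial N} = \sum_i\left([\Sigma'B(N)]_i\,b_{\Lambda i} + \tfrac{1}{2}[\Sigma'B(N)]_i^2\right)\alpha_i - B(N)'\phi\bar y - \delta_0. \tag{19.294}$$

I use the notation $[x]_i$ to denote the ith element of a vector x. As with the CIR and Vasicek models, these are ordinary differential equations that can be solved by integration starting with A(0) = 0, B(0) = 0. While they do not always have analytical solutions, they are quick to solve numerically, much quicker than solving a partial differential equation.

To derive (19.294) and (19.293), we start with the basic bond pricing equation (19.268), which I repeat here,

$$E_t\left(\frac{dP}{P}\right) - \left(\frac{1}{P}\frac{\partial P}{\partial N} + r\right)dt = -E_t\left(\frac{dP}{P}\frac{d\Lambda}{\Lambda}\right). \tag{19.295}$$

Here dP/P collects the price change due to movements in the state variables y; the change due to the passage of time (maturity falls, dN = -dt) is the separate ∂P/∂N term.

We need dP/P,

$$\frac{dP}{P} = \frac{1}{P}\frac{\partial P}{\partial y'}dy + \frac{1}{2}\,dy'\,\frac{1}{P}\frac{\partial^2P}{\partial y\,\partial y'}\,dy.$$

The derivatives are

$$\frac{1}{P}\frac{\partial P}{\partial y} = -B(N), \qquad \frac{1}{P}\frac{\partial^2P}{\partial y\,\partial y'} = B(N)B(N)', \qquad \frac{1}{P}\frac{\partial P}{\partial N} = \frac{\partial A(N)}{\partial N} - \frac{\partial B(N)'}{\partial N}y.$$

Thus, the first term in (19.295) is

$$E_t\left(\frac{dP}{P}\right) = -B(N)'\phi(\bar y - y)\,dt + \frac{1}{2}E_t\left(dw'\,\Sigma'B(N)B(N)'\Sigma\,dw\right).$$

$E_t(dw_i\,dw_j) = 0$ for $i \neq j$, which allows us to simplify the last term. If $w_1w_2 = 0$, then

$$w'bb'w = \begin{bmatrix} w_1 & w_2 \end{bmatrix}\begin{bmatrix} b_1^2 & b_1b_2 \\ b_2b_1 & b_2^2 \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = b_1^2w_1^2 + b_2^2w_2^2 = \sum_i b_i^2w_i^2.$$

Applying the same algebra to our case,

$$E_t\left(dw'\,\Sigma'B(N)B(N)'\Sigma\,dw\right) = \sum_i[\Sigma'B(N)]_i^2\,E_t\left(dw_i^2\right) = \sum_i[\Sigma'B(N)]_i^2\left(\alpha_i + \beta_i'y\right)dt.$$

In sum, we have

$$E_t\left(\frac{dP}{P}\right) = -B(N)'\phi(\bar y - y)\,dt + \frac{1}{2}\sum_i[\Sigma'B(N)]_i^2\left(\alpha_i + \beta_i'y\right)dt. \tag{19.296}$$

The right-hand side term in (19.295) is

$$E_t\left(\frac{dP}{P}\frac{d\Lambda}{\Lambda}\right) = E_t\left[\left(-B(N)'\Sigma\,dw\right)\left(-dw'b_\Lambda\right)\right] = B(N)'\Sigma\,E_t(dw\,dw')\,b_\Lambda.$$

$E_t(dw\,dw')$ is a diagonal matrix with elements $(\alpha_i + \beta_i'y)\,dt$. Thus,

$$-E_t\left(\frac{dP}{P}\frac{d\Lambda}{\Lambda}\right) = -\sum_i[\Sigma'B(N)]_i\,b_{\Lambda i}\left(\alpha_i + \beta_i'y\right)dt. \tag{19.297}$$

Now, substituting (19.296) and (19.297) in (19.295), along with the easier ∂P/∂N central term, and dividing by dt, we get

$$-B(N)'\phi(\bar y - y) + \frac{1}{2}\sum_i[\Sigma'B(N)]_i^2\left(\alpha_i + \beta_i'y\right) - \left(\frac{\partial A(N)}{\partial N} - \frac{\partial B(N)'}{\partial N}y + \delta_0 + \delta'y\right) = -\sum_i[\Sigma'B(N)]_i\,b_{\Lambda i}\left(\alpha_i + \beta_i'y\right).$$

Once again, the constant terms and the terms multiplying each $y_i$ must separately be zero. The constant terms give

$$-B(N)'\phi\bar y + \frac{1}{2}\sum_i[\Sigma'B(N)]_i^2\,\alpha_i - \frac{\partial A(N)}{\partial N} - \delta_0 = -\sum_i[\Sigma'B(N)]_i\,b_{\Lambda i}\,\alpha_i,$$

so

$$\frac{\partial A(N)}{\partial N} = \sum_i\left([\Sigma'B(N)]_i\,b_{\Lambda i} + \tfrac{1}{2}[\Sigma'B(N)]_i^2\right)\alpha_i - B(N)'\phi\bar y - \delta_0.$$

The terms multiplying y:

$$B(N)'\phi y + \frac{1}{2}\sum_i[\Sigma'B(N)]_i^2\,\beta_i'y + \frac{\partial B(N)'}{\partial N}y - \delta'y = -\sum_i[\Sigma'B(N)]_i\,b_{\Lambda i}\,\beta_i'y.$$

Taking the transpose and solving,

$$\frac{\partial B(N)}{\partial N} = -\phi'B(N) - \sum_i\left([\Sigma'B(N)]_i\,b_{\Lambda i} + \tfrac{1}{2}[\Sigma'B(N)]_i^2\right)\beta_i + \delta.$$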
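The affine bond-price ODEs are easy to integrate numerically. A minimal Python sketch with NumPy, assuming that the market-price-of-risk term enters through $[\Sigma'B(N)]_i\,b_{\Lambda i}$, the bond's exposure to shock $dw_i$ times that shock's loading in the discount factor; this is the form under which the one-factor special cases reproduce the Vasicek equations (19.275) and the CIR equations (19.288). The function and argument names are hypothetical, not from any library, and the parameter values are illustrative. The usage at the bottom runs the one-factor Vasicek check.

```python
import numpy as np


def affine_rhs(A, B, phi, ybar, Sigma, delta0, delta, b_lam, alpha, beta):
    """dA/dN and dB/dN for the K-factor affine model.

    alpha: length-K vector; beta: K x K array whose i-th row is beta_i';
    b_lam: length-K vector. A is unused because neither equation depends on it.
    """
    sB = Sigma.T @ B                      # [Sigma' B(N)]_i, exposure of ln P to dw_i
    w = sB * b_lam + 0.5 * sB**2          # term that multiplies alpha_i and beta_i
    dB = -phi.T @ B - beta.T @ w + delta  # beta.T @ w = sum_i w_i beta_i
    dA = w @ alpha - B @ (phi @ ybar) - delta0
    return dA, dB


def solve_affine(N, steps, **p):
    """Fourth-order Runge-Kutta from A(0) = 0, B(0) = 0 up to maturity N."""
    A, B = 0.0, np.zeros(len(p["delta"]))
    h = N / steps
    for _ in range(steps):
        a1, b1 = affine_rhs(A, B, **p)
        a2, b2 = affine_rhs(A + 0.5 * h * a1, B + 0.5 * h * b1, **p)
        a3, b3 = affine_rhs(A + 0.5 * h * a2, B + 0.5 * h * b2, **p)
        a4, b4 = affine_rhs(A + h * a3, B + h * b3, **p)
        A = A + h / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
        B = B + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
    return A, B


# One-factor Vasicek special case: y = r, delta0 = 0, delta = 1, Sigma = sig_r,
# alpha = 1, beta = 0, b_lam = sig_L (illustrative parameter values).
phi_, rbar, sig_r, sig_L, N = 0.3, 0.05, 0.01, -0.5, 10.0
vasicek = dict(phi=np.array([[phi_]]), ybar=np.array([rbar]),
               Sigma=np.array([[sig_r]]), delta0=0.0, delta=np.array([1.0]),
               b_lam=np.array([sig_L]), alpha=np.array([1.0]),
               beta=np.array([[0.0]]))
A, B = solve_affine(N, 1000, **vasicek)

B_cf = (1.0 - np.exp(-phi_ * N)) / phi_
A_cf = ((0.5 * sig_r**2 / phi_**2 + sig_r * sig_L / phi_ - rbar) * (N - B_cf)
        - sig_r**2 / (4.0 * phi_) * B_cf**2)
print(A - A_cf, B[0] - B_cf)  # both should be close to zero
```

Setting alpha = 0 and beta = 1 instead gives the CIR case; with K > 1 the same code handles genuinely multifactor specifications, provided the parameters satisfy the admissibility conditions discussed above.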
‚N 2

19.6 Bibliography and comments

The choice of discrete vs. continuous time is really one of convenience. Campbell, Lo
and MacKinlay (1997) give a discrete-time treatment, showing that bond prices are linear
functions of the state variables even in a discrete-time two-parameter square root model.
Models also don't have to be affine. Constantinides (1992) is a nice discrete time model; its discount factor is driven by the squared value of AR(1) state variables. It gives closed form solutions for bond prices. The bond prices are not linear functions of the state variables, but it is the existence of closed forms rather than linearity that makes affine models so attractive. Constantinides' model also allows for both signs of the term premium, as we seem to see in the data.
So far most of the term structure literature has emphasized the risk-neutral probabilities, rarely making any reference to the separation between drifts and market prices of risk. This was not a serious shortcoming for option pricing uses, for which modeling the volatilities is much more important than modeling the drifts, or for drawing smooth yield curves across maturities. However, it makes the models unsuitable for bond portfolio analysis and other uses. Many models imply high and time-varying market prices of risk or conditional Sharpe ratios. Recently, Duffee (1999) and Duarte (2000) have started the important task of specifying term structure models that fit the empirical facts about expected bond returns. In particular, they try to fit the Fama-Bliss (1986) and Campbell and Shiller (1991) regressions that relate expected returns to the slope of the term structure (see Chapter 20), while maintaining the tractability of affine models.
Term structure models used in finance amount to regressions of interest rates on lagged interest rates. Macroeconomists also run regressions of interest rates on a wide variety of variables, including lagged interest rates, but also lagged inflation, output, unemployment, exchange rates, and so forth. They often interpret these equations as the Federal Reserve's policy-making rule for setting short rates as a function of macroeconomic conditions. This interpretation is particularly clear in the Taylor rule literature (Taylor 1999) and the monetary VAR literature; see Christiano, Eichenbaum and Evans (1999) and Cochrane (1994) for surveys. Someone, it would seem, is missing important right hand variables.
The criticism of finance models is stinging when we only use the short rate as a state variable. Multifactor models are more subtle. If any variable forecasts future interest rates, then it becomes a state variable, and it should be revealed by bond yields. Thus, bond yields should completely drive out any other macroeconomic state variables as interest rate forecasters. They don't, which is an interesting observation.
In addition, there is an extensive literature that studies yields from a purely statistical point
of view, Gallant and Tauchen (1997) for example, and a literature that studies high frequency
behavior in the federal funds market, for example Hamilton (1996).
Obviously, these three literatures need to become integrated. Balduzzi, Bertola and Foresi (1996) consider a model based on the Federal Funds target, and Piazzesi (2000) integrated a careful specification of high-frequency moves in the Federal funds rate into a term structure model.
The models studied here are all based on diffusions with rather slow-moving state variables. These models generate one-day ahead densities that are almost exactly normal. In fact, as Johannes (2000) points out, one-day ahead densities have much fatter tails than normal distributions predict. This behavior could be modeled by fast-moving state variables. However, it is more natural to think of this behavior as generated by a jump process, and Johannes nicely fits a combined jump-diffusion for yields. This specification can change pricing and hedging characteristics of term structure models significantly.
All of the term structure models in this chapter describe many bond yields as a function
of a few state variables. This is a reasonable approximation to the data. Almost all of the
variance of yields can be described in terms of a few factors, typically a “level,” “slope,” and “hump” factor. Knez, Litterman and Scheinkman (1994) make the point with a formal
maximum likelihood factor analysis, but you can see the point with a simple eigenvalue
decomposition of log yields.


  σ        1        2        3        4        5
 6.36     0.45     0.45     0.45     0.44     0.44    “Level”
 0.61    -0.75    -0.21     0.12     0.36     0.50    “Slope”
 0.10     0.47    -0.62    -0.41     0.11     0.46    “Hump”
 0.08     0.10    -0.49     0.39     0.55    -0.55
 0.07     0.07    -0.36     0.68    -0.60     0.21

Eigenvalue decomposition of the covariance matrix of zero coupon bond yields, 1952-1997. The first column (σ) gives the square root of the eigenvalues. In each row, the entries in the columns marked 1-5 give an eigenvector, element by element for the 1-5 year zero coupon bond yields. I decomposed the covariance matrix as $\Sigma = Q\Lambda Q'$; $\sigma^2$ gives the diagonal entries in Λ and the rest of the table gives the entries of Q (each row of the table is a column of Q). With this decomposition, we can say that bond yields are generated by $y = Q\Lambda^{1/2}\varepsilon$, $E(\varepsilon\varepsilon') = I$; thus Q gives “loadings” on the shocks ε.
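The decomposition in the table is a standard symmetric eigendecomposition. A minimal sketch using NumPy's `numpy.linalg.eigh`; because the yield data behind the table are not reproduced here, the example uses a hypothetical covariance matrix built from made-up level, slope, and hump loadings, so its output will not match the table. `yield_factors` is an illustrative helper name.

```python
import numpy as np


def yield_factors(cov):
    """Eigendecomposition cov = Q diag(sigma**2) Q', largest eigenvalue first.

    Returns sigma (square roots of the eigenvalues) and Q (eigenvectors as columns).
    Eigenvector signs are arbitrary, so a column may come out as minus the one in a table.
    """
    eigval, Q = np.linalg.eigh(cov)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    return np.sqrt(np.clip(eigval[order], 0.0, None)), Q[:, order]


# Hypothetical covariance matrix for 1-5 year yields, built from made-up
# level, slope and hump loadings plus a little idiosyncratic noise.
m = np.arange(1, 6, dtype=float)
loadings = np.column_stack([np.ones(5), (m - 3.0) / 2.0, 1.0 - ((m - 3.0) / 2.0) ** 2])
cov = loadings @ np.diag([6.0, 0.4, 0.02]) @ loadings.T + 0.001 * np.eye(5)

sigma, Q = yield_factors(cov)
print(np.round(sigma, 3))
print(np.round(Q.T, 2))  # row k holds the loadings of factor k on the 1-5 year yields
```

Printing `Q.T` lays the eigenvectors out as rows, the same orientation as the table.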

Not only is the variance of yields well described by a factor model, but the information in current yields about future yields (the expected changes in yields and the conditional volatility of yields) is well captured by one level and a few spreads as well.
It is a good approximation, but it is an approximation. Actual bond prices do not exactly follow any smooth yield curve, and the covariance matrix of actual bond yields does not have an exact K factor structure: the remaining eigenvalues are not zero. Hence you cannot estimate a term structure model directly by maximum likelihood; you either have to estimate the models by GMM, forcing the estimate to ignore the stochastic singularity, or you have to add distasteful measurement errors.
As always, the importance of an approximation depends on how you use the model. If you take the model literally, a bond whose price deviates by one basis point is an arbitrage opportunity. In fact, it is at best a good Sharpe ratio, but a K factor model will not tell you how good; it won't quantify the risk involved in using the model for trading purposes. Hedging strategies calculated from K-factor models may be sensitive to small deviations as well.
One solution has been to pick different parameters at each point in time (Ho and Lee
1986). This approach is useful for derivative pricing, but is obviously not a satisfactory
solution. Models in which the whole yield curve is a state variable, Kennedy (1994), Santa
Clara and Sornette (1999), are another interesting response to the problem, and potentially
provide a realistic description of the data.
The market price of interest rate risk reflects the market price of real interest rate changes and the market price of inflation, or of whatever real factors are correlated with inflation. The relative contributions of inflation and real rates to interest rate changes are very important for the nature of the risks that bondholders face. For example, if real rates are constant and nominal rates change on inflation news, then short term bonds are the safest real long term investment. If inflation is constant and nominal rates change on real rate news, then long term bonds are the safest long term investment. The data seem to suggest a change in regime between the 1970s and 1990s: in the 1970s, most interest rate changes were due to inflation, while the opposite seems true now. Despite all these provocative thoughts, though, little empirical work has been done that usefully separates interest rate risk premia into real and inflation premium components. Buraschi and Jiltsov (1999) is one recent effort in this direction, but a lot more remains to be done.

19.7 Problems

1. Complete the proof that each of the three statements of the expectations hypothesis implies the others. Is this also true if we add a constant risk premium? Are the risk premia in each of the three statements of the yield curve of the same sign?
2. Under the expectations hypothesis, if long-term yields are higher than short term yields,
does this mean that future long term rates should go up, down, or stay the same? (Hint: a
plot of the expected log bond prices over time will really help here.)
3. Start by assuming risk neutrality, $E(HPR^{(N)}_{t+1}) = Y^{(1)}_t$ for all maturities N. Try to derive the other representations of the expectations hypothesis. Now you see why we specify that the expected log returns are equal.
4. Look at (19.266) and show that adding orthogonal dw to the discount factor has no effect
on bond pricing formulas.
5. Look at (19.266) and show that $P = e^{-rT}$ if interest rates are constant, i.e. if $d\Lambda/\Lambda = -r\,dt + \sigma_\Lambda\,dz$.

Empirical survey


This part is a brief attempt to survey some of the central empirical issues that have driven recent thinking in financial economics, and which are driving the development of our theoretical understanding of the nature of risk and risk premia.
This part draws heavily on two previous review articles, Cochrane (1998) and (1999a), and on Cochrane and Hansen (1992). Fama's (1970) and (1991) efficient market reviews are classic and detailed reviews of much of the underlying empirical literature, focusing on cross-sectional questions. Campbell (1999, 2000) and Kocherlakota (1996) are good surveys of the equity premium literature.

Chapter 20. Expected returns in the
time-series and cross-section
The first revolution in finance started the modern field. Peaking in the early 1970s, this revolution established the CAPM, random walk, efficient markets, portfolio based view of the world. The pillars of this view are:

1. The CAPM is a good measure of risk and thus a good explanation why some stocks, portfolios, strategies or funds (assets, generically) earn higher average returns than others.
2. Returns are unpredictable. In particular,
(a) Stock returns are close to unpredictable. Prices are close to random walks; expected
returns do not vary greatly through time. “Technical analysis” that tries to divine
future returns from past price and volume data is nearly useless. Any apparent
predictability is either a statistical artifact which will quickly vanish out of sample,
or cannot be exploited after transactions costs. The near unpredictability of stock
returns is simply stated, but its implications are many and subtle. (Malkiel 1990
is a classic and easily readable introduction.) It also remains widely ignored, and
therefore is the source of lots of wasted trading activity.
(b) Bond returns are nearly unpredictable. This is the expectations model of the term
structure. If long term bond yields are higher than short term yields (if the yield curve is upward sloping), this does not mean that expected long-term bond returns
are any higher than those on short term bonds. Rather, it means that short term
interest rates are expected to rise in the future, so you expect to earn about the same
amount on short term or long term bonds at any horizon.
(c) Foreign exchange bets are not predictable. If a country has higher interest rates
than are available in the U.S. for bonds of a similar risk class, its exchange rate is
expected to depreciate. After you convert your investment back to dollars, you
expect to make the same amount of money holding foreign or domestic bonds.
(d) Stock market volatility does not change much through time. Not only are returns
close to unpredictable, they are nearly identically distributed as well.
3. Professional managers do not reliably outperform simple indices and passive portfolios
once one corrects for risk (beta). While some do better than the market in any given year,
some do worse, and the outcomes look very much like good and bad luck. Managers
who do well in one year are not more likely to do better than average the next year. The
average actively-managed fund does about 1% worse than the market index. The more
actively a fund trades, the lower its returns to investors.

Together, these views reflected a guiding principle that asset markets are, to a good approximation, informationally efficient (Fama 1970, 1991). This statement means that market prices already contain most information about fundamental value. Informational efficiency in turn derives from competition. The business of discovering information about the value of traded assets is extremely competitive, so there are no easy quick profits to be made, just as there are none in any other well-established and competitive industry. The only way to earn large returns is by taking on additional risk.
These statements are not doctrinaire beliefs. Rather, they summarize the findings of a quarter-century of extensive and careful empirical work. However, every single one of them has now been extensively revised by a new generation of empirical research. Now, it seems that:

1. There are assets, portfolios, funds, and strategies whose average returns cannot be
explained by their market betas. Multifactor models dominate the empirical description,
performance attribution, and explanation of average returns.
2. Returns are predictable. In particular,
(a) Variables including the dividend/price ratio and term premium can in fact predict
substantial amounts of stock return variation. This phenomenon occurs over
business-cycle and longer horizons. Daily, weekly and monthly stock returns are
still close to unpredictable, and “technical” systems for predicting such movements
are still close to useless after transactions costs.
(b) Bond returns are predictable. Though the expectations model works well in the long
run, a steeply upward sloping yield curve means that expected returns on long term
bonds are higher than on short term bonds for the next year.
(c) Foreign exchange returns are predictable. If you buy bonds in a country whose interest rates are unusually high relative to those in the U.S., you expect a greater return, even after converting back to dollars.
(d) Stock market volatility does in fact change through time. Conditional second moments vary through time as well as first moments. Means and variances do not seem to move in lockstep, so conditional Sharpe ratios vary through time.
3. Some funds seem to outperform simple indices, even after controlling for risk through
market betas. Fund returns are also slightly predictable: past winning funds seem to do
better in the future, and past losing funds seem to do worse than average in the future. For
a while, this seemed to indicate that there is some persistent skill in active management.
However, we now see that multifactor performance attribution models explain most fund
persistence: funds earn persistent returns by following fairly mechanical “styles,” not by
persistent skill at stock selection (Carhart 1997).

Again, these views summarize a large body of empirical work. The strength and interpre-
tation of many results are hotly debated.
This new view of the facts need not overturn the view that markets are reasonably competitive and therefore reasonably efficient. It does substantially enlarge our view of what activities provide rewards for holding risks, and it challenges our economic understanding of those risk premia. As of the early 1970s, asset pricing theory anticipated the possibility and even probability that expected returns should vary over time and that covariances beyond market betas would be important for understanding cross-sectional variation in expected returns. What took another 15 to 20 years was to see how important these long-anticipated theoretical possibilities are in the data.

20.1 Time-series predictability

I start by looking at patterns in expected returns over time in large market indices, and then
look at patterns in expected returns across stocks.

20.1.1 Stocks

Dividend-price ratios forecast excess returns on stocks. Regression coefficients and R² rise with the forecast horizon. This is a result of the fact that the forecasting variable is slow-moving.

Table 1 gives a simple example of market return predictability. “Low” prices relative to dividends forecast higher subsequent returns. The one-year horizon 0.17 R² is not particularly remarkable. However, at longer and longer horizons larger and larger fractions of return variation are forecastable. At a 5 year horizon 60% of the variation in stock returns is forecastable ahead of time from the price/dividend ratio! (Fama and French 1988.)

                     R(t→t+k) = a + b(D_t/P_t)       D(t+k)/D_t = a + b(D_t/P_t)
Horizon k (years)      b      σ(b)     R²              b      σ(b)     R²
1                      5.3    (2.0)    0.15            2.0    (1.1)    0.06
2                      10     (3.1)    0.23            2.5    (2.1)    0.06
3                      15     (4.0)    0.37            2.4    (2.1)    0.06
5                      33     (5.8)    0.60            4.7    (2.4)    0.12

Table 1. OLS regressions of percent excess returns (value weighted NYSE - treasury bill rate) and real dividend growth on the percent VW dividend/price ratio. R(t→t+k) indicates the k year return. Standard errors in parentheses use GMM to correct for heteroskedasticity and serial correlation. Sample 1947-1996.
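Because the k-year returns in Table 1 overlap, the regression errors are serially correlated even under the null of no predictability, which is why the standard errors need a correction. The text says only that GMM was used; Newey-West (Bartlett kernel) weights are one common choice and Hansen-Hodrick uniform weights another. A minimal sketch of the Newey-West version in Python with NumPy, run on synthetic data from a hypothetical AR(1) predictor rather than the NYSE data; the helper names, the lag choice of k − 1, and all parameter values are assumptions for illustration.

```python
import numpy as np


def ols_newey_west(y, x, lags):
    """OLS of y on a constant and x, with Newey-West (Bartlett kernel) standard errors."""
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ y)
    u = y - X @ coef
    g = X * u[:, None]                    # moment contributions x_t u_t
    S = g.T @ g
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)   # Bartlett weight
        G = g[j:].T @ g[:-j]
        S += weight * (G + G.T)
    V = XtX_inv @ S @ XtX_inv
    return coef, np.sqrt(np.diag(V))


def long_horizon(r, x, k):
    """Pair the k-period sum r_{t+1} + ... + r_{t+k} with the predictor x_t."""
    T = len(r)
    rk = np.array([r[t + 1:t + 1 + k].sum() for t in range(T - k)])
    return rk, x[:T - k]


# Synthetic data from a hypothetical predictive system (no real data):
# r_{t+1} = b x_t + eps_{t+1},  x_{t+1} = rho x_t + delta_{t+1}.
rng = np.random.default_rng(42)
T, b, rho = 5000, 0.1, 0.9
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.standard_normal()
r = np.zeros(T)
r[1:] = b * x[:-1] + rng.standard_normal(T - 1)

results = {}
for k in (1, 5):
    rk, xk = long_horizon(r, x, k)
    results[k] = ols_newey_west(rk, xk, lags=k - 1)   # overlap of k - 1 periods
    print(k, results[k][0][1], results[k][1][1])      # horizon, slope, standard error
```

The lag length is a tuning choice; with lags = 0 the estimator reduces to White's heteroskedasticity-consistent standard errors.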

One can object to dividends as the divisor for prices. However, ratios formed with just about any sensible divisor work about as well, including earnings, book value, and moving averages of past prices.
Many other variables forecast excess returns, including the term spread between long and short term bonds, the default spread, the level of the T-bill rate (Fama and French 1989),


the detrended T-bill rate, and the earnings/dividend ratio (Lamont 1998). Macro variables
forecast stock returns as well, including the investment/capital ratio (Cochrane 1991) and the
consumption/wealth ratio (Lettau and Ludvigson 2000).
Most of these variables are correlated with each other and correlated with or forecast
business cycles. This fact suggests a natural explanation, emphasized by Fama and French
(1999): Expected returns vary over business cycles; it takes a higher risk premium to get
people to hold stocks at the bottom of a recession. When expected returns go up, prices go
down. We see the low prices, followed by the higher returns expected and required by the
market. (Regressions do not have to have causes on the right and effects on the left. You run regressions with the variable orthogonal to the error on the right, and that is the case here since the error is a forecasting error. This is like a regression of actual weather on a weather forecast.)
Table LL, adapted from Lettau and Ludvigson (2000), compares several of these variables. At a one year horizon, both the consumption/wealth ratio and the detrended T-bill rate forecast returns, with R² of 0.18 and 0.10 respectively. At the one year horizon, these variables are more important than the dividend/price and dividend/earnings ratios, and their presence cuts the dividend ratio coefficients in half. However, the d/p and d/e ratios are slower moving than the T-bill rate and consumption/wealth ratio. They track decade-to-decade movements more than business cycle movements. This means that their importance builds with horizon. By six years, the bulk of the return forecastability again comes from the dividend ratios, and it is their turn to cut down the cay and T-bill regression coefficients. The cay and d/e variables have not been that affected by the late 90s, while it has substantially cut down dividend yield

Horizon (years)    cay     d−p     d−e     rrel     R²
1                  6.7                               0.18
1                          0.14    0.08              0.04
1                                          -4.5      0.10
1                  5.4     0.07   -0.05    -3.8      0.23
6                  12.4                              0.16
6                          0.95    0.68              0.39
6                                          -5.10     0.03
6                  5.9     0.89    0.65     1.36     0.42

Table LL. Long-horizon return forecasts. The return variable is log excess returns on the S&P composite index. cay is Lettau and Ludvigson's consumption to wealth ratio. d − p is the log dividend yield and d − e is the log dividend/earnings ratio. rrel is a detrended short term interest rate. Sample 1952:4-1998:3. Source: Lettau and Ludvigson (2000) Table 5.
I emphasize that excess returns are forecastable. We have to understand this as time-variation in the reward for risk, not time-varying interest rates. One naturally slips into non-risk explanations for price variation; for example, that the current stock market boom is
due to life-cycle savings of the baby boomers. A factor like this does not reference risks; it


predicts that interest rates should move just as much as stock returns.
Persistent d/p; Long horizons are not a separate phenomenon
The results at different horizons are not separate facts, but reflections of a single
underlying phenomenon. If daily returns are very slightly predictable by a slow-moving variable,
that predictability adds up over long horizons. For example, you can predict that the
temperature in Chicago will rise about 1/3 degree per day in the springtime. This forecast explains
very little of the day-to-day variation in temperature, but tracks almost all of the rise in
temperature from January to July. Thus, the $R^2$ rises with horizon.
Thus, a central fact driving the predictability of returns is that the dividend/price ratio
is very persistent. Figure 37 plots the d/p ratio, and you can see directly that it is extremely
slow-moving. Below, I will estimate an AR(1) coefficient around 0.9 in annual data.

Figure 37.

To see more precisely how the results at various horizons are linked, and how they result
from the persistence of the d/p ratio, suppose that we forecast returns with a forecasting
variable $x$, according to

$$r_{t+1} = a x_t + \varepsilon_{t+1} \tag{20.298}$$
$$x_{t+1} = \rho x_t + \delta_{t+1}. \tag{20.299}$$

(Obviously, you demean the variables or put constants in the regressions.) Small values of
$a$ and $R^2$ in (20.298) and a large coefficient $\rho$ in (20.299) imply mathematically that the
long-horizon regression has a large regression coefficient $b$ and large $R^2$. To see this, write

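$$r_{t+1} + r_{t+2} = a(1 + \rho)x_t + a\delta_{t+1} + \varepsilon_{t+1} + \varepsilon_{t+2}$$
$$r_{t+1} + r_{t+2} + r_{t+3} = a(1 + \rho + \rho^2)x_t + a(1 + \rho)\delta_{t+1} + a\delta_{t+2} + \varepsilon_{t+1} + \varepsilon_{t+2} + \varepsilon_{t+3}.$$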

You can see that with $\rho$ near one, the coefficients increase with horizon, almost linearly at
first and then at a declining rate. The $R^2$ are a little messier to work out, but also rise with
horizon.
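The "messier" $R^2$ can be written out directly. The sketch below assumes the system (20.298) and (20.299) with $\varepsilon$ and $\delta$ uncorrelated (a simplifying assumption not made in the text), and computes the $k$-year slope $b_k = a(1 + \rho + \cdots + \rho^{k-1})$ together with the population $R^2$. In the $k$-period sum, $\delta_{t+m}$ enters with weight $a(1 + \rho + \cdots + \rho^{k-1-m})$, as in the three-period expansion above. The function name and the parameter values are hypothetical.

```python
def long_horizon_stats(a, rho, var_eps, var_delta, k):
    """Population slope and R^2 of sum_{j=1..k} r_{t+j} on x_t.

    Hypothetical helper: assumes r_{t+1} = a x_t + eps_{t+1},
    x_{t+1} = rho x_t + delta_{t+1}, with eps and delta uncorrelated.
    """
    var_x = var_delta / (1.0 - rho ** 2)              # stationary variance of x
    b_k = a * sum(rho ** j for j in range(k))          # a(1 + rho + ... + rho^(k-1))
    # delta_{t+m}, m = 1..k-1, enters the k-period sum with weight a(1 + ... + rho^(k-1-m))
    var_delta_part = var_delta * sum(
        (a * sum(rho ** j for j in range(k - m))) ** 2 for m in range(1, k)
    )
    var_noise = k * var_eps + var_delta_part           # k eps shocks plus the delta shocks
    signal = b_k ** 2 * var_x
    return b_k, signal / (signal + var_noise)


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        b_k, r2 = long_horizon_stats(a=0.1, rho=0.9, var_eps=1.0, var_delta=1.0, k=k)
        print(f"k={k}: b_k={b_k:.3f}, R^2={r2:.3f}")
```

With these illustrative values the one-year $R^2$ is only 0.05, yet both the slope and the $R^2$ rise steadily over the first few horizons.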
The numerator in the long-horizon regression coefficient is

$$E\left[(r_{t+1} + r_{t+2} + \cdots + r_{t+k})\, x_t\right] \tag{20.300}$$

where the symbols represent deviations from their means. With stationary $r$ and $x$,
$E(r_{t+j} x_t) = E(r_{t+1} x_{t+1-j})$, so this is the same moment as

$$E\left[r_{t+1}\,(x_t + x_{t-1} + \cdots + x_{t-k+1})\right], \tag{20.301}$$

the numerator of a regression coefficient of one-year returns on many lags of price-dividend
ratios. Of course, if you run a multiple regression of returns on lags of p/d, you quickly find
that most lags past the first do not help the forecast power. (That statement would be exact in
the AR(1) example.)
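The equivalence of the long-horizon moment (20.300) and the lagged moment (20.301) can be checked on simulated data. This sketch assumes the system (20.298) and (20.299) with independent Gaussian shocks; the helper names, the seed and the parameter values are hypothetical. Because both sample moments are computed over the same dates, they differ only by a handful of end-of-sample terms.

```python
import random


def simulate_ar1_system(T, a, rho, sd_eps, sd_delta, seed=0):
    """Simulate r_{t+1} = a x_t + eps_{t+1}, x_{t+1} = rho x_t + delta_{t+1} (hypothetical helper)."""
    rng = random.Random(seed)
    x = [0.0] * T
    r = [0.0] * T
    for t in range(1, T):
        x[t] = rho * x[t - 1] + rng.gauss(0.0, sd_delta)
        r[t] = a * x[t - 1] + rng.gauss(0.0, sd_eps)   # r[t] is forecast by x[t-1]
    return x, r


def long_vs_lagged_moments(x, r, k):
    """Sample versions of E[(r_{t+1}+...+r_{t+k}) x_t] and E[r_{t+1}(x_t+...+x_{t-k+1})].

    Population means are zero in this simulation, so the series are not demeaned.
    """
    dates = range(k, len(x) - k)   # the same dates t for both moments
    m_long = sum(sum(r[t + j] for j in range(1, k + 1)) * x[t] for t in dates) / len(dates)
    m_lags = sum(r[t + 1] * sum(x[t - j] for j in range(k)) for t in dates) / len(dates)
    return m_long, m_lags


if __name__ == "__main__":
    x, r = simulate_ar1_system(T=50_000, a=0.1, rho=0.9, sd_eps=1.0, sd_delta=1.0)
    print(long_vs_lagged_moments(x, r, k=5))
```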
This observation shows once again that one-year and multi-year forecastability are two
sides of the same coin. It also suggests that on a purely statistical basis, there will not be a
huge difference between one-year return forecasts and multi-year return forecasts (correcting the
latter for the serial correlation of the error term due to overlap). Hodrick (1991) comes to this
conclusion in a careful Monte Carlo experiment, comparing moments of the form (20.300),
(20.301) and $E(r_{t+1} x_t)$. The multi-year regressions, or the implied multi-year regressions
from one-year forecasts with a slow-moving right-hand variable, are thus mostly useful for
illustrating the dramatic economic implications of forecastability, rather than as clever
statistical tools that enhance power and allow us to distinguish previously foggy hypotheses.
The slow movement of the price-dividend ratio means that on a purely statistical basis,
return forecastability is a very open question. What we really know (see Figure 37) is that
low prices relative to dividends and earnings in the 1950s preceded the boom market of the
early 1960s; that the high price-dividend ratios of the mid-1960s preceded the poor returns of
the 1970s; and that the low price ratios of the mid-1970s preceded the current boom. We really
have three postwar data points: a once-per-generation change in expected returns. In
addition, the last half of the 1990s has seen a historically unprecedented rise in stock prices
and price/dividend ratios (or any other ratio). This rise has cut the postwar return forecasting
regression coefficient in half. On the other hand, another crash or even just a decade of poor
returns will restore the regression. Data back to the 1600s show the same pattern, but we are
often uncomfortable making inferences from centuries-old data.


20.1.2 Volatility

Price-dividend ratios can only move at all if they forecast future returns, if they forecast
future dividend growth, or if there is a bubble: if the price-dividend ratio is nonstationary and
is expected to grow explosively. In the data, most variation in price-dividend ratios results
from varying expected returns. “Excess volatility,” relative to constant-discount-rate present
value models, is thus exactly the same phenomenon as forecastable long-horizon returns.
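I also derive the very useful price-dividend and return linearizations. Ignoring constants,

$$p_t - d_t = E_t \sum_{j=1}^{\infty} \rho^{j-1}\left(\Delta d_{t+j} - r_{t+j}\right)$$
$$r_t - E_{t-1} r_t = (E_t - E_{t-1})\left[\sum_{j=0}^{\infty} \rho^j \Delta d_{t+j} - \sum_{j=1}^{\infty} \rho^j r_{t+j}\right]$$
$$r_{t+1} = \Delta d_{t+1} - \rho\,(d_{t+1} - p_{t+1}) + (d_t - p_t).$$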

The volatility test literature starting with Shiller (1981) and LeRoy and Porter (1981)
(see Cochrane 1991 for a review) started out trying to make a completely different point.
Predictability seems like a sideshow. The stunning fact about the stock market is its
extraordinary volatility. On a typical day, the value of the U.S. capital stock changes by a full
percentage point, and days of 2 or 3 percentage point changes are not uncommon. In a typical
year it changes by 16 percentage points, and 30 percentage point changes are not uncommon.
Worse, most of that volatility seems not to be accompanied by any important news about
future returns and discount rates. 30% of the capital stock of the United States vanished in
a year and nobody noticed? Surely, this observation shows directly that markets are “not
efficient” (that prices do not correspond to the value of capital) without worrying about
It turns out, however, that “excess volatility” is exactly the same thing as return
predictability. Any story you tell about prices that are “too high” or “too low” necessarily implies that
subsequent returns will be too low or too high as prices rebound to their correct levels.
When prices are high relative to dividends (or earnings, cashflow, book value or some
other divisor), one of three things must be true: 1) Investors expect dividends to rise in the
future. 2) Investors expect returns to be low in the future. Future cashflows are discounted
at a lower than usual rate, leading to higher prices. 3) Investors expect prices to rise forever,
giving an adequate return even if there are no dividends. This statement is not a theory, it is an
identity: if the price-dividend ratio is high, either dividends must rise, prices must decline, or
the price-dividend ratio must grow explosively. The open question is, which option holds for
our stock market? Are prices high now because investors expect future earnings, dividends
etc. to rise, because they expect low returns in the future, or because they expect prices to go
on rising forever?
Historically, we find that virtually all variation in price-dividend ratios has reflected
varying expected excess returns.
Exact present value identity
To document this statement, we need to relate current prices to future dividends and
returns. Start with the identity

$$1 = R_{t+1}^{-1} R_{t+1} = R_{t+1}^{-1}\,\frac{P_{t+1} + D_{t+1}}{P_t}$$

and hence

$$\frac{P_t}{D_t} = R_{t+1}^{-1}\left(1 + \frac{P_{t+1}}{D_{t+1}}\right)\frac{D_{t+1}}{D_t}.$$
We can iterate this identity forward and take conditional expectations to obtain the identity

$$\frac{P_t}{D_t} = E_t \sum_{j=1}^{\infty} \left(\prod_{k=1}^{j} R_{t+k}^{-1}\,\Delta D_{t+k}\right) \tag{20.303}$$

where $\Delta D_t \equiv D_t / D_{t-1}$. (We could iterate (20.302) forward to

$$P_t = \sum_{j=1}^{\infty} \left(\prod_{k=1}^{j} R_{t+k}^{-1}\right) D_{t+j},$$

but prices are not stationary, so we can't find the variance of prices from a time-series average.
Much of the early volatility test literature concerned stationarity problems. Equation (20.303)
also requires a limiting condition that the price-dividend ratio cannot explode faster than
returns, $\lim_{j\to\infty} E_t\left[\left(\prod_{k=1}^{j} R_{t+k}^{-1}\,\Delta D_{t+k}\right) P_{t+j}/D_{t+j}\right] = 0$. I come back to this condition below.)

This identity shows that high prices must, mechanically, come from high future dividend
growth or low future returns.
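As a sanity check on (20.303), take the hypothetical case of a constant gross return $R$ and constant gross dividend growth $G$ (an assumption for illustration only). Each term of the sum is then $(G/R)^j$, so the identity gives the familiar $P/D = G/(R-G)$, and the sum truncated at $J$ terms plus the terminal term $(G/R)^J\,P/D$ reproduces $P/D$ exactly. The function name and the numbers are hypothetical.

```python
def truncated_pd_sum(growth, ret, horizon):
    """Sum_{j=1..J} prod_{k=1..j} R^{-1} dD for constant gross growth and return (hypothetical).

    Returns the truncated sum and the weight prod_{k=1..J} R^{-1} dD that multiplies
    the terminal price-dividend ratio P_{t+J}/D_{t+J}.
    """
    total = 0.0
    weight = 1.0
    for _ in range(horizon):
        weight *= growth / ret   # one more factor R^{-1} * (D_{t+k}/D_{t+k-1})
        total += weight
    return total, weight


if __name__ == "__main__":
    G, R = 1.02, 1.06
    pd_gordon = G / (R - G)                       # about 25.5 for these illustrative numbers
    total, weight = truncated_pd_sum(G, R, horizon=10)
    print(total + weight * pd_gordon, pd_gordon)   # truncated sum + terminal term = P/D
```

If the terminal term were dropped at a short horizon, the truncated sum would understate $P/D$; the limiting condition is what lets the terminal term vanish as $J$ grows.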
Approximate identity
The nonlinearity of (20.303) makes it hard to handle, and means that we cannot use simple
time-series tools. You can linearize (20.303) directly with a Taylor expansion (Cochrane
1991 takes this approach). Campbell and Shiller (1988) approximate the one-period return
identity before iterating, which is algebraically simpler and is the most popular linearization.
Start again from the obvious,

$$1 = R_{t+1}^{-1} R_{t+1} = R_{t+1}^{-1}\,\frac{P_{t+1} + D_{t+1}}{P_t}.$$

Multiplying both sides by $P_t/D_t$ and massaging the result,

$$\frac{P_t}{D_t} = R_{t+1}^{-1}\left(1 + \frac{P_{t+1}}{D_{t+1}}\right)\frac{D_{t+1}}{D_t}.$$

Taking logs, and with lowercase letters denoting logs of uppercase letters,

$$p_t - d_t = -r_{t+1} + \Delta d_{t+1} + \ln\left(1 + e^{p_{t+1} - d_{t+1}}\right).$$

Taking a Taylor expansion of the last term about a point $P/D = e^{p-d}$,

$$p_t - d_t = -r_{t+1} + \Delta d_{t+1} + \ln\left(1 + \frac{P}{D}\right) + \frac{P/D}{1 + P/D}\left[p_{t+1} - d_{t+1} - (p - d)\right]$$
$$p_t - d_t = -r_{t+1} + \Delta d_{t+1} + k + \rho\,(p_{t+1} - d_{t+1}).$$

Since the average dividend yield is about 4% and the average price/dividend ratio is about 25, $\rho$
is a number very near one. I will use $\rho = 0.96$ for calculations,

$$\rho = \frac{P/D}{1 + P/D} = \frac{1}{1 + D/P} \approx 1 - D/P = 0.96.$$
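To get a feel for how accurate the Campbell-Shiller linearization is, one can compare the exact term $\ln(1 + e^{x})$ with its first-order expansion around $\ln(P/D)$. This is a minimal sketch: the average ratio of 25 comes from the text, while the test ratios of 20 and 30 and the function name are hypothetical.

```python
import math


def linearization_error(pd_bar, pd):
    """Exact ln(1 + e^x) minus its first-order expansion around x_bar = ln(pd_bar).

    x = ln(pd) is a log price-dividend ratio; rho = pd_bar / (1 + pd_bar).
    """
    x, x_bar = math.log(pd), math.log(pd_bar)
    rho = pd_bar / (1.0 + pd_bar)
    exact = math.log(1.0 + math.exp(x))
    approx = math.log(1.0 + pd_bar) + rho * (x - x_bar)
    return exact - approx


if __name__ == "__main__":
    pd_bar = 25.0
    print("rho =", pd_bar / (1.0 + pd_bar))      # about 0.96
    for pd in (20.0, 30.0):
        print(pd, linearization_error(pd_bar, pd))
```

Because $\ln(1 + e^{x})$ is convex, the tangent-line approximation never overstates it, and for price-dividend ratios between 20 and 30 the error stays at about 0.001 in logs or less.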
Without the constant k, the equation can also apply to deviations from means or any other
Now, iterating forward is easy, and results in the approximate identity

$$p_t - d_t = \text{const.} + \sum_{j=1}^{\infty} \rho^{j-1}\left(\Delta d_{t+j} - r_{t+j}\right). \tag{20.305}$$

(Again, we need a condition that $p_t - d_t$ does not explode faster than $\rho^{-t}$, $\lim_{j\to\infty} \rho^j (p_{t+j} -
d_{t+j}) = 0$. I return to this condition below.)
Since (20.305) holds ex post, we can take conditional expectations and relate price-dividend
ratios to ex-ante dividend growth and return forecasts,

$$p_t - d_t = \text{const.} + E_t \sum_{j=1}^{\infty} \rho^{j-1}\left(\Delta d_{t+j} - r_{t+j}\right). \tag{20.306}$$

Now it is really easy to see that a high price-dividend ratio must be followed by high dividend
growth ∆d, or low returns r. Which is it?
Decomposing the variance of price-dividend ratios
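To address this issue, equation (20.305) implies

$$\operatorname{var}(p_t - d_t) = \operatorname{cov}\left(p_t - d_t, \sum_{j=1}^{\infty} \rho^{j-1} \Delta d_{t+j}\right) - \operatorname{cov}\left(p_t - d_t, \sum_{j=1}^{\infty} \rho^{j-1} r_{t+j}\right). \tag{20.307}$$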


In words, price-dividend ratios can only vary if they forecast changing dividend growth or
if they forecast changing returns. (To derive (20.307) from (20.305), multiply both sides by
$(p_t - d_t) - E(p_t - d_t)$ and take expectations.) Notice that both terms on the right-hand side
of (20.307) are the numerators of exponentially weighted long-run regression coefficients.
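Decomposition (20.307) can be illustrated on simulated data. This sketch assumes, purely for illustration, that log $p_t - d_t$ follows an AR(1) and that dividend growth is i.i.d. and hence unforecastable. Returns are then built from the linearized return identity $r_{t+1} = \Delta d_{t+1} + \rho(p_{t+1} - d_{t+1}) - (p_t - d_t)$. Truncating the sums at 15 years, as in Table 2, leaves a terminal term $\rho^{15}\operatorname{cov}(p_t - d_t,\, p_{t+15} - d_{t+15})/\operatorname{var}(p_t - d_t)$, and the three shares then add to one exactly in sample. All names and parameter values are hypothetical.

```python
import random


def simulate_pd(T, phi, rho, sd_pd, sd_dd, seed=1):
    """Hypothetical model: AR(1) log p-d, i.i.d. dividend growth, returns from the identity."""
    rng = random.Random(seed)
    pd = [0.0] * T
    dd = [0.0] * T
    for t in range(1, T):
        pd[t] = phi * pd[t - 1] + rng.gauss(0.0, sd_pd)
        dd[t] = rng.gauss(0.0, sd_dd)
    # r_{t+1} = dd_{t+1} + rho * pd_{t+1} - pd_t (linearized return identity, constants dropped)
    r = [0.0] + [dd[t] + rho * pd[t] - pd[t - 1] for t in range(1, T)]
    return pd, dd, r


def variance_shares(pd, dd, r, rho, horizon):
    """Shares of var(p-d) due to dividend forecasts, return forecasts, and the terminal term."""
    dates = range(len(pd) - horizon)
    n = len(dates)
    mean_pd = sum(pd[t] for t in dates) / n
    weights = [rho ** (j - 1) for j in range(1, horizon + 1)]

    def cov_with_pd(series_at):
        # Only p-d is demeaned; this equals the sample covariance and is linear in series_at.
        return sum((pd[t] - mean_pd) * series_at(t) for t in dates) / n

    var_pd = cov_with_pd(lambda t: pd[t])
    div = cov_with_pd(lambda t: sum(w * dd[t + j] for j, w in enumerate(weights, start=1)))
    ret = -cov_with_pd(lambda t: sum(w * r[t + j] for j, w in enumerate(weights, start=1)))
    terminal = cov_with_pd(lambda t: rho ** horizon * pd[t + horizon])
    return div / var_pd, ret / var_pd, terminal / var_pd


if __name__ == "__main__":
    pd, dd, r = simulate_pd(T=50_000, phi=0.9, rho=0.96, sd_pd=0.15, sd_dd=0.1)
    print(variance_shares(pd, dd, r, rho=0.96, horizon=15))
```

With these values the return share should come out near $1 - (0.96 \times 0.9)^{15} \approx 0.89$ and the dividend share near zero: when dividend growth is unforecastable, essentially all variation in $p_t - d_t$ has to show up as return forecasts.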
This is a powerful equation. At first glance, it would seem a reasonable approximation
that returns are unforecastable (the “random walk” hypothesis) and that dividend growth is
not forecastable either. But if this were the case, the price/dividend ratio would have to be a
constant. Thus the fact that the price/dividend ratio varies at all means that either dividend
growth or returns must be forecastable: the world is not i.i.d.
At a simple level, Table 1 includes regressions of long-horizon dividend growth on
dividend/price ratios to match the return regressions. The coefficients in the dividend growth
case are much smaller, typically one standard error from zero, and the $R^2$ are tiny. Worse,
the signs are wrong. To the extent that a high price-dividend ratio forecasts any change in
dividends, it seems to forecast a small decline in dividends!
Having seen equation (20.307), one is hungry for estimates. Table 2 presents some, taken
from Cochrane (1991b). As one might suspect from Table 1, Table 2 shows that in the past
almost all variation in price-dividend ratios is due to changing return forecasts.
The elements do not have to be between 0 and 100%. For example, the real entries of −34% and
138% occur because high prices seem to forecast lower real dividend growth (though this number
is not statistically significant). Therefore they must and do forecast really low returns, and
returns must account for more than 100% of price-dividend variation.

                  Dividends   Returns
Real                 −34        138
  (std. error)        10         32
Nominal               30         85
  (std. error)        41         19

Table 2. Variance decomposition of value-weighted NYSE price-dividend ratio.
Table entries are the percent of the variance of the price-dividend ratio attributable
to dividend and return forecasts, $100 \times \operatorname{cov}\left(p_t - d_t, \sum_{j=1}^{15} \rho^{j-1} \Delta d_{t+j}\right)/\operatorname{var}(p_t - d_t)$,
and similarly for returns.

This observation solidifies one's belief in price-dividend ratio forecasts of returns. Yes,
the statistical evidence that price-dividend ratios forecast returns is weak, and many return
forecasting variables have been tried and discarded, so selection bias is a big worry in
forecasting regressions. But the price-dividend ratio (or price-earnings, market-to-book, etc.) has a
special status since it must forecast something. To believe that the price-dividend ratio is
stationary and varies, but does not forecast returns, you have to believe that the price-dividend
ratio does forecast dividends. Given this choice and Table 1, it seems a much firmer
conclusion that it forecasts returns.


It is nonetheless an uncomfortable fact that almost all variation in price-dividend ratios
is due to variation in expected excess returns. How nice it would be if high prices reflected
expectations of higher future cashflows. Alas, that seems not to be the case. If not, it would
be nice if high prices reflected lower interest rates. Again, that seems not to be the case. High
prices reflect low risk premia, lower expected excess returns.
Campbell's return decomposition.
Campbell (1991) provides a similar decomposition for unexpected returns,

$$r_t - E_{t-1} r_t = (E_t - E_{t-1})\left[\sum_{j=0}^{\infty} \rho^j \Delta d_{t+j} - \sum_{j=1}^{\infty} \rho^j r_{t+j}\right]. \tag{20.308}$$

A positive shock to returns must come from a positive shock to forecast dividend growth, or
from a negative shock to forecast returns.
Since a positive shock to time-$t$ dividends is directly paid as a return (the first sum starts
at $j = 0$), Campbell finds some fraction of return variation is due to current dividends.
However, once again, the bulk of index return variation comes from shocks to future returns,
i.e. discount rates.
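To derive (20.308), start with the approximate identity (20.305), and move it back one period,

$$p_{t-1} - d_{t-1} = \text{const.} + \sum_{j=0}^{\infty} \rho^j\left(\Delta d_{t+j} - r_{t+j}\right).$$

Since $p_{t-1} - d_{t-1}$ is known at time $t-1$, its innovation is zero. Taking innovations of both sides,

$$0 = (E_t - E_{t-1}) \sum_{j=0}^{\infty} \rho^j\left(\Delta d_{t+j} - r_{t+j}\right).$$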

Pulling rt over to the left hand side, you obtain (20.308). (Problem 3 at the end of the chapter
guides you through an alternative and more constructive derivation.)
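Decomposition (20.308) can be evaluated in closed form in a simple hypothetical model: dividend growth is i.i.d. with shock $\eta_t$, and the expected return follows $E_t r_{t+1} = x_t$ with $x_{t+1} = \phi x_t + \delta_{t+1}$. Dividend news is then $\eta_t$, and expected-return news is $\sum_{j \ge 1} \rho^j \phi^{j-1} \delta_t = \rho\,\delta_t/(1 - \rho\phi)$. The sketch below additionally assumes that $\eta$ and $\delta$ are uncorrelated. The symbols $\eta$ and $\phi$, the function name and the numbers are illustrative assumptions, not estimates. The example shows how a persistent expected return magnifies small shocks through the factor $1/(1 - \rho\phi)$.

```python
def campbell_news_shares(rho, phi, sd_eta, sd_delta):
    """Variance shares of dividend news and expected-return news in unexpected returns.

    Hypothetical model: dividend growth i.i.d. with shock eta_t; E_t r_{t+1} = x_t with
    x_{t+1} = phi x_t + delta_{t+1}; eta and delta assumed uncorrelated.
    """
    div_news_var = sd_eta ** 2                                      # (E_t - E_{t-1}) sum_{j>=0} rho^j dd_{t+j}
    ret_news_var = (rho / (1.0 - rho * phi)) ** 2 * sd_delta ** 2   # (E_t - E_{t-1}) sum_{j>=1} rho^j r_{t+j}
    total = div_news_var + ret_news_var                             # var(r_t - E_{t-1} r_t)
    return div_news_var / total, ret_news_var / total


if __name__ == "__main__":
    print(campbell_news_shares(rho=0.96, phi=0.9, sd_eta=0.1, sd_delta=0.02))
```

With these values, an expected-return shock with one fifth of the standard deviation of the dividend shock still accounts for about two thirds of the variance of unexpected returns.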
So far, we have concentrated on the index. One can apply the same analysis to firms.
What causes the variation in price-dividend ratios, or, better, book/market ratios (since
dividends can be zero) across firms, or over time for a given firm? Vuolteenaho (2000) applies
the same sort of analysis to individual stock data. He finds that as much as half of the
variation in individual firm book/market ratios reflects expectations of future cashflows. Much
of the expected cashflow variation is idiosyncratic, while the expected return variation is
common, which is why variation in the index book/market ratio, like variation in the index
dividend/price ratio, is almost all due to varying expected excess returns.


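In deriving the exact and linearized present value identities, I assumed an extra condition
that the price-dividend ratio does not explode. Without that condition, and taking expectations
of both sides, the exact identity reads

$$\frac{P_t}{D_t} = E_t \sum_{j=1}^{\infty} \left(\prod_{k=1}^{j} R_{t+k}^{-1}\,\Delta D_{t+k}\right) + \lim_{j\to\infty} E_t\left(\prod_{k=1}^{j} R_{t+k}^{-1}\,\Delta D_{t+k}\right)\frac{P_{t+j}}{D_{t+j}} \tag{20.309}$$

and the linearized identity reads

$$p_t - d_t = \text{const.} + E_t \sum_{j=1}^{\infty} \rho^{j-1}\left(\Delta d_{t+j} - r_{t+j}\right) + E_t \lim_{j\to\infty} \rho^j \left(p_{t+j} - d_{t+j}\right). \tag{20.310}$$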

