

Still, does a 5% interest rate differential correspond to an exactly 5% expected depreciation, or does some of it still represent a high expected return from holding debt in that country's currency? Furthermore, while expected depreciation is clearly a large part of the story for high interest rates in countries that have constant high inflation or that may suffer spectacular depreciation of a pegged exchange rate, how does the story work for, say, the U.S. vs. Germany, where inflation rates diverge little, yet exchange rates fluctuate a surprisingly large amount?
Table 6 presents the facts, as summarized by Hodrick (2000) and Engel (1996). The first row of Table 6 presents the average appreciation of the dollar against the indicated currency over the sample period. The dollar fell against the DM, yen and Swiss franc, but appreciated against the pound. The second row gives the average interest rate differential: the amount by which the foreign interest rate exceeds the US interest rate. According to the expectations hypothesis, these two numbers should be equal: interest rates should be higher in countries whose currencies depreciate against the dollar.
Comparing the first two rows, we see roughly the right pattern. Countries with steady long-term inflation have steadily higher interest rates, and steady depreciation. The numbers in the first and second rows are not exactly the same, but exchange rates are notoriously volatile, so these averages are not well measured. Hodrick shows that the difference between the first and second rows is not statistically different from zero. This fact is exactly analogous to the fact of Table 4 that the expectations hypothesis works well “on average” for US bonds, and it is the tip of an iceberg of empirical successes for the expectations hypothesis as applied to exchange rates.
As in the case of bonds, however, we can also ask whether times of temporarily higher or lower interest rate differentials correspond to times of above- and below-average depreciation, as they should. The third and fifth rows of Table 6 address this question, updating Hansen and Hodrick's (1980) and Fama's (1984) regression tests. The number here should be +1.0 in each case: an extra percentage point of interest differential should correspond to one extra percentage point of expected depreciation. As you can see, we have exactly the opposite pattern: a higher than usual interest rate abroad seems to lead, if anything, to further appreciation. It seems that the old fallacy of confusing interest rate differentials across countries with expected returns, forgetting about depreciation, also contains a grain of truth. This is the “forward discount puzzle,” and it takes its place alongside the forecastability of stock and bond returns. Of course, it has produced a similar avalanche of academic work dissecting whether it is really there and, if so, why. Hodrick (1987), Engel (1996), and Lewis (1995) provide surveys.
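The regression tests in the third and fifth rows of Table 6 can be sketched in a few lines. The Python below is a minimal sketch, assuming numpy and using simulated data rather than the Hodrick or Engel samples. It builds a hypothetical world in which the expectations hypothesis holds exactly, so that $E_t(s_{t+1}-s_t) = f_t - s_t$, and checks that OLS then recovers b close to +1. The function names `fama_regression` and `simulate_uip` and all simulation parameters are illustrative assumptions.

```python
import numpy as np

def fama_regression(log_spot, log_forward):
    """OLS of s_{t+1} - s_t on a constant and the forward premium f_t - s_t."""
    depreciation = np.diff(log_spot)               # s_{t+1} - s_t
    premium = (log_forward - log_spot)[:-1]        # f_t - s_t, dated t
    X = np.column_stack([np.ones_like(premium), premium])
    (a, b), *_ = np.linalg.lstsq(X, depreciation, rcond=None)
    return a, b

def simulate_uip(n=20_000, rho=0.9, premium_shock=0.002, noise=0.01, seed=0):
    """Hypothetical world where the forward rate equals the expected future spot rate."""
    rng = np.random.default_rng(seed)
    premium = np.zeros(n)
    for t in range(1, n):                          # slow-moving AR(1) forward premium
        premium[t] = rho * premium[t - 1] + rng.normal(0.0, premium_shock)
    depreciation = premium[:-1] + rng.normal(0.0, noise, n - 1)
    log_spot = np.concatenate([[0.0], np.cumsum(depreciation)])
    log_forward = log_spot + premium
    return log_spot, log_forward

a, b = fama_regression(*simulate_uip())
print(f"a = {a:.4f}, b = {b:.2f}")  # near +1 in this simulated world; Table 6 reports b < 0
```

Running the same `fama_regression` on actual spot and forward rates is what produces the negative coefficients in Table 6.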
The R² values shown in Table 6 are quite low. However, like D/P, the interest differential is a slow-moving forecasting variable, so the R² of return forecasts builds with horizon. Bekaert and Hodrick (1992) report that the R² rises to the 30-40% range at six-month horizons and then declines again. Still, taking advantage of this predictability, like the bond strategies described above, is quite risky.
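Why the R² builds with horizon and then declines can be seen in a stylized model. The sketch below assumes, purely for illustration, that the one-period return is $r_{t+1} = b x_t + \varepsilon_{t+1}$, that the forecasting variable follows an AR(1), $x_{t+1} = \rho x_t + \delta_{t+1}$, and that the two shocks are uncorrelated; `signal_to_noise` stands for $b^2\,\mathrm{var}(x)/\mathrm{var}(\varepsilon)$. These assumptions and the parameter values are hypothetical, not Bekaert and Hodrick's specification. The population R² of the k-period return is the variance of its forecast, $b x_t (1-\rho^k)/(1-\rho)$, divided by the variance of the k-period return.

```python
def long_horizon_r2(k, rho, signal_to_noise):
    """Population R^2 of the k-period summed return on x_t in a stylized AR(1) model.

    Units: var(eps) = 1, so b^2 var(x) = signal_to_noise.
    """
    # Variance of the forecast E_t[r_{t+1} + ... + r_{t+k}] = b x_t (1 - rho^k) / (1 - rho)
    forecast_var = signal_to_noise * ((1 - rho**k) / (1 - rho)) ** 2
    # Var(x_t + ... + x_{t+k-1}) / var(x) for a stationary AR(1)
    sum_x_var = k + 2 * sum((k - j) * rho**j for j in range(1, k))
    # Variance of the k-period return: predictable part plus k independent return shocks
    return_var = signal_to_noise * sum_x_var + k
    return forecast_var / return_var

for k in (1, 6, 12, 60, 200):
    print(k, round(long_horizon_r2(k, rho=0.9, signal_to_noise=0.01), 4))
```

The forecast variance converges to a constant as k grows, while the variance of the summed return keeps growing linearly, which is why the R² eventually falls after its initial rise.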

                              DM      £       ¥       SF
Mean appreciation            -1.8     3.6    -5.0    -3.0
Mean interest differential   -3.9     2.1    -3.7    -5.9
b, 1975-1989                 -3.1    -2.0    -2.1    -2.6
R²                           .026    .033    .034    .033
b, 1976-1996                 -0.7    -1.8    -2.4    -1.3

Table 6. The first row gives the average appreciation of the dollar against the indicated currency, in percent per year. The second row gives the average interest differential (foreign interest rate less domestic interest rate), measured as the forward premium: the 30-day forward rate less the spot exchange rate. The third through fifth rows give the coefficients and R² in a regression of exchange rate changes on the interest differential (equal to the forward premium),

$$ s_{t+1} - s_t = a + b\,(f_t - s_t) + \varepsilon_{t+1} = a + b\,(r_t^f - r_t^d) + \varepsilon_{t+1} $$

where s = log spot exchange rate, f = forward rate, r^f = foreign interest rate, r^d = domestic interest rate. Source: Hodrick (1999) and Engel (1996).

The puzzle does not say that one earns more by holding bonds from countries with higher interest rates than others. Average inflation, depreciation, and interest rate differentials line up as they should. If you just buy bonds with high interest rates, you end up with debt from Turkey and Brazil, whose currencies inflate and depreciate steadily. The puzzle does say that one earns more by holding bonds from countries whose interest rates are higher than usual relative to U.S. interest rates.
However, the “usual” rate of depreciation and the “usual” interest differential vary through time, if they are well-defined concepts at all, and this variation may diminish if not eliminate the out-of-sample performance of trading rules based on these regressions.
The foreign exchange regressions offer a particularly clear-cut case in which “Peso problems” can skew forecasting regressions. Lewis (1995) credits Milton Friedman for coining the term to explain why Mexican interest rates were persistently higher than U.S. interest rates in the early 1970s even though the currency had been pegged for more than a decade. A small probability of a huge devaluation each period can correspond to a substantial interest differential. You will see long stretches of data in which the expectations hypothesis seems not to be satisfied, because the collapse does not occur in sample. The Peso subsequently collapsed, giving substantial weight to this view. Since then, “Peso problem” has become a generic term for the effects of small probabilities of large events on empirical work. Rietz (1988) offered a Peso problem explanation for the equity premium: investors are afraid of another great depression, which has not happened in sample. Selling out-of-the-money put options and selling earthquake insurance in Los Angeles are similar strategies whose average returns in a sample will be severely affected by rare events.
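The arithmetic behind a Peso problem fits in a few lines. As a hypothetical illustration (the parameter values are assumptions, not Mexican data), suppose that each month a pegged currency devalues by a log amount D with probability p and otherwise stays put. If the forward rate equals the expected future spot rate, the monthly interest differential is pD, yet any sample that happens to contain no devaluation shows zero average depreciation, so the whole differential looks like an excess return. The function name `peso_problem` is made up for this sketch.

```python
def peso_problem(p_devalue, devaluation, months):
    """Stylized peg: each month a log devaluation of size `devaluation` occurs with prob. p."""
    expected_depreciation = p_devalue * devaluation   # monthly interest differential if EH holds
    annual_differential = 12 * expected_depreciation
    prob_no_collapse = (1 - p_devalue) ** months      # chance the sample contains no collapse
    return expected_depreciation, annual_differential, prob_no_collapse

monthly, annual, no_collapse = peso_problem(p_devalue=0.01, devaluation=0.40, months=120)
print(f"differential {annual:.1%} per year; P(no devaluation in 10 years) = {no_collapse:.0%}")
```

With these illustrative numbers, a 4.8% annual interest differential coexists with roughly a 30% chance that a ten-year sample contains no devaluation at all.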

20.2 The Cross-section: CAPM and Multifactor Models

Having studied how average returns change over time, now we study how average returns
change across different stocks or portfolios.

20.2.1 The CAPM

For a generation, portfolios with high average returns also had high betas. I illustrate with
the size-based portfolios.

The first tests of the CAPM such as Lintner (1965) were not a great success. If you plot or regress the average returns versus betas of individual stocks, you find a lot of dispersion, and the slope of the line is much too flat: it does not go through any plausible riskfree rate.
Miller and Scholes (1972) diagnosed the problem. Betas are measured with error, and measurement error in right-hand-side variables biases regression coefficients toward zero. Fama and MacBeth (1973) and Black, Jensen and Scholes (1972) addressed the problem by grouping stocks into portfolios. Portfolio betas are better measured because the portfolio has lower residual variance. Also, individual stock betas vary over time as the size, leverage, and risks of the business change. Portfolio betas may be more stable over time, and hence easier to measure accurately.
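The errors-in-variables problem, and how grouping helps, can be seen in a small simulation. This is a minimal sketch assuming numpy and entirely hypothetical numbers: true betas are uniform on [0.5, 1.5], average returns line up exactly with true betas (slope 0.08) apart from sampling noise, and each beta estimate carries independent noise with standard deviation 0.5. Stocks are sorted on one noisy beta estimate (pre-ranking) and tested with another (post-ranking), in the spirit of the procedure described in the text; averaging individual post-ranking betas stands in for estimating each portfolio's beta directly, which gives the same number for OLS betas on a common regressor. The function name `attenuation_demo` is illustrative.

```python
import numpy as np

def attenuation_demo(n_stocks=5000, n_portfolios=20, premium=0.08,
                     beta_noise=0.5, seed=0):
    """Compare cross-sectional slopes using noisy individual betas vs. portfolio betas.

    All numbers are hypothetical: average returns line up exactly with true betas
    (slope = premium), apart from sampling noise in each stock's average return.
    """
    rng = np.random.default_rng(seed)
    true_beta = rng.uniform(0.5, 1.5, n_stocks)
    avg_ret = premium * true_beta + rng.normal(0.0, 0.05, n_stocks)
    # Two independent noisy estimates: one used to sort (pre-ranking), one to test (post-ranking)
    beta_pre = true_beta + rng.normal(0.0, beta_noise, n_stocks)
    beta_post = true_beta + rng.normal(0.0, beta_noise, n_stocks)

    # Individual stocks: errors-in-variables flattens the slope toward zero
    slope_individual = np.polyfit(beta_post, avg_ret, 1)[0]

    # Portfolios: sort on pre-ranking betas, then average; estimation errors average out
    groups = np.array_split(np.argsort(beta_pre), n_portfolios)
    port_beta = np.array([beta_post[g].mean() for g in groups])
    port_ret = np.array([avg_ret[g].mean() for g in groups])
    slope_portfolio = np.polyfit(port_beta, port_ret, 1)[0]
    return slope_individual, slope_portfolio

slope_individual, slope_portfolio = attenuation_demo()
print(f"individual {slope_individual:.3f}, portfolio {slope_portfolio:.3f}, true 0.080")
```

Under these assumptions the individual-stock slope should come out near 0.08 × var(β)/(var(β) + 0.5²) ≈ 0.02, while the portfolio slope should be much closer to 0.08.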
There is a second reason for portfolios. Individual stock returns are so volatile that you cannot reject the hypothesis that all average returns are the same: the standard error of a sample mean, σ/√T, is big when σ = 40-80%. By grouping stocks into portfolios based on some characteristic (other than firm name) related to average returns, you reduce the portfolio variance and thus make it possible to see average return differences. Finally, I think much of the attachment to portfolios comes from a desire to mimic more closely what actual investors would do, rather than simply to form a test.
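The σ/√T arithmetic is easy to check. The sketch below assumes a 50-year sample of annual returns and, for the portfolio part, a hypothetical one-factor structure with equal betas, 16% market volatility, 35% idiosyncratic volatility, and independent idiosyncratic shocks; these values and the function names are illustrative, not estimates.

```python
import math

def mean_return_se(sigma, years):
    """Standard error of a sample mean of annual returns: sigma / sqrt(T)."""
    return sigma / math.sqrt(years)

def portfolio_sigma(n_stocks, beta=1.0, market_sigma=0.16, idio_sigma=0.35):
    """Volatility of an equal-weighted portfolio of n stocks in a one-factor model,
    assuming equal betas and independent idiosyncratic shocks (hypothetical values)."""
    return math.sqrt(beta**2 * market_sigma**2 + idio_sigma**2 / n_stocks)

for sigma in (0.40, 0.80):
    print(f"sigma {sigma:.0%}: standard error over 50 years {mean_return_se(sigma, 50):.1%}")
for n in (1, 10, 100):
    sd = portfolio_sigma(n)
    print(f"{n:>3} stocks: sigma {sd:.1%}, standard error over 50 years {mean_return_se(sd, 50):.1%}")
```

With σ = 40% and 50 years of data the standard error is about 5.7%, so a two-standard-error band is more than ±11%, wide enough to swallow most plausible differences in average returns. Grouping shrinks only the idiosyncratic part; the common market component does not diversify away.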
Fama and MacBeth and Black, Jensen and Scholes formed their portfolios on betas. They found individual stock betas, formed stocks into portfolios based on their betas, and then estimated each portfolio's beta in the following period. More recently, size, book/market, industry, and many other characteristics have been used to form portfolios.
Ever since, the business of testing asset pricing models has been conducted in a simple way:

1. Find a characteristic that you think is associated with average returns. Sort stocks into portfolios based on the characteristic, and check that there is a difference in average returns between portfolios. Worry here about measurement, survival bias, fishing bias, and all the other things that can ruin a pretty picture out of sample.
2. Compute betas for the portfolios, and check whether the average return spread is accounted for by the spread in betas.
3. If not, you have an anomaly. Consider multiple betas.
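The three steps can be sketched on simulated data. Everything here is a hypothetical illustration assuming numpy: `characteristic_sort_test` is a made-up name, the characteristic is random noise to which the simulation attaches a built-in alpha of 0.5% per month per unit, and every stock's true market beta is 1. Step 1 sorts into deciles, step 2 runs time-series regressions of each decile on the market, and a spread in mean returns with no matching spread in betas shows up as a spread in alphas, the anomaly of step 3.

```python
import numpy as np

def characteristic_sort_test(n_months=600, n_stocks=1000, n_portfolios=10, seed=1):
    """Steps 1-3 on simulated data in which a hypothetical characteristic
    raises average returns but every stock has a true market beta of 1."""
    rng = np.random.default_rng(seed)
    char = rng.standard_normal(n_stocks)              # hypothetical characteristic
    true_alpha = 0.005 * char                         # built-in anomaly (monthly)
    mkt = rng.normal(0.006, 0.045, n_months)          # market excess return
    idio = rng.normal(0.0, 0.10, (n_months, n_stocks))
    excess = true_alpha + mkt[:, None] + idio         # beta = 1 for every stock

    # Step 1: sort on the characteristic into equal-weighted portfolios
    groups = np.array_split(np.argsort(char), n_portfolios)
    ports = np.column_stack([excess[:, g].mean(axis=1) for g in groups])
    mean_excess = ports.mean(axis=0)

    # Step 2: time-series regression of each portfolio on the market
    X = np.column_stack([np.ones(n_months), mkt])
    coef, *_ = np.linalg.lstsq(X, ports, rcond=None)  # rows: alpha, beta
    alpha, beta = coef

    # Step 3: a spread in means with no spread in betas appears as a spread in alphas
    return mean_excess, alpha, beta

mean_excess, alpha, beta = characteristic_sort_test()
print("mean spread", round(mean_excess[-1] - mean_excess[0], 4))
print("beta spread", round(beta[-1] - beta[0], 3), "alpha spread", round(alpha[-1] - alpha[0], 4))
```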

This is the traditional procedure, but econometrics textbooks urge you not to group data in this way. They urge you to use the characteristic as an instrument for the poorly measured right-hand-side variable instead. It is an interesting and unexplored idea whether this instrumental variables approach could fruitfully bring us back to the examination of individual securities rather than portfolios.
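To make the instrumental variables idea concrete, here is a minimal sketch of the standard just-identified IV estimator, cov(z, y)/cov(z, x), applied to individual stocks. It is an illustration under strong hypothetical assumptions, not a method from the text: the simulated characteristic is correlated with true betas, is uncorrelated with the beta measurement error, and affects average returns only through beta. The function name `ols_vs_iv` and all parameters are made up.

```python
import numpy as np

def ols_vs_iv(n_stocks=5000, premium=0.08, beta_noise=0.5, seed=2):
    """OLS vs. IV slope of average returns on noisy betas, on hypothetical simulated data."""
    rng = np.random.default_rng(seed)
    char = rng.standard_normal(n_stocks)                            # characteristic (instrument)
    true_beta = 1.0 + 0.3 * char + rng.normal(0.0, 0.1, n_stocks)   # beta related to characteristic
    beta_hat = true_beta + rng.normal(0.0, beta_noise, n_stocks)    # measured with error
    avg_ret = premium * true_beta + rng.normal(0.0, 0.05, n_stocks)

    # OLS slope: cov(beta_hat, r) / var(beta_hat), attenuated by measurement error
    ols_slope = np.cov(beta_hat, avg_ret)[0, 1] / np.var(beta_hat, ddof=1)
    # Just-identified IV slope: cov(char, r) / cov(char, beta_hat)
    iv_slope = np.cov(char, avg_ret)[0, 1] / np.cov(char, beta_hat)[0, 1]
    return ols_slope, iv_slope

ols_slope, iv_slope = ols_vs_iv()
print(f"OLS {ols_slope:.3f}, IV {iv_slope:.3f}, true 0.080")
```

If the characteristic also predicted returns directly (as book/market appears to), the exclusion restriction would fail and the IV slope would no longer isolate the beta premium.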
The CAPM proved stunningly successful in empirical work. Time after time, every strat-
egy or characteristic that seemed to give high average returns turned out to also have high
betas. Strategies that one might have thought gave high average returns (such as holding very
volatile stocks) turned out not to have high average returns when they did not have high betas.
To give some sense of that empirical work, Figure 44 presents a typical evaluation of the Capital Asset Pricing Model. (Chapter 15 presented some of the methodological issues surrounding this evaluation; here I focus on the facts.) I examine 10 portfolios of NYSE stocks sorted by size (total market capitalization), along with a portfolio of corporate bonds and long-term government bonds. As the spread along the vertical axis shows, there is a sizeable spread in average returns between large stocks (lower average return) and small stocks (higher average return), and also a large spread between stocks and bonds. The figure plots these average returns against market betas. You can see how the CAPM prediction fits: portfolios with higher average returns have higher betas. In particular, notice that the long-term and corporate bonds have mean returns in line with their low betas, despite standard deviations nearly as high as those of stocks. Comparing this graph with the similar Figure 5 of the consumption-based model back in Chapter 2, the CAPM fits very well.


Figure 44. The CAPM. Average returns vs. betas on the NYSE value-weighted portfolio for 10 size-sorted stock portfolios, government bonds, and corporate bonds, 1947-1996. The solid line draws the CAPM prediction by fitting the market proxy and treasury bill rates exactly (a time-series test). The dashed line draws the CAPM prediction by fitting an OLS cross-sectional regression to the displayed data points. The small firm portfolios are at the top right. The points far down and to the left are the government bond and treasury bill returns.
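The two lines in Figure 44 differ only in what they are forced to fit. A minimal sketch, assuming numpy and returns measured in excess of the treasury bill rate (so the T-bill sits at zero): the time-series line has zero intercept and slope equal to the market's mean excess return, while the cross-sectional line is an OLS fit with a free intercept. The inputs in the usage lines are made-up numbers, not the Figure 44 data, and `capm_lines` is a hypothetical name.

```python
import numpy as np

def capm_lines(betas, mean_excess, market_mean_excess):
    """Predicted mean excess returns under the two CAPM lines described for Figure 44.

    Time-series line: intercept 0 and slope equal to the market's mean excess return,
    so it passes exactly through the T-bill (beta 0) and the market proxy (beta 1).
    Cross-sectional line: OLS of mean excess returns on betas with a free intercept.
    """
    betas = np.asarray(betas, dtype=float)
    mean_excess = np.asarray(mean_excess, dtype=float)
    ts_pred = market_mean_excess * betas
    slope, intercept = np.polyfit(betas, mean_excess, 1)
    cs_pred = intercept + slope * betas
    return ts_pred, cs_pred, (intercept, slope)

# Hypothetical inputs (not the Figure 44 data)
betas = [0.0, 0.5, 1.0, 1.5]
means = [0.01 + 0.06 * b for b in betas]
ts_pred, cs_pred, (intercept, slope) = capm_lines(betas, means, market_mean_excess=0.07)
```

A positive cross-sectional intercept with a slope below the market premium is the “too flat” pattern of the early tests; a slope above it is the atypical steepness discussed next.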

In fact, Figure 44 captures one of the first significant failures of the CAPM. The smallest firms (the far right portfolio) seem to earn an average return a few percent too high given their betas. This is the celebrated “small-firm effect” (Banz 1981). Would that all failed economic theories worked so well! It is also atypical in that the estimated market line through the stock portfolios is steeper than predicted, while measurement error in betas usually means that the estimated market line is too flat.

20.2.2 Fama-French 3 factors

Book to market sorted portfolios show a large variation in average returns that is unrelated
to market betas. The Fama and French 3 factor model successfully explains the average
returns of the 25 size and book to market sorted portfolios with a 3 factor model, consisting
of the market, a small minus big (SMB) portfolio and a high minus low (HML) portfolio.


In retrospect, it is surprising that the CAPM worked so well for so long. The assumptions on which it is built are very stylized and simplified. Asset pricing theory has recognized at least since Merton (1971a,b) the theoretical possibility, indeed probability, that we should need factors, state variables or sources of priced risk beyond movements in the market portfolio in order to explain why some average returns are higher than others.
The Fama-French model is one of the most popular multifactor models that now dominate empirical research. Fama and French (1993) present the model; Fama and French (1996) give an excellent summary, and also show how the 3 factor model performs in evaluating expected return puzzles beyond the size and value effects that motivated it.
“Value” stocks have market values that are small relative to the accountant's book value. (Book values essentially track past investment expenditures.) This category of stocks has given large average returns. “Growth” stocks are the opposite of value and have had low average returns. Since low prices relative to dividends, earnings or book value forecast times when the market return will be high, it is natural to suppose that these same signals forecast categories of stocks that will do well; the “value effect” is the cross-sectional analogue to price-ratio predictability in the time series.
High average returns are consistent with the CAPM, if these categories of stocks have high sensitivities to the market, high betas. However, small and especially value stocks seem to have abnormally high returns even after accounting for market beta. Conversely, “growth” stocks seem to do systematically worse than their CAPM betas suggest. Figure 45 shows this value-size puzzle. It is just like Figure 44, except that the stocks are sorted into portfolios based on size and book-market ratio⁹ rather than size alone. As you can see, the highest portfolios have three times the average excess return of the lowest portfolios, and this variation has nothing at all to do with market betas.
Figures 46 and 47 dig a little deeper to diagnose the problem, by connecting portfolios that have different size within the same book/market category, and different book/market within size category. As you can see, variation in size produces a variation in average returns that is positively related to variation in market betas, as we had in Figure 45. Variation in book/market ratio produces a variation in average return that is negatively related to market beta. Because of this value effect, the CAPM is a disaster when confronted with these portfolios. (Since the size effect disappeared in 1980, it is likely that almost the whole story can be told with book/market effects alone.)
To explain these patterns in average returns, Fama and French advocate a multifactor model with the market return, the return of small less big stocks (SMB) and the return of high book/market minus low book/market stocks (HML) as three factors. They show that variation in average returns of the 25 size and book/market portfolios can be explained by varying loadings (betas) on the latter two factors. (All their portfolios have betas close to one on the market portfolio. Thus, market beta explains the average return difference between stocks and bonds, but not across categories of stocks.)

⁹ I thank Gene Fama for providing me with these data.

Figure 45. Average returns vs. market beta for 25 stock portfolios sorted on the basis of size and book/market ratio.
Figures 48 and 49 illustrate Fama and French's results. The vertical axis is still the average return of the 25 size and book/market portfolios. Now, the horizontal axis gives the predicted values from the Fama-French three factor model. The points should all lie on a 45° line if the model is correct. The points lie much closer to this prediction than they do in Figures 46 and 47. The worst fit is for the growth stocks (lowest line, left hand panel), for which there is little variation in average return despite large variation in size beta as one moves from small to large firms.
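The “predicted values” on the horizontal axis come from time-series regressions of each portfolio's excess return on the three factors, with the prediction equal to the estimated loadings times the factor means. The sketch below assumes numpy and uses hypothetical simulated factors and loadings (the three-factor model holds exactly in the simulation), not Fama and French's data; `ff3_predicted_means` is an illustrative name. Because each regression includes an intercept, actual mean = alpha + predicted mean, so the vertical distance from the 45° line is exactly the estimated alpha.

```python
import numpy as np

def ff3_predicted_means(excess_returns, factors):
    """Time-series regressions R^e_i = a_i + b_i MKT + s_i SMB + h_i HML + e_i.

    excess_returns: (T, N) portfolio excess returns; factors: (T, 3) MKT, SMB, HML.
    Returns (actual mean, model-predicted mean, alpha) for each portfolio.
    """
    T = excess_returns.shape[0]
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)   # shape (4, N)
    alpha, loadings = coef[0], coef[1:]
    predicted = factors.mean(axis=0) @ loadings                # b E[MKT] + s E[SMB] + h E[HML]
    actual = excess_returns.mean(axis=0)
    return actual, predicted, alpha

# Hypothetical simulated data in which the three-factor model holds exactly
rng = np.random.default_rng(3)
T = 600
factors = rng.normal([0.006, 0.002, 0.004], [0.045, 0.03, 0.03], size=(T, 3))
s_load, h_load = np.meshgrid(np.linspace(-0.5, 1.5, 5), np.linspace(-0.5, 1.5, 5))
loadings = np.vstack([np.ones(25), s_load.ravel(), h_load.ravel()])   # (3, 25)
returns = factors @ loadings + rng.normal(0.0, 0.015, size=(T, 25))
actual, predicted, alpha = ff3_predicted_means(returns, factors)
```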

20.2.3 What are the size and value factors?

What are the macroeconomic risks for which the Fama-French factors are proxies or
mimicking portfolios? There are hints of some sort of “distress” or “recession” factor at
A central part of the Fama-French model is the fact that these three pricing factors also explain a large part of the ex-post variation in the 25 portfolios: the R² values in time-series regressions are very high. In this sense, one can regard it as an APT rather than a macroeconomic factor model.

Figure 46. Average excess returns vs. market beta. Lines connect portfolios with different size categories within book to market categories.
The Fama-French model is not a tautology, despite the fact that factors and test portfolios
are based on the same set of characteristics.

We would like to understand the real, macroeconomic, aggregate, nondiversifiable risk that is proxied by the returns of the HML and SMB portfolios. Why are investors so concerned about holding stocks that do badly at the times that the HML (value less growth) and SMB (small-cap less large-cap) portfolios do badly, even though the market does not fall?
Fama and French (1995) note that the typical “value” firm has a price that has been driven down from a long string of bad news, and is now in or near financial distress. Stocks bought on the verge of bankruptcy have come back more often than not, which generates the high average returns of this strategy. This observation suggests a natural interpretation of the value premium: if a credit crunch, liquidity crunch, flight to quality or similar financial event comes along, stocks in financial distress will do very badly, and this is just the sort of time at which one particularly does not want to hear that one's stocks have become worthless! (One cannot count the “distress” of the individual firm as a “risk factor.” Such distress is idiosyncratic and can be diversified away. Only aggregate events that average investors care about can result in a risk premium.) Unfortunately, empirical support for this theory is weak, since the HML portfolio does not covary strongly with other measures of aggregate financial distress. Still, it is a possible and not fully tested interpretation, since we have so few events of actual systematic financial stress in recent history.

Figure 47. Average excess returns vs. market beta. Lines connect portfolios with different book to market categories within size categories.
Heaton and Lucas' (1997) results add to this story for the value effect. They note that the typical stockholder is the proprietor of a small, privately held business. Such an investor's income is of course particularly sensitive to the kinds of financial events that cause distress among small firms and distressed value firms. Such an investor would therefore demand a substantial premium to hold value stocks, and would hold growth stocks despite a low average return.
Lettau and Ludvigson (2000) (also discussed in the next section) document that HML has
a time-varying beta on both the market return and on consumption. Thus, though there is
(unfortunately) very little unconditional correlation between HML and recession measures,
Lettau and Ludvigson document that HML is sensitive to bad news in bad times.
Liew and Vassalou (1999) are an example of current attempts to link value and small firm returns to macroeconomic events. They find that in many countries counterparts to HML and SMB contain information above and beyond that in the market return for forecasting GDP growth. For example, they report a regression

$$ GDP_{t\to t+1} = a + 0.065\, MKT_{t-1\to t} + 0.058\, HML_{t-1\to t} + \varepsilon_{t+1} $$


Figure 48. Average excess return vs. prediction of the Fama-French 3 factor model. Lines
connect portfolios of different size categories within book to market category.

$GDP_{t\to t+1}$ denotes next year's GDP growth, and MKT and HML denote the previous year's returns on the market index and the HML portfolio. Thus, a 10% HML return corresponds to a rise of about 1/2 percentage point in the GDP forecast (0.058 × 10% ≈ 0.6%).
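The forecast arithmetic is a one-liner. This sketch plugs hypothetical factor returns into the two slope coefficients reported above, treating all returns and growth rates as decimals (0.10 = 10%); the function name `gdp_forecast_change` is made up, and the unknown intercept a drops out because only changes in the forecast are computed.

```python
def gdp_forecast_change(hml_return, mkt_return=0.0, b_mkt=0.065, b_hml=0.058):
    """Change in next-year GDP growth forecast implied by the reported coefficients.
    Returns and growth rates are in decimal units (0.10 = 10%)."""
    return b_mkt * mkt_return + b_hml * hml_return

print(f"{gdp_forecast_change(hml_return=0.10):.2%}")   # about half a percentage point
```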
On the other hand, one can ignore Fama and French's motivation and regard the model as an arbitrage pricing theory. If the returns of the 25 size and book/market portfolios could be perfectly replicated by the returns of the 3 factor portfolios (if the R² in the time-series regressions were 100%), then the multifactor model would have to hold exactly, in order to preclude arbitrage opportunities. In fact, the R² values of Fama and French's time-series regressions are all in the 90%-95% range, so extremely high Sharpe ratios for the residuals (which are portfolios) would have to be invoked for the model not to fit well. Equivalently, given the average returns from HML and SMB and the failure of the CAPM to explain those returns, there would be near-arbitrage opportunities if value and small stocks did not move together in the way described by the Fama-French model.
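The near-arbitrage argument can be quantified. A portfolio that is long a test portfolio and hedged with the factors earns the pricing error (alpha) with residual volatility σ√(1 − R²), so a high R² turns even a modest alpha into a large Sharpe ratio. The sketch below uses hypothetical numbers (a 0.3% monthly alpha on a portfolio with 5% monthly volatility) and a made-up function name `residual_sharpe`.

```python
import math

def residual_sharpe(alpha, total_sigma, r_squared, periods_per_year=12):
    """Annualized Sharpe ratio of the hedged residual portfolio, alpha + e_t."""
    residual_sigma = total_sigma * math.sqrt(1.0 - r_squared)
    return alpha / residual_sigma * math.sqrt(periods_per_year)

for r2 in (0.50, 0.90, 0.95):
    print(r2, round(residual_sharpe(alpha=0.003, total_sigma=0.05, r_squared=r2), 2))
```

With these illustrative numbers the annualized residual Sharpe ratio rises from about 0.29 at R² = 50% to about 0.66 at 90% and 0.93 at 95%, well above the roughly 0.5 usually cited for the market itself.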
One way to assess whether the three factors proxy for real macroeconomic risks is by checking whether the multifactor model prices additional portfolios, and especially portfolios that do not have high R² values. Fama and French (1996) extend their analysis in this direction: they find that the SMB and HML portfolios comfortably explain strategies based on alternative price multiples (P/E, B/M), strategies based on 5-year sales growth (this is especially interesting since it is the only strategy that does not form portfolios based on price variables) and the tendency of 5-year returns to reverse. None of these strategies is explained by CAPM betas. However, they all also produce portfolios with high R² values in a time-series regression on the HML and SMB portfolios! This is good and bad news. It might mean that the model is a good APT: that the size and book/market characteristics describe the major sources of priced variation in all stocks. On the other hand, it might mean that these extra sorts just haven't identified other sources of priced variation in stock returns. (Fama and French also find that HML and SMB do not explain “momentum,” despite large R² values. More on momentum later.)

Figure 49. Average excess returns vs. predictions of the Fama-French 3 factor model. Lines connect portfolios of different book to market category within the same size category.
One's first reaction may be that explaining portfolios sorted on the basis of size and book to market by factors sorted on the same basis is a tautology. This is not the case. For example, suppose that average returns were higher for stocks whose ticker symbols start later in the alphabet. (Maybe investors search for stocks alphabetically, so the later stocks are “overlooked.”) This need not trouble us if Z stocks happened to have higher betas. If not, however (if letter of the alphabet were a CAPM anomaly like book to market), it would not necessarily follow that letter-based stock portfolios move together. Adding A-L and M-Z portfolios to the right-hand side of a regression of the 26 A, B, C, etc. portfolios on the market portfolio need not (and probably does not) increase the R² at all. The size and book to market premia are hard to measure, and seem to have declined substantially in recent years. But even if they decline back to CAPM values, Fama and French will still have found a surprisingly large source of common movement in stock returns.
More to the point, in testing a model it is exactly the right thing to do to sort stocks into portfolios based on characteristics related to expected returns. When Black, Jensen and Scholes and Fama and MacBeth first tested the CAPM, they sorted stocks into portfolios based on betas, because betas are a good characteristic for sorting stocks into portfolios that have a spread in average returns. If your portfolios have no spread in average returns (if you just choose 25 random portfolios), then there will be nothing for the asset pricing model to explain.
In fact, despite the popularity of the Fama-French 25, there is really no fundamental reason to sort portfolios based on two-way or larger sorts of individual characteristics. You should use all the characteristics at hand that (believably!) indicate high or low average returns and simply sort stocks according to a one-dimensional measure of expected returns.
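One way to build such a one-dimensional measure is to fit a cross-sectional regression of returns on several characteristics and sort on the fitted value. This is a minimal sketch under stated assumptions, not a procedure from the text: it assumes numpy, uses three hypothetical characteristics with made-up return loadings, and estimates the forecast on one cross-section before sorting a different one, to avoid grading the forecast on the same data used to fit it. The function name `expected_return_sort` is illustrative.

```python
import numpy as np

def expected_return_sort(fit_chars, fit_returns, test_chars, test_returns, n_portfolios=10):
    """Collapse several characteristics into one forecast, then sort on that forecast.

    fit_chars, fit_returns:   an earlier cross-section used to estimate the forecast
    test_chars, test_returns: a later cross-section used to form and evaluate portfolios
    """
    Xf = np.column_stack([np.ones(len(fit_returns)), fit_chars])
    coef, *_ = np.linalg.lstsq(Xf, fit_returns, rcond=None)      # cross-sectional OLS
    Xt = np.column_stack([np.ones(len(test_returns)), test_chars])
    forecast = Xt @ coef                                          # one-dimensional measure
    groups = np.array_split(np.argsort(forecast), n_portfolios)
    return np.array([test_returns[g].mean() for g in groups])

# Hypothetical data: three characteristics with made-up return loadings
rng = np.random.default_rng(4)
loadings = np.array([0.010, 0.005, -0.008])

def draw(n=5000):
    chars = rng.standard_normal((n, 3))
    return chars, chars @ loadings + rng.normal(0.0, 0.10, n)

fit_chars, fit_returns = draw()
test_chars, test_returns = draw()
decile_means = expected_return_sort(fit_chars, fit_returns, test_chars, test_returns)
```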
The argument over the status of size and book/market factors continues, but the important
point is that it does so. Faced with the spectacular failure of the CAPM documented in Figures
and 4, one might have thought that any hope for a rational asset pricing theory was over. Now
we are back where we were, examining small anomalies and arguing over refinements and interpretations of the theory. That is quite an accomplishment!

20.2.4 Macroeconomic factors

Labor income, industrial production, news variables and conditional asset pricing models
have also all had some successes as multifactor models.

I have focused on the size and value factors since they provide the most empirically successful multifactor model to date, and have therefore attracted much attention.
Several authors have used macroeconomic variables as factors in order to examine directly the story that stock performance during bad macroeconomic times determines average returns. Jagannathan and Wang (1996) and Reyfman (1997) use labor income; Chen, Roll and Ross (1986) use industrial production and inflation among other variables. Cochrane (1996) uses investment growth. All these authors find that average returns line up against betas calculated using these macroeconomic indicators. The factors are theoretically easier to motivate, but none explains the value and size portfolios as well as the (theoretically less solid, so far) size and value factors.
Lettau and Ludvigson (2000) specify a macroeconomic model that does just as well as the Fama-French factors in explaining the 25 Fama-French portfolios. Their plots of actual average returns vs. model predictions show a relation as strong as those of Figures 48 and 49. Their model is

$$ m_{t+1} = a + b(caw_t)\,\Delta c_{t+1} $$

where caw is a measure of the consumption-wealth ratio. This is a “scaled factor model” of the sort advocated in Chapter 8. You can think of it as capturing a time-varying risk aversion. This is a stunning result.
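The scaled-factor interpretation can be made explicit with one line of algebra. Assuming, purely for illustration, that b(·) is linear in the conditioning variable, $b(caw_t) = b_0 + b_1\,caw_t$ (the coefficients $b_0$ and $b_1$ are hypothetical symbols, not taken from Lettau and Ludvigson), the conditional model becomes an unconditional two-factor model with constant coefficients:

$$
m_{t+1} = a + (b_0 + b_1\,caw_t)\,\Delta c_{t+1} = a + b_0\,\Delta c_{t+1} + b_1\,(caw_t\,\Delta c_{t+1})
$$

so it can be estimated and tested like any other linear factor model, with consumption growth $\Delta c_{t+1}$ and the scaled factor $caw_t\,\Delta c_{t+1}$ as the two factors, for example through $E(m_{t+1}R_{t+1}) = 1$ for gross returns.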
Though Merton's (1971a,b) theory says that variables which predict market returns should show up as factors which explain cross-sectional variation in average returns, surprisingly few papers have actually tried to see whether this is true, now that we do have variables that we think forecast the market return. Campbell (1996) and Ferson and Harvey (1999) are among the few exceptions.

20.2.5 Momentum and reversal

Sorting stocks based on past performance, you find that a portfolio that buys long-term losers and sells long-term winners does better than the opposite: individual stock long-term returns mean-revert. This “reversal” effect makes sense given return predictability and mean-reversion, and is explained by the Fama-French 3 factor model. However, a portfolio that buys short-term winners and sells short-term losers also does well: “momentum.” This effect is a puzzle.

Since a string of good returns gives a high price, it is not surprising that stocks that do well for a long time (and hence build up a high price) subsequently do poorly, and stocks that do poorly for a long time (and hence dwindle down to a low price, market value, or market to book ratio) subsequently do well. Table 3, taken from Fama and French (1996), reveals that this is in fact the case. (As usual, this table is the tip of an iceberg of research on these effects, starting with De Bondt and Thaler 1985 and Jegadeesh and Titman 1993.)

Strategy    Period             Portfolio formation months    Average return, 10-1 (monthly %)
Reversal    1963:07-1993:12    60-13                         -0.74
Momentum    1963:07-1993:12    12-2                          +1.31
Reversal    1931:01-1963:02    60-13                         -1.61
Momentum    1931:01-1963:02    12-2                          +0.38

Table 3. Average monthly returns from reversal and momentum strategies. Each month, allocate all NYSE firms on CRSP to 10 portfolios based on their performance during the “portfolio formation months” interval. For example, 60-13 forms portfolios based on returns from 5 years ago to 1 year, 1 month ago. Then buy the best-performing decile portfolio and short the worst-performing decile portfolio. Source: Fama and French (1996) Table VI.



Here is the “reversal” strategy. Each month, allocate all stocks to 10 portfolios based on performance in year -5 to year -1. Then, buy the best-performing portfolio and short the worst-performing portfolio. The first row of Table 3 shows that this strategy earns a hefty -0.74% monthly return¹⁰. Past long-term losers come back and past winners do badly. This is a cross-sectional counterpart to the mean-reversion that we studied in section 1.4. Fama and French (1998a) already found substantial mean-reversion (negative long-horizon return autocorrelations) in disaggregated stock portfolios, so one would expect this phenomenon. Spreads in average returns should correspond to spreads in betas. Fama and French verify that these portfolio returns are explained by their 3 factor model. Past losers have a high HML beta; they move together with value stocks, and so inherit the value stock premium.
The second row of Table 3 tracks the average monthly return from a “momentum” strat-
egy. Each month, allocate all stocks to 10 portfolios based on performance in the last year.
Now, quite surprisingly, the winners continue to win, and the losers continue to lose, so that
buying the winners and shorting the losers generates a positive 1.31% monthly return.
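Both strategies in Table 3 share one mechanical recipe, differing only in the formation window. The sketch below assumes numpy and a complete (months × stocks) matrix of simple returns; it ignores the NYSE/CRSP universe filters, delistings and value weighting, and `sort_strategy` is a hypothetical name rather than Fama and French's code.

```python
import numpy as np

def sort_strategy(returns, start_lag, end_lag, n_groups=10):
    """Average monthly return of buying the top group and shorting the bottom group.

    returns: (n_months, n_stocks) simple monthly returns.
    Stocks are ranked on cumulative returns from month t - start_lag through
    t - end_lag (12, 2 mimics "12-2" momentum; 60, 13 mimics "60-13" reversal),
    and the spread is held during month t.
    """
    n_months = returns.shape[0]
    spreads = []
    for t in range(start_lag, n_months):
        window = returns[t - start_lag : t - end_lag + 1]
        past = np.prod(1.0 + window, axis=0) - 1.0           # formation-period return
        groups = np.array_split(np.argsort(past), n_groups)  # ascending: losers first
        spreads.append(returns[t, groups[-1]].mean() - returns[t, groups[0]].mean())
    return float(np.mean(spreads))

# Toy check: 20 stocks whose returns never change, so past winners keep winning
toy = np.tile(np.arange(20) * 0.001, (24, 1))
print(sort_strategy(toy, start_lag=12, end_lag=2))
```

In the toy data the top pair of stocks earns 1.85% and the bottom pair 0.05% every month, so the momentum spread is 1.8% per month by construction.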
At every moment there is a most-studied anomaly, and momentum is that anomaly as I write. It is not explained by the Fama-French 3 factor model. The past losers have low prices and tend to move with value stocks. Hence the model predicts they should have high average returns, not low average returns. Momentum stocks move together, as do value and small stocks, so a “momentum factor” works to “explain” momentum portfolio returns. This is so obviously ad hoc (i.e. an APT factor that will only explain returns of portfolios organized on the same characteristic as the factor) that nobody wants to add it as a risk factor.
A momentum factor is more palatable as a performance attribution factor, to say that a fund did well by following a momentum strategy rather than by stock picking ability, leaving aside why a momentum strategy should work. Carhart (1997) uses it in this way to show that similar momentum behavior in fund returns is due to momentum in the underlying stocks rather than persistent stock-picking skill.
Momentum may be explained as just a new way of looking at an old phenomenon, the small apparent predictability of monthly individual stock returns. A tiny regression R² for forecasting monthly returns of 0.0025 (1/4%) is more than adequate to generate the momentum results of Table 3. The key is the large standard deviation of individual stock returns, typically 40% or more on an annual basis. The average return of the best-performing decile of a normal distribution is 1.76 standard deviations above the mean¹¹, so the winning momentum portfolio typically went up about 80% in the previous year, and the typical losing portfolio went down about 60% per year. Only a small amount of continuation will give a 1% monthly return when multiplied by such large returns. To be precise, the monthly individual stock standard deviation is about 40%/√12 ≈ 12%. If the R² is 0.0025, the standard deviation of the predictable part of returns is √0.0025 × 12% = 0.05 × 12% = 0.6%. Hence, the decile predicted to perform best will earn 1.76 × 0.6% ≈ 1% above the mean. Since the strategy buys the winners and shorts the losers, an R² of 0.0025 implies that one should earn a 2% monthly return by the momentum strategy, even more than the 1.3% shown in Table 3. Lewellen (2000) offers a related explanation for momentum coming from small cross-correlations of

¹⁰ Fama and French do not provide direct measures of standard deviations for these portfolios. One can infer, however, from the betas, R² values and standard deviations of the market and factor portfolios that the standard deviations are roughly 1-2 times that of the market return, so that Sharpe ratios of these strategies are comparable to that of the market return.

¹¹ We're looking for

$$ E(r \mid r \ge x) = \frac{\int_x^{\infty} r\, f(r)\,dr}{\int_x^{\infty} f(r)\,dr} $$
We have known at least since Fama (1965) that monthly and higher frequency stock re-
turns have slight, statistically signi¬cant predictability with R2 in the 0.01 range. However,
such small though statistically signi¬cant high frequency predictability, especially in small
stock returns, has also since the 1960s always failed to yield exploitable pro¬ts after one
accounts for transactions costs, thin trading, high short sale costs and other microstructure is-
sues. Hence, one naturally worries whether momentum is really exploitable after transactions
Momentum does require frequent trading. The portfolios in Table 3 are reformed every
month. Annual winners and losers will not change that often, but the winning and losing
portfolio must still be turned over at least once per year. Carhart (1996) calculates transac-
tions costs and concludes that momentum is not exploitable after those costs are taken into
account. Moskowitz and Grinblatt (1999) note that most of the apparent gains come from
short positions in small, illiquid stocks, positions that also have high transactions costs. They
also ¬nd that a large part of momentum pro¬ts come from short positions taken November,
anticipating tax-loss selling in December. This sounds a lot more like a small microstructure
glitch rather than a central parable for risk and return in asset markets.
Table 3 already shows that the momentum effect essentially disappears in the earlier data
sample, while reversal is even stronger in that sample. Ahn, Boudoukh, Richardson, and
Whitelaw (1999) show that apparent momentum in international index returns is missing
from the futures markets, also suggesting a microstructure explanation.
Of course, it is possible that a small positive autocorrelation is there and related to some
risk. However, it is hard to generate real positive autocorrelation in realized returns. As
we saw extensively in section 20.335, a slow and persistent variation in expected returns
most naturally generates negative autocorrelation in realized returns. News that expected
returns are higher means future dividends are discounted at a higher rate, so today™s price and
return declines. The only way to overturn this prediction is to suppose that expected return

where x is de¬ned as the top 10th cutoff,
Z ∞ 1
f (r)dr = .

With a normal distribution, x = 1.2816σ and E(r|r ≥ x) = 1.755σ.


shocks are positively correlated with shocks to current or expected future dividend growth.
A convincing story for such correlation has not yet been constructed. On the other hand, the
required positive correlation is very small and not very persistent.
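The back-of-envelope momentum arithmetic above, including the 1.2816σ cutoff and 1.755σ conditional mean in footnote 11, can be checked in a few lines. This is a minimal sketch that assumes normally distributed forecasts and takes the 40% annual volatility and 0.0025 monthly R² from the text; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def top_fraction_mean(p: float = 0.10) -> float:
    """E(z | z >= x) for z ~ N(0, 1), where x is the cutoff for the top fraction p."""
    std_normal = NormalDist()
    x = std_normal.inv_cdf(1.0 - p)   # top-decile cutoff, about 1.2816 for p = 0.10
    return std_normal.pdf(x) / p      # phi(x) / (1 - Phi(x))


def momentum_spread(annual_vol: float = 0.40, r2: float = 0.0025, p: float = 0.10) -> float:
    """Monthly winner-minus-loser return implied by a small forecasting R^2.

    Assumes the predictable part of monthly returns is normal, so winners sit
    top_fraction_mean(p) predictable standard deviations above the mean and
    losers the same distance below it.
    """
    monthly_vol = annual_vol / sqrt(12)       # about 11.5% per month
    predictable_sd = sqrt(r2) * monthly_vol   # about 0.6% per month
    return 2.0 * top_fraction_mean(p) * predictable_sd


print(round(NormalDist().inv_cdf(0.90), 4))  # 1.2816
print(round(top_fraction_mean(), 3))          # 1.755
print(f"{momentum_spread():.2%} per month")   # roughly 2% per month
```

The factor of 2 reflects the long-short construction: the winner decile earns about 1% above the mean and the loser decile about 1% below it.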

20.3 Summary and interpretation

While the list of new facts appears long, similar patterns show up in every case. Prices reveal
slow-moving market expectations of subsequent excess returns, because potential offsetting
events seem sluggish or absent. The patterns suggest that there are substantial expected return
premia for taking on risks of recession and financial stress unrelated to the market return.
Magnifying glasses
The effects are not completely new. We have known since the 1960s that high-frequency returns
are slightly predictable, with R² of 0.01 to 0.1 in daily to monthly returns. These effects were
dismissed because there didn't seem to be much that one could do about them. A 51/49 bet
is not very attractive, especially if there is any transactions cost. Also, the increased Sharpe
ratio one can obtain by exploiting predictability is directly related to the forecast R², so a tiny
R², even if exploitable, did not seem like an important phenomenon.
Many of the new facts amount to clever magnifying glasses, ways of making small facts
economically interesting. For forecasting market returns, we now realize that R² rises with
horizon when the forecasting variables are slow-moving. Hence a small R² at high frequency
can mean a really substantial R², in the 30-50% range, at longer horizons. Equivalently, we
realize that small expected return variation can add up to striking price variation if the expected
return variation is persistent. For momentum effects, the ability to sort stocks and funds into
momentum-based portfolios means that incredibly small predictability times portfolios with
huge past returns gives important subsequent returns.
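To see how a small one-period R² can grow with horizon, here is a sketch under a deliberately simple, hypothetical model: returns are forecast by an AR(1) variable, r_{t+1} = b x_t + e_{t+1} and x_{t+1} = φ x_t + u_{t+1}, with e and u uncorrelated. The document's own forecasting VAR allows correlated shocks, which changes the numbers, so treat this only as an illustration of the mechanism, not as the chapter's calculation.

```python
def long_horizon_r2(r2_one: float, phi: float, k: int) -> float:
    """Population R^2 from regressing r_{t+1} + ... + r_{t+k} on x_t.

    Hypothetical model: r_{t+1} = b * x_t + e_{t+1}, x_{t+1} = phi * x_t + u_{t+1},
    with e and u uncorrelated white noise and |phi| < 1. Variances are scaled
    so that the one-period R^2 equals r2_one.
    """
    s = r2_one        # var(b * x_t): predictable one-period variance
    n = 1.0 - r2_one  # var(e): unpredictable one-period variance
    # Forecast of the k-period return is b * x_t * (1 + phi + ... + phi**(k-1)).
    geo = sum(phi ** j for j in range(k))
    explained = s * geo ** 2
    # var(b * (x_t + ... + x_{t+k-1})) uses the AR(1) autocorrelations phi**h.
    sum_var = k + 2 * sum((k - h) * phi ** h for h in range(1, k))
    return explained / (s * sum_var + k * n)


for k in (1, 2, 5, 10, 20, 50):
    print(k, round(long_horizon_r2(r2_one=0.02, phi=0.95, k=k), 3))
```

With φ = 0.95 and a one-period R² of 0.02, the long-horizon R² climbs for many periods before eventually turning back down. With φ = 0, a fast-moving predictor like the past return behind momentum, the same function gives r2_one / k, so R² only falls with horizon.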
Dogs that did not bark
In each case, an apparent difference in yield should give rise to an offsetting movement,
but seems not to do so. Something should be predictable so that returns are not predictable,
and it isn't.
The d/p forecasts of the market return were driven by the fact that dividends should be
predictable, so that returns are not. Instead, dividend growth seems nearly unpredictable. As
we saw, this fact and the speed of the d/p mean reversion imply the observed magnitude of
return predictability.
The term structure forecasts of bond returns were driven by the fact that bond yields
should be predictable, so that returns are not. Instead, yields seem nearly unpredictable at the
one year horizon. This fact means that the forward rate moves one for one with expected
returns, and that a one percentage point increase in yield spread signals as much as a 5
percentage point increase in expected return.
Exchange rates should be forecastable so that foreign exchange returns are not. Instead,
a one percentage point increase in interest rate abroad seems to signal a greater than one
percentage point increase in expected return.
Prices reveal expected returns
If expected returns rise, prices are driven down, since future dividends or other cash flows
are discounted at a higher rate. A “low” price, then, can reveal a market expectation of a high
expected or required return.
Most of our results come from this effect. Low price/dividend, price/earnings, price/book
values signal times when the market as a whole will have high average returns. Low market
value (price times shares) relative to book value signals securities or portfolios that earn high
average returns. The “small firm” effect derives from low prices; other measures of size
such as number of employees or book value alone have no predictive power for returns (Berk
1997). The “5 year reversal” effect derives from the fact that 5 years of poor returns lead to
a low price. A high long-term bond yield means that the price of long-term bonds is “low,”
and this seems to signal a time of good long-term bond returns. A high foreign interest rate
means a low price on foreign bonds, and this seems to indicate good returns on the foreign
bonds.
The most natural interpretation of all these effects is that the expected or required return
(the risk premium) on individual securities as well as the market as a whole varies slowly
over time. Thus we can track market expectations of returns by watching price/dividend,
price/earnings or book/market ratios.
Macroeconomic risks
The price-based patterns in time-series and cross-sectional expected returns suggest a
premium for holding risks related to recession and economy-wide financial distress. All of the
forecasting variables are connected to macroeconomic activity (Fama and French 1989). The
dividend price ratio is highly correlated with the default spread and rises in bad times. The
term spread forecasts bond and stock returns, and is also one of the best recession forecasters.
It rises steeply at the bottoms of recessions, and is inverted at the top of a boom. Thus, return
forecasts are high at the bottom of business cycles and low at the top of booms. “Value” and
“small-cap” stocks are typically distressed. Formal quantitative and empirically successful
economic models of the recession and distress premia are still in their infancy (I think Camp-
bell and Cochrane 1999 is a good start), but the story is at least plausible, and the effects have
been expected by theorists for a generation.
To make this point come to life, think concretely about what you have to do to take
advantage of the value or predictability strategies. You have to buy stocks or long-term bonds
at the bottom, when stock prices are low after a long and depressing bear market; in the
bottom of a recession or financial panic; a time when long-term bond prices and corporate
bond prices are unusually low. This is a time when few people have the guts (the risk
tolerance) or the wallet to buy risky stocks or risky long-term bonds. Looking across stocks
rather than over time, you have to invest in “value” or small market capitalization companies,
dogs by any standards. These are companies with years of poor past returns, years of poor
sales, companies on the edge of bankruptcy, far off any list of popular stocks to buy.
Then, you have to sell stocks and long-term bonds in good times, when stock prices are high
relative to dividends, earnings and other multiples, when the yield curve is flat or inverted so
that long-term bond prices are high. You have to sell the popular “growth” stocks with good
past returns, good sales and earnings growth.
I'm going on a bit here to counter the widespread impression, best crystallized by Shiller
(2000), that high price-earnings ratios must signal “irrational exuberance.” Perhaps, but is
it just a coincidence that this exuberance comes at the top of an unprecedented economic
expansion, a time when the average investor is surely feeling less risk averse than ever, and
willing to hold stocks despite historically low risk premia? I don't know the answer, but
the rational explanation is surely not totally impossible! Is it just a coincidence that we are
finding premia just where a generation of theorists said we ought to: in recessions, credit
crunches, bad labor markets, investment opportunity set variables, and so forth?
This line of explanation for the foreign exchange puzzle is still a bit farther off, though
there are recent attempts to make economic sense of the puzzle (see Engel's 1996 survey;
Atkeson, Alvarez and Kehoe 1999 is a recent example). At a verbal level, the strategy leads
you to invest in countries with high interest rates. High interest rates are often a sign of
monetary instability or other economic trouble, and thus may mean that the investments are
more exposed to the risks of global financial stress or a global recession than are investments
in the bonds of countries with low interest rates, which are typically enjoying better times.
Overall, the new view of ¬nance amounts to a profound change. We have to get used to
the fact that most returns and price variation come from variation in risk premia, not variation
in expected cash flows, interest rates, etc. Most interesting variation in priced risk comes from
non-market factors. These are easy to say, but profoundly change our view of the world.
Momentum is, so far, unlike all the other results. The underlying phenomenon is a small
predictability of high-frequency returns. However, the price-based phenomena make this
predictability important by noting that, with a slow-moving forecasting variable, the R² builds
over horizon. Momentum is based on a fast-moving forecast variable: last year's return.
Therefore the R² declines with horizon. Instead, momentum makes the tiny autocorrelation
of high-frequency returns significant by forming portfolios of extreme winners and losers,
so a small continuation of huge past returns gives a large current return. All the other
results are easily digestible as a slow, business-cycle-related time-varying expected return. This
specification gives negative autocorrelation (unless we add a distasteful positive correlation
of expected return and dividend shocks) and so does not explain momentum. Momentum
returns have also not yet been linked to business cycles or financial distress in even the
informal way that I suggested for the price-based strategies. Thus, it still lacks much of a plausible
economic interpretation. To me, this adds weight to the view that it isn't there, it isn't
exploitable, or it represents a small illiquidity (tax-loss selling of small illiquid stocks) that will
be quickly remedied once a few traders understand it. In the entire history of finance there
has always been an anomaly-du-jour, and momentum is it right now. We will have to wait to
see how it is resolved.
Many of the anomalous risk premia seem to be declining over time. The small-firm effect
completely disappeared in 1980; you can date this as the publication of the first small-firm
effect papers or the founding of small-firm mutual funds that made diversified portfolios of
small stocks available to average investors. To emphasize this point, Figure 50 plots size
portfolio average returns vs. beta in the period since 1979. You can see that not only has the
small-firm premium disappeared, the size-related variation in beta and expected return has
disappeared as well.

Figure 50. Average returns vs. market betas. CRSP size portfolios less treasury bill rate,
monthly data 1979-1998.

The value premium has been cut roughly in half in the 1990s, and 1990 is roughly the date
of widespread popularization of the value effect, though σ/√T leaves a lot of room for error
here. As you saw in Table RR, the last 5 years of high market returns have cut the estimated
return predictability from the dividend-price ratio in half.
These facts suggest an uncomfortable implication: that at least some of the premium the
new strategies yielded in the past was due to the fact that they were simply overlooked or are
artifacts of data-dredging.
Since they are hard to measure, one is tempted to put less emphasis on these average


returns. However, they are crucial to our interpretation of the facts. The CAPM is perfectly
consistent with the fact that there are additional sources of common variation. For example,
it was long understood that stocks in the same industry move together; the fact that value or
small stocks also move together need not cause a ripple. The surprise is that investors seem to
earn an average return premium for holding these additional sources of common movement,
whereas the CAPM predicts that (given beta) they should have no effect on a portfolio's
average returns.

20.4 Problems

1. Does equation (20.308) condition down to information sets coarser than that observed by
agents? Or must we assume that whatever VAR is used by the econometrician contains
all information seen by agents?
2. Show that the two regressions in Table 5 are complementary: that the coefficients add
up to one, mechanically, in sample.
3. Derive the return innovation decomposition (20.319) directly. Write the return
$$r_t = \Delta d_t + \rho\,(p_t - d_t) - (p_{t-1} - d_{t-1}).$$
Apply E_t − E_{t−1} to both sides,
$$r_t - E_{t-1} r_t = (E_t - E_{t-1})\,\Delta d_t + \rho\,(E_t - E_{t-1})(p_t - d_t).$$
Use the price-dividend identity and iterate forward to obtain (20.308).
4. Find the univariate representation and mean-reversion statistics for prices implied by the
simple VAR and the three dividend examples.
5. Find the univariate return representation from a general return forecasting VAR,
$$r_{t+1} = a\,x_t + \varepsilon^r_{t+1}, \qquad x_{t+1} = b\,x_t + \varepsilon^x_{t+1}.$$
Find the correlation between return and x shocks necessary to generate uncorrelated
returns.
6. Show that stationary x_t − y_t, Δx_t, Δy_t imply that x_t and y_t must have the same variance
ratio and long-run differences must become perfectly correlated. Start by showing that
the long-run variance lim_{k→∞} var(x_{t+k} − x_t)/k for any stationary variable must be
zero. Apply that fact to x_t − y_t.
7. Compute the long-horizon regression coefficients and R² in the VAR (20.311)-(20.317).
Show that the R² do indeed rise with horizon. Do coefficients and R² rise forever, or do
they turn around at some point?

Chapter 21. Equity premium puzzle and
consumption-based models
The original specification of the consumption-based model was not a great success, as we
saw in Chapter 1. Still, it is in some sense the only model we have. The central task of
financial economics is to figure out what are the real risks that drive asset prices and expected
returns. Something like the consumption-based model (investors' first-order conditions for
savings and portfolio choice) has to be the starting point.
Rather than dream up models, test them and reject them, financial economists since the
work of Mehra and Prescott (1986) and Hansen and Jagannathan (1991) have been able to
work backwards to some extent, characterizing the properties that discount factors must have
in order to explain asset return data. Among other things, we learned that the discount factor
had to be extremely volatile, while not too conditionally volatile; the riskfree rate or condi-
tional mean had to be pretty steady. This knowledge is now leading to a much more successful
set of variations on the consumption-based model.

21.1 Equity premium puzzles

21.1.1 The basic equity premium/riskfree rate puzzle

The postwar US market Sharpe ratio is about 0.5: an 8% return and 16% standard
deviation. The basic Hansen-Jagannathan bound
$$\frac{E(R^e)}{\sigma(R^e)} \le \frac{\sigma(m)}{E(m)} \approx \gamma\,\sigma(\Delta c)$$
implies σ(m) ≥ 50% on an annual basis, requiring huge risk aversion or consumption growth
volatility.
The average riskfree rate is about 1%, so E(m) ≈ 0.99. High risk aversion with power
utility implies a very high riskfree rate, or requires a negative subjective discount rate.
Interest rates are quite stable over time and across countries, so E_t(m) varies little. High
risk aversion with power utility implies that interest rates are very volatile.

In Chapter 1, we derived the basic Hansen-Jagannathan (1991) bounds. These are
characterizations of the discount factors that price a given set of asset returns. Manipulating
0 = E(mR^e), we found
$$\frac{\sigma(m)}{E(m)} \ge \frac{|E(R^e)|}{\sigma(R^e)}.$$
In continuous time, or as an approximation in discrete time, we found that time-separable
utility implies
$$\gamma\,\sigma(\Delta c) \ge \frac{|E(R^e)|}{\sigma(R^e)},$$
where γ = −c u''(c)/u'(c) is the local curvature of the utility function, and the risk aversion
coefficient for the power case.
Equity premium puzzle
The postwar mean value weighted NYSE is about 8% per year over the T-bill rate, with a
standard deviation of about 16%. Thus, the market Sharpe ratio E(Re )/σ(Re ) is about 0.5
for an annual investment horizon. If there were a constant risk free rate,

$$E(m) = 1/R^f$$

would nail down E(m). The T-bill rate is not very risky, so E(m) is not far from the inverse
of the mean T-bill rate, or about E(m) ≈ 0.99. Thus, these basic facts about the mean and
variance of stocks and bonds imply σ(m) > 0.5. The volatility of the discount factor must
be about 50% of its level in annual data!
Per capita consumption growth has standard deviation about 1% per year. With log utility,
that implies σ(m) = 0.01 = 1%, which is off by a factor of 50. To match the equity premium
we need γ > 50, which seems a huge level of risk aversion. Equivalently, a log utility investor
with consumption growth volatility of 1% and facing a 0.5 Sharpe ratio should be investing
dramatically more in the stock market, borrowing to do so. He should invest so much that his
wealth and hence consumption growth does vary by 50% each year.
Correlation puzzle
The bound takes the extreme possibility that consumption and stock returns are perfectly
correlated. They are not, in the data. Correlations are hard to measure, since they are sensitive
to data definition, timing, time-aggregation, and so forth. Still, the correlation of annual stock
returns and nondurable plus services consumption growth in postwar U.S. data is no more
than about 0.2. If we use this information as well (if we characterize the mean and standard
deviation of all discount factors that have correlation less than 0.2 with the market return),
the calculation becomes
$$\frac{\sigma(m)}{E(m)} \ge \frac{1}{|\rho_{m,R^e}|}\,\frac{|E(R^e)|}{\sigma(R^e)} = \frac{1}{0.2} \times 0.5 = 2.5.$$
With σ(m) ≈ γσ(Δc), we now need a risk aversion coefficient of 250!
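The bound arithmetic can be written as two small functions. This is a sketch using the text's approximations (σ(m)/E(m) ≈ γσ(Δc), a 0.5 Sharpe ratio, E(m) ≈ 0.99, 1% consumption volatility, 0.2 correlation); the function names are hypothetical.

```python
def min_sigma_m(sharpe: float, mean_m: float, corr: float = 1.0) -> float:
    """Smallest sigma(m) allowed by sigma(m)/E(m) >= Sharpe / |corr(m, R^e)|."""
    return mean_m * sharpe / abs(corr)


def min_gamma(sharpe: float, sigma_dc: float, corr: float = 1.0) -> float:
    """Risk aversion needed when sigma(m)/E(m) is approximated by gamma * sigma(dc)."""
    return sharpe / (abs(corr) * sigma_dc)


sharpe = 0.08 / 0.16                                 # postwar market Sharpe ratio, 0.5
print(min_sigma_m(sharpe, mean_m=0.99))              # about 0.5: sigma(m) near 50%
print(min_gamma(sharpe, sigma_dc=0.01))              # about 50
print(min_gamma(sharpe, sigma_dc=0.01, corr=0.2))    # about 250
```

The same functions handle a lower target: with a 3% premium, min_gamma(0.03 / 0.16, 0.01) is about 19, and about 94 with corr=0.2.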


Here is a classier way to state the correlation puzzle. Remember that proj(m|X) should
price assets just as well as m itself. Now, m = proj(m|X) + ε and σ²(m) = σ²(proj(m|X)) +
σ²(ε). Some of the early resolutions of the equity premium puzzle ended up adding noise
uncorrelated with asset payoffs to the discount factor. This modification increased discount
factor volatility and satisfied the bound. But as you can see, adding ε increases σ²(m) with
no effect whatsoever on the model's ability to price assets. As you add ε, the correlation
between m and asset returns declines. A bound with correlation, or equivalently comparing
σ²(proj(m|X)) rather than σ²(m) to the bound, avoids this trap.
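A small simulation makes the point concrete. It is a sketch under hypothetical assumptions: a single excess return drawn i.i.d. normal with an 8% mean and 16% standard deviation, E(m) = 0.99, and noise constructed to be orthogonal to the payoff space {1, R^e} in sample. Adding that noise raises σ(m) and lowers the correlation with R^e while the pricing error E(mR^e) stays at zero.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

random.seed(0)
T = 10_000
# Hypothetical excess returns: 8% mean, 16% standard deviation.
re = [random.gauss(0.08, 0.16) for _ in range(T)]
mu, var = fmean(re), pvariance(re)
mean_m = 0.99

# proj(m|X): the discount factor in the span of {1, R^e} that prices R^e in sample.
b = mean_m * mu / var
m_proj = [mean_m - b * (r - mu) for r in re]


def orthogonal_noise(scale: float) -> list[float]:
    """Noise with zero sample mean and zero sample covariance with R^e."""
    raw = [random.gauss(0.0, scale) for _ in range(T)]
    raw_mean = fmean(raw)
    beta = fmean((z - raw_mean) * (r - mu) for z, r in zip(raw, re)) / var
    return [z - raw_mean - beta * (r - mu) for z, r in zip(raw, re)]


def summarize(m: list[float]) -> tuple[float, float, float]:
    """Return (pricing error E(m R^e), sigma(m), corr(m, R^e))."""
    m_bar = fmean(m)
    sd_m = sqrt(pvariance(m))
    cov = fmean((mi - m_bar) * (r - mu) for mi, r in zip(m, re))
    return fmean(mi * r for mi, r in zip(m, re)), sd_m, cov / (sd_m * sqrt(var))


for scale in (0.0, 0.5, 1.0):
    m = [p + e for p, e in zip(m_proj, orthogonal_noise(scale))]
    err, sd_m, corr = summarize(m)
    print(f"noise sd {scale:.1f}: pricing error {err:+.1e}, "
          f"sigma(m) {sd_m:.3f}, corr {corr:+.3f}")
```

With no noise, σ(m) is close to 0.99 × 0.5, the Hansen-Jagannathan minimum, and the correlation is −1; each added layer of noise inflates σ(m) without changing any price.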
Average interest rates and subjective discount factors
It has been traditional to use risk aversion numbers of 1 to 5 or so, but perhaps this is
tradition, not fact. What's wrong with γ = 50 to 250?
The most basic piece of evidence for low γ comes from the relation between consumption
growth and interest rates,
$$\frac{1}{R^f_t} = E_t(m_{t+1}) = E_t\left[\beta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}\right]$$
or, in continuous time,
$$r_t = \delta + \gamma E_t(\Delta c).$$
We can take unconditional expectations to compare these equations with average interest
rates and consumption growth.
Average real interest rates are also about 1%. Thus, γ = 50 to 250 with a typical δ such as
δ = 0.01 implies a very high riskfree rate, of 50 to 250%. To get a reasonable interest rate, we
have to use a subjective discount rate δ = −0.5 to −2.5, or −50% to −250%. That's not
impossible (present values can converge with negative discount rates; Kocherlakota 1990),
but it does not seem reasonable. People prefer earlier consumption, not later consumption.
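The riskfree-rate arithmetic follows directly from r = δ + γE(Δc). In this sketch, mean consumption growth of about 1% per year is an assumption, chosen because it is what the text's numbers imply; the function names are hypothetical.

```python
def riskfree_rate(delta: float, gamma: float, mean_dc: float) -> float:
    """Continuous-time approximation r = delta + gamma * E(dc)."""
    return delta + gamma * mean_dc


def delta_for_rate(target_r: float, gamma: float, mean_dc: float) -> float:
    """Subjective discount rate needed to hit a target riskfree rate."""
    return target_r - gamma * mean_dc


MEAN_DC = 0.01  # assumption: mean consumption growth of about 1% per year
for gamma in (50, 250):
    r = riskfree_rate(0.01, gamma, MEAN_DC)
    d = delta_for_rate(0.01, gamma, MEAN_DC)
    print(f"gamma={gamma}: r={r:.0%} with delta=1%; need delta={d:.0%} for r=1%")
```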
Interest rate variation and the conditional mean of the discount factor
Again, however, maybe we're being too doctrinaire. What evidence is there against γ =
50 to 250 with corresponding δ = −0.5 to −2.5?
Real interest rates are not only low on average, they are also relatively stable over time
and across countries. γ = 50 in equation (21.343) means that a country or a boom time with
consumption growth 1 percentage point higher than normal must have real interest rates 50
percentage points higher than normal, and consumption 1 percentage point lower than normal
should be accompanied by real interest rates 50 percentage points lower than normal: you
pay them 48% to keep your money. We don't see anything like this.
γ = 50 to 250 in a time-separable utility function implies that consumers are essentially
unwilling to substitute (expected) consumption over time, so huge interest rate variation must
force them to make the small variations in consumption growth that we do see. This level
of aversion to intertemporal substitution is too large. For example, think about what interest
rate you need to convince someone to skip a vacation. Take a family with $50,000 per year
consumption, which spends $2,500 (5%) on an annual vacation. If interest rates are
good enough, though, the family can be persuaded to skip this year's vacation and go on a
much more lavish vacation next year. The required interest rate is ($52,500/$47,500)^γ − 1.
For γ = 250 that is an interest rate of roughly 7 × 10^10, or about 7 × 10^12 percent! For
γ = 50, we still need an interest rate of 14,800%. I think most of us would give in and defer
the vacation for somewhat lower interest rates! A reasonable willingness to substitute
intertemporally is central to most macroeconomic models that try to capture the dynamics of
output, investment, consumption, etc.
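The vacation example is a power-utility Euler equation solved for the interest rate. This sketch assumes β = 1, which is what the text's formula ($52,500/$47,500)^γ − 1 implicitly uses; the function name and default arguments are hypothetical.

```python
def required_net_rate(gamma: float, c_skip: float = 47_500.0,
                      c_lavish: float = 52_500.0, beta: float = 1.0) -> float:
    """Net interest rate that makes skipping this year's vacation worthwhile.

    Solves the power-utility Euler equation 1 = beta * R * (c_lavish / c_skip) ** (-gamma)
    for R - 1. beta = 1 is an assumption matching the text's formula.
    """
    return (c_lavish / c_skip) ** gamma / beta - 1.0


for gamma in (1, 5, 50, 250):
    print(f"gamma={gamma:>3}: required net rate {required_net_rate(gamma):.3g}")
```

Low risk aversion gives plausible rates (about 10.5% for γ = 1), while γ = 50 already requires about 148, i.e. roughly 14,800%.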
As always, we can express the observation as a desired characteristic of the discount
factor. Though m_{t+1} must vary a lot, its conditional mean E_t(m_{t+1}) = 1/R^f_t must not vary
much. You can get variance in two ways: variance in the conditional mean and variance in
the unexpected component,
$$\operatorname{var}(x) = \operatorname{var}[E_t(x)] + \operatorname{var}[x - E_t(x)].$$
The fact that interest rates are stable means that almost all of the discount factor's
unconditional variance (a standard deviation of 50% or more) must come from the second term.
The power functional form is really not an issue. To get past the equity premium and these
related puzzles, we will have to introduce other arguments to the marginal utility function:
some non-separability. One important key will be to introduce some non-separability that
distinguishes intertemporal substitution from risk aversion.

21.1.2 Variations

Just raising the interest rate will not help, as all-stock portfolios have high Sharpe ratios.
Uninsured individual risk is not an obvious solution. Individual consumption is not
volatile enough to satisfy the bounds, and is less correlated with stock returns than aggre-
gate consumption.
The average return in postwar data may overstate the true expected return; a target of
3-4% is not unreasonable.

Is the interest rate “too low”?
A large literature has tried to explain the equity premium puzzle by introducing frictions
that make treasury bills “money-like” and so argue that the short-term interest rate is
artificially low (Aiyagari and Gertler 1991 is an example). However, high Sharpe ratios are
pervasive in financial markets. Portfolios long small stocks and short big stocks, or long
value (high book/market) and short growth stocks, give Sharpe ratios of 0.5 or more as well.
Individual shocks
Maybe we should abandon the representative agent assumption. Individual income shocks
are not perfectly insured, so individual income and consumption are much more volatile than
aggregate consumption. Furthermore, through most of the sample, only a small portion of
the population held any stocks at all.
This line of argument faces a steep uphill battle. The basic pricing equation applies to
each consumer. Individual income growth may be more volatile than the aggregate, but it's
not credible that any individual's consumption growth varies by 50-250% per year! Keep
in mind, this is nondurable and services consumption and the flow of services from durables,
not durables purchases.
Furthermore, individual consumption growth is likely to be less correlated with stock
returns than is aggregate consumption growth, and the more volatile it is, the less correlated.
As a simple example, write individual consumption equal to aggregate consumption plus an
idiosyncratic shock, uncorrelated with economywide variables,
$$\Delta c^i_t = \Delta c^a_t + \varepsilon^i_t.$$
Then
$$\operatorname{cov}(\Delta c^i_t, r_t) = \operatorname{cov}(\Delta c^a_t + \varepsilon^i_t, r_t) = \operatorname{cov}(\Delta c^a_t, r_t).$$
As we add more idiosyncratic variation, the correlation of consumption with any aggregate
such as stock returns declines in exact proportion to the rise in its standard deviation, so that
the asset pricing implications are completely unaffected.
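The same point in numbers: a sketch assuming aggregate consumption-growth volatility of 1%, a 0.2 correlation with stock returns and a 0.5 Sharpe ratio (the figures used earlier in this section), with idiosyncratic shocks independent of returns. However large the idiosyncratic volatility, the product σ(Δc^i) × corr stays fixed, so the risk aversion needed to satisfy the correlation bound does not move. The function names are hypothetical.

```python
from math import sqrt


def individual_consumption(sigma_agg: float, corr_agg: float,
                           sigma_idio: float) -> tuple[float, float]:
    """sigma and corr with returns of dc_i = dc_a + eps_i, eps_i independent of returns."""
    sigma_ind = sqrt(sigma_agg ** 2 + sigma_idio ** 2)
    corr_ind = corr_agg * sigma_agg / sigma_ind   # covariance unchanged, so corr scales down
    return sigma_ind, corr_ind


def gamma_needed(sharpe: float, sigma_dc: float, corr: float) -> float:
    """Smallest gamma satisfying Sharpe <= gamma * sigma(dc) * |corr(dc, R^e)|."""
    return sharpe / (sigma_dc * abs(corr))


for sigma_idio in (0.0, 0.10, 0.50):
    s, c = individual_consumption(0.01, 0.2, sigma_idio)
    print(f"sigma_idio={sigma_idio:.2f}: sigma(dc_i)={s:.3f}, corr={c:.4f}, "
          f"gamma needed={gamma_needed(0.5, s, c):.0f}")
```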
Luck and a lower target
One nagging doubt is that a large part of the U.S. postwar average stock return may
represent good luck rather than ex-ante expected return.
First of all, the standard deviation of stock returns is so high that standard errors are
surprisingly large. Using the standard formula σ/√T, the standard error of average stock
returns in 50 years of data is about 16/√50 ≈ 2.3. This fact means that a two-standard-error
confidence interval for the expected return extends from about 3% to about 13%!
This is a pervasive, simple, but surprisingly under-appreciated problem in empirical asset
pricing. In 20 years of data, 16/√20 = 3.6, so we can barely say that an 8% average return
is above zero. Five-year performance averages of something like a stock return are close to
meaningless on a statistical basis, since 16/√5 = 7.2. (This is one reason that many funds
are held to tracking error limits relative to a benchmark. You may be able to measure
performance relative to a benchmark, even if your return and the benchmark are both very volatile.
If σ(R^i − R^m) is small, then σ(R^i − R^m)/√T can be small, even if σ(R^i) and σ(R^m) are
large.)
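The standard-error arithmetic, and the tracking-error point in the parenthesis, take only a few lines. The fund in the last two lines is hypothetical: 16% volatility like the market and a 0.95 correlation with it.

```python
from math import sqrt


def se_of_mean(sigma: float, years: float) -> float:
    """Standard error of a sample mean of i.i.d. annual returns, sigma / sqrt(T)."""
    return sigma / sqrt(years)


def tracking_sd(sigma_i: float, sigma_m: float, corr: float) -> float:
    """sigma(R^i - R^m) from the two volatilities and their correlation."""
    return sqrt(sigma_i ** 2 + sigma_m ** 2 - 2 * corr * sigma_i * sigma_m)


for years in (50, 20, 5):
    print(f"T={years:>2}: se of mean = {se_of_mean(16.0, years):.1f} percentage points")

# Hypothetical fund: 16% volatility like the market, correlation 0.95 with it.
te = tracking_sd(16.0, 16.0, 0.95)
print(f"tracking sd {te:.1f}, 5-year se of relative return {se_of_mean(te, 5):.1f}")
```

With those hypothetical numbers the 5-year standard error of relative performance is about 2.3 percentage points, roughly a third of the 7.2 for raw returns.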
Large standard errors cut both ways, however: they are equally consistent with an equity
premium that is really higher than the postwar average return. Several other arguments suggest
an upward bias in the sample average: that a substantial part of the 8% average excess return of
the last 50 years was good luck, and that the true equity premium is more like 3-4%.


Brown, Goetzmann and Ross (1995) suggest that the U.S. data suffer from selection bias.
One of the reasons that I write this book in the U.S., and that the data has been collected from
the U.S., is precisely because U.S. stock returns and growth have been so good for the last
50-100 years.
One way to address this question is to look at other samples. Average returns were a lot
lower in the U.S. before WWII. In Shiller's (1989) annual data from 1871-1940, the S&P500
average excess return was only 4.1%. However, Campbell (1999, table 1) looks across
countries for which we have stock market data from 1970-1995, and finds the average equity
premium practically the same as that for the U.S. in that period. The other countries averaged
a 4.6% excess return while the U.S. had a 4.4% average excess return in that period.
On the other hand, Campbell's countries are Canada, Japan, Australia and Western
Europe. These probably shared a lot of the U.S. “good luck” in the postwar period. There are
lots of countries for which we don't have data, usually because returns were very low in
those countries. As Brown, Goetzmann and Ross (1995) put it, “Looking back over the
history of the London or the New York stock markets can be extraordinarily comforting to an
investor: equities appear to have provided a substantial premium over bonds, and markets
appear to have recovered nicely after huge crashes. ... Less comforting is the past history of
other major markets: Russia, China, Germany and Japan. Each of these markets has had one
or more major interruptions that prevent their inclusion in long term studies” [my emphasis].
Think of the things that didn't happen in the last 50 years. We had no banking panics,
and no depressions; no civil wars, no constitutional crises; we did not lose the cold war, no
missiles were fired over Berlin, Cuba, Korea or Vietnam. If any of these things had happened,
we might well have seen a calamitous decline in stock values, and I would not be writing
about the equity premium puzzle.
A view that stocks are subject to occasional and highly non-normal crashes (world wars,
great depressions, etc.) makes sampling uncertainty even larger, and means that the average
return from any sample that does not include a crash will be larger than the actual average
return: the Peso Problem again (Rietz 1988).
Fama and French (2000) notice that the price/dividend ratio is low at the beginning of the
sample and high at the end. Much of that is luck: the dividend yield is stationary in the very
long run, with slow-moving variation through good and bad times. We can understand their
alternative calculation most easily using the return linearization,
$$r_{t+1} = \Delta d_{t+1} + (d_t - p_t) - \rho\,(d_{t+1} - p_{t+1}).$$
Then, imposing the view that the dividend-price ratio is stationary, we can estimate the
average return as
$$E(r_{t+1}) = E(\Delta d_{t+1}) + (1 - \rho)\,E(d_t - p_t).$$
The right-hand expression gives an estimate of the unconditional average return on stocks
equal to 3.4%. This differs from the sample average return of 9% because the d/p ratio
declined dramatically in the postwar sample.
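The gap between the two estimates can be made exact. Summing the linearization over a sample t = 0, …, T − 1 gives mean(r) = mean(Δd) + (1 − ρ) mean(d − p) + ρ[(d_0 − p_0) − (d_T − p_T)]/T, so the sample mean exceeds the stationary estimate exactly when the dividend yield ends the sample lower than it started. The sketch below checks this identity on simulated, hypothetical data; like the text's linearization, it omits the constant term, and the function name is hypothetical.

```python
import random
from statistics import fmean


def decompose_sample_mean(dp: list[float], dd: list[float],
                          rho: float) -> tuple[float, float, float]:
    """Split the sample mean of linearized log returns.

    dp[t] is the log dividend yield d_t - p_t for t = 0..T; dd[t] is dividend
    growth Delta d_{t+1} for t = 0..T-1. Returns follow the linearization
    r_{t+1} = Delta d_{t+1} + (d_t - p_t) - rho * (d_{t+1} - p_{t+1}).
    """
    T = len(dd)
    assert len(dp) == T + 1, "need one more dividend-yield observation than returns"
    r = [dd[t] + dp[t] - rho * dp[t + 1] for t in range(T)]
    stationary_part = fmean(dd) + (1 - rho) * fmean(dp[:T])  # the E(.) formula, in sample
    yield_drift = rho * (dp[0] - dp[T]) / T                  # extra return from a falling d/p
    return fmean(r), stationary_part, yield_drift


random.seed(1)
T = 50
dp = [random.gauss(-3.3, 0.3) for _ in range(T + 1)]   # hypothetical log dividend yields
dd = [random.gauss(0.02, 0.10) for _ in range(T)]      # hypothetical dividend growth
mean_r, stationary_part, yield_drift = decompose_sample_mean(dp, dd, rho=0.96)
print(mean_r, stationary_part + yield_drift)  # equal up to rounding error
```

For a sense of scale, if the log dividend yield fell by 1.5 over 50 years (a hypothetical figure) with ρ = 0.96, yield_drift would be 0.96 × 1.5/50 ≈ 0.029: the sample mean would exceed the stationary estimate by about 3 percentage points per year.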
Here is the fundamental issue: Was it clear to people in 1947 (or 1871, or whenever
one starts the sample) and throughout the period that the average return on stocks would be
8% greater than that of bonds, subject only to the 16% year-to-year variation? Given that
knowledge, would investors have changed their portfolios, or would they have stayed pat,
patiently explaining that these average returns are earned in exchange for risk that they are
not prepared to take? If people expected these mean returns, then we face a tremendous
challenge of explaining why people did not buy more stocks. This is the basic assumption
and challenge of the equity premium puzzle. But phrased this way, the answer is not so
clear. I don't think it was obvious in 1947 that the United States would not slip back into
depression, or another world war, but would instead experience a half century of economic
growth and stock returns never before seen in human history. 8% seems like an extremely
(maybe even irrationally) exuberant expectation for stock returns as of 1947, or 1871. (You
can ask the same question, by the way, about value effects, market timing, or other puzzles
we try to explain. Only if you can reasonably believe that people understood the average
returns and shied away because of the risks does it make sense to explain the puzzles by risk
rather than luck. Only in that case will the return premia continue anyway!)
This consideration mitigates, but cannot totally solve, the equity premium puzzle. Even a
3% equity premium is tough to understand with 1% consumption volatility. If the premium
is 3%, the Sharpe ratio is 3/16 ≈ 0.2, so we still need risk aversion of 20, and 100 once we
account for the low correlation of consumption growth with stock returns. 20-100 is a lot
better than 50-250, but is still quite a challenge.
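The arithmetic behind these numbers is the Hansen-Jagannathan bound with power utility: σ(m)/E(m) ≈ γσ(Δc), and |ρ|σ(m)/E(m) must be at least the Sharpe ratio. A small sketch; the function name is ours, and the correlation of 0.2 in the usage lines is an illustrative assumption chosen to match the factor of five between 20 and 100 (and between 50 and 250).

```python
def min_risk_aversion(sharpe, sigma_dc, corr=1.0):
    """Smallest power-utility gamma consistent with the Hansen-Jagannathan bound.

    Uses sigma(m)/E(m) ~= gamma * sigma(dc) and |corr| * sigma(m)/E(m) >= Sharpe ratio.
    """
    if sigma_dc <= 0 or corr == 0:
        raise ValueError("need positive consumption volatility and nonzero correlation")
    return sharpe / (abs(corr) * sigma_dc)


# Illustrative inputs: 1% consumption volatility; corr = 0.2 is an assumption.
print(min_risk_aversion(3 / 16, 0.01))            # 3% premium: about 19
print(min_risk_aversion(3 / 16, 0.01, corr=0.2))  # about 94
print(min_risk_aversion(0.5, 0.01))               # Sharpe ratio 0.5: about 50
print(min_risk_aversion(0.5, 0.01, corr=0.2))     # about 250
```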

21.1.3 Predictability and the equity premium

The Sharpe ratio varies over time. This means that discount factor volatility must vary
over time. Since consumption volatility does not seem to vary over time, this suggests that
risk aversion must vary over time: a conditional equity premium puzzle.
Conventional portfolio calculations suggest that people are not terribly risk averse. These
calculations implicitly assume that consumption moves proportionally to wealth, and inherits
the large wealth volatility.
If stock returns mean-revert, $E(R^e)/\sigma(R^e)$, and hence $\sigma(m)/E(m)$, rises faster
than the square root of the horizon. Consumption growth is roughly i.i.d., so $\sigma(\Delta c)$
rises roughly with the square root of the horizon. Thus, mean reversion means that the equity
premium puzzle is even worse for long-horizon investors and long-horizon returns.
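To see why mean reversion makes Sharpe ratios grow faster than √k, write the variance of a k-period sum of one-period returns as σ²[k + 2Σ_{j=1}^{k−1}(k − j)ρ_j], where ρ_j are return autocorrelations. Negative autocorrelations shrink the long-horizon variance while the mean still grows like k. A sketch assuming stationary returns; the function name and the autocorrelations in the usage lines are hypothetical, not estimates.

```python
import numpy as np

def horizon_sharpe(mu, sigma, autocorr, k):
    """Sharpe ratio of a k-period sum of one-period excess returns.

    mu, sigma : one-period mean and standard deviation
    autocorr  : sequence of autocorrelations rho_1, rho_2, ...; lags not
                supplied are treated as zero
    """
    lags = np.arange(1, k)                      # j = 1, ..., k-1
    rho = np.zeros(k - 1)
    n = min(len(autocorr), k - 1)
    rho[:n] = np.asarray(autocorr, dtype=float)[:n]
    var_k = sigma**2 * (k + 2.0 * np.sum((k - lags) * rho))
    if var_k <= 0:
        raise ValueError("autocorrelations imply a non-positive variance")
    return k * mu / np.sqrt(var_k)


k = 10
print(np.sqrt(k) * 0.08 / 0.16)                    # i.i.d. benchmark, about 1.58
print(horizon_sharpe(0.08, 0.16, [], k))           # same as the benchmark
print(horizon_sharpe(0.08, 0.16, [-0.05] * 9, k))  # mean reversion: about 2.13
```

With i.i.d. returns the k-period Sharpe ratio is exactly √k times the one-period value; any pattern of negative autocorrelations pushes it above that line, while consumption volatility (and hence γσ(Δc)) still grows only like √k.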

We have traced the implications of the unconditional Sharpe ratio, and of low and rela-
tively constant interest rates. The predictability of stock returns also has important implica-
tions for discount factors.


Heteroskedasticity in the discount factor: the conditional equity premium puzzle
The Hansen-Jagannathan bound applies conditionally, of course:

$$\frac{E_t(R^e_{t+1})}{\sigma_t(R^e_{t+1})}\,\frac{1}{\rho_t(R^e_{t+1}, m_{t+1})} = -\frac{\sigma_t(m_{t+1})}{E_t(m_{t+1})}.$$
Mean returns are predictable, and the standard deviation of returns varies over time. So
far, however, the two moments are forecast by different sets of variables and at different
horizons (d/p, the term premium, etc. forecast the mean at long horizons; past squared returns
and implied volatility forecast the variance at shorter horizons), and these variables move at
different times. Hence, it seems that the conditional Sharpe ratio on the left-hand side moves
over time. (Glosten, Jagannathan and Runkle 1993, French, Schwert and Stambaugh 1987, and
Yan 2000 find some co-movements in conditional mean and variance, but do not find that all
movement in one moment is matched by movement in the other.)
On the right-hand side, the conditional mean discount factor is the inverse of the risk-free
rate, and so must be relatively stable over time. Time-varying conditional correlations are a
possibility, but hard to interpret. Thus, the predictability of returns strongly suggests that the
discount factor must be conditionally heteroskedastic: $\sigma_t(m_{t+1})$ must vary through time.
Certainly the discount factors on the volatility bound, or the mimicking portfolios for discount
factors, both of which have $|\rho| = 1$, must have time-varying volatility.
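Because |ρ_t| ≤ 1, the conditional bound implies σ_t(m_{t+1}) ≥ E_t(m_{t+1}) times the absolute conditional Sharpe ratio, with E_t(m_{t+1}) = 1/R^f_t. A minimal sketch that turns conditional mean and volatility forecasts into that lower bound; the function name, the forecast values and the 1% risk-free rate in the usage lines are hypothetical, not estimates.

```python
import numpy as np

def conditional_sd_m_bound(mean_forecast, vol_forecast, gross_rf):
    """Lower bound on sigma_t(m_{t+1}) from the conditional Hansen-Jagannathan bound.

    mean_forecast : conditional expected excess returns E_t(R^e_{t+1})
    vol_forecast  : conditional standard deviations sigma_t(R^e_{t+1})
    gross_rf      : gross risk-free rate R^f_t (scalar or array), so E_t(m) = 1 / R^f_t
    """
    sharpe_t = np.abs(np.asarray(mean_forecast, dtype=float)) / np.asarray(vol_forecast, dtype=float)
    e_m = 1.0 / np.asarray(gross_rf, dtype=float)
    return sharpe_t * e_m


# Hypothetical forecasts: the mean moves a lot, the volatility forecast much less.
bounds = conditional_sd_m_bound([0.02, 0.08, 0.12], [0.16, 0.16, 0.20], 1.01)
print(bounds)   # rises from about 0.12 to about 0.59 with a constant risk-free rate
```

With E_t(m) pinned down by a stable interest rate, any movement in the conditional Sharpe ratio must show up as movement in σ_t(m_{t+1}) (or in the correlation).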
In the standard time-separable model, $\sigma_t(m_{t+1}) = \gamma_t \sigma_t(\Delta c_{t+1})$. Thus, we need either
time-varying consumption risk or time-varying curvature; loosely speaking, a time-varying
risk aversion. The data don't show much evidence of conditional heteroskedasticity in con-
sumption growth, leading one to favor a time-varying risk aversion. However, this is a case in
which high risk aversion helps: if $\gamma$ is sufficiently high, a small and perhaps statistically hard
to measure amount of consumption heteroskedasticity can generate a lot of discount factor
heteroskedasticity. (Kandel and Stambaugh 1990 follow this approach to explain predictability.)
CAPM, portfolios and consumption
The equity premium puzzle is centrally about the smoothness of consumption. This is
why it was not noticed as a major puzzle in the early development of financial theory. In turn,
the smoothness of consumption is centrally related to the predictability of returns.
In standard portfolio analyses, there is no puzzle that people with normal levels of risk
aversion do not want to hold far more stocks. From the usual first order condition and with
$\Lambda = V_W(W)$ we can also write the Hansen-Jagannathan bound in terms of wealth, analo-
gously to (21.342),

$$\frac{|E(r) - r^f|}{\sigma(r)} \le \frac{-W V_{WW}}{V_W}\,\sigma(\Delta w).$$
The quantity $-W V_{WW}/V_W$ is in fact the measure of risk aversion corresponding to most
survey and introspection evidence, since it represents aversion to bets on wealth rather than
to bets on consumption. (They can be the same for power utility, but not in general.)
For an investor who holds the market, $\sigma(\Delta w)$ is the standard deviation of the stock return,
about 16%. With a market Sharpe ratio of 0.5, we find the lower bound on risk aversion,

$$\frac{-W V_{WW}}{V_W} = \frac{0.5}{0.16} \approx 3.$$
Furthermore, the correlation between wealth and the stock market is one in this calculation,
so no correlation puzzle crops up to raise the required risk aversion. This is the heart of the
oft-cited Friend and Blume (1975) calculation of risk aversion, one source of the idea that
3-5 is about the right level of risk aversion rather than 50 or 250.
The Achilles heel is the hidden simplifying assumption that returns are independent over
time, and the investor has no other source of income, so no variables other than wealth show
up in its marginal value $V_W$. In such an i.i.d. world, consumption moves one-for-one with
wealth, and σ (∆c) = σ (∆w). If your wealth doubles and nothing else has changed, you
double consumption. This calculation thus hides a consumption-based “model,” and the
model has the drastically counterfactual implication that consumption growth has a 16%
standard deviation!
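The hidden consumption model can be made explicit with the same bound. A small sketch, assuming a market Sharpe ratio of 0.5, 16% wealth volatility and 1% consumption volatility, with all correlations set to one; the function name is ours. The wealth-based bound and the consumption-based bound coincide only if consumption inherits the 16% volatility of wealth; with smooth consumption the required curvature jumps to 50.

```python
def risk_aversion_bounds(sharpe=0.5, sigma_w=0.16, sigma_c=0.01):
    """Lower bounds on risk aversion from the Hansen-Jagannathan bound, correlations = 1.

    Returns (wealth-based bound on -W V_WW / V_W,
             consumption-based bound if sigma(dc) = sigma(dw) as in an i.i.d. world,
             consumption-based bound with the given consumption volatility).
    """
    wealth_based = sharpe / sigma_w          # investor who holds the market
    iid_consumption = sharpe / sigma_w       # consumption inherits wealth volatility
    measured_consumption = sharpe / sigma_c  # smooth consumption
    return wealth_based, iid_consumption, measured_consumption


print(risk_aversion_bounds())  # roughly (3.1, 3.1, 50)
```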
All this calculation has done is say that “in a model in which consumption has a 16%
volatility like stock returns, we don't need high risk aversion to explain the equity premium.”
Hence the central point: the equity premium is about consumption smoothness. Just looking
at wealth and portfolios, you do not notice anything unusual.
In the same way, retreating to the CAPM or factor models doesn't solve the puzzle either.
The CAPM and factor models are specializations of the consumption-based model, not
alternatives to it, and thus hide an equity premium puzzle. For example, I derived the CAPM
above as a consequence of log utility. With log utility, you have to believe that properly
measured consumption growth has a 50% per year standard deviation! That testable
implication is right there in the model, though often ignored. Most implementations of the
CAPM take the market premium as given (ignoring the link to consumption in the model's
derivation) and estimate it as a free parameter. The equity premium puzzle asks whether the
market premium itself makes any sense.
The long-run equity premium puzzle
The fact that annual consumption is much smoother than wealth is an important piece of
information. In the long run, consumption must move one-for-one with wealth, so long-horizon
consumption and wealth volatility must be the same. Therefore, we know that the world is very
far from i.i.d., so predictability will be an important issue in understanding risk premia.
Predictability can imply mean reversion and Sharpe ratios that rise faster than the square
root of horizon. Thus,

$$\frac{E(R^e_{t\to t+k})}{\sigma(R^e_{t\to t+k})} \le \frac{\sigma(m_{t\to t+k})}{E(m_{t\to t+k})} \approx \gamma\,\sigma(\Delta c_{t\to t+k}).$$

If stocks do mean-revert, then discount factor volatility must increase faster than the square
root of the horizon. Consumption growth is close to i.i.d., so the volatility of consumption
growth only increases with the square root of horizon. Thus mean-reversion implies that the
equity premium puzzle is even worse at long investment horizons.

21.2 New models

We want to end up with a model that explains a high market Sharpe ratio, and the high
level and volatility of stock returns, with low and relatively constant interest rates, roughly
i.i.d. consumption growth with small volatility, and that explains the predictability of excess
returns: the fact that high prices today correspond to low excess returns in the future.
Eventually, we would like the model to explain the predictability of bond and foreign exchange
returns as well, the time-varying volatility of stock returns and the cross-sectional variation
of expected returns, and it would be nice if in addition to fitting all of the facts, people in the
models did not display unusually high aversion to wealth bets.
I start with a general outline of the features that most models that address these puzzles
share. Then, I focus on two models, the Campbell-Cochrane (1999) habit persistence model
and the Constantinides and Duffie (1996) model with uninsured idiosyncratic risks. The
mechanisms we uncover in these models apply to a large class. The Campbell-Cochrane
model is a representative from the literature that attacks the equity premium by modifying the
representative agent's preferences. The Constantinides and Duffie model is a representative
of the literature that attacks the equity premium by modeling uninsured idiosyncratic risks,
market frictions, and limited participation.

21.2.1 Outlines of new models

Additional state variables are the natural route to solving the empirical puzzles. Investors
must not be particularly scared of the wealth or consumption effects of holding stocks, but
of the fact that stocks do badly at particular times, or in particular states of nature. Broadly
speaking, most solutions introduce something like a “recession” state variable. This fact
makes stocks different, and more feared, than pure wealth bets, whose risk is unrelated to the
state of the economy.
In the ICAPM way of looking at things, we get models of this sort by specifying things
so there is an additional recession state variable $z$ in the value function $V(W, z)$. Then,
expected returns are

$$E(r) - r^f = \mathrm{cov}\left(\frac{-W V_{WW}}{V_W}\frac{dW}{W},\, r\right) + \frac{-z\,V_{Wz}}{V_W}\,\mathrm{cov}\left(\frac{dz}{z},\, r\right). \tag{21.345}$$


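In a utility framework, we add other arguments to the utility function $u(C, z)$, so

$$E(r) - r^f = \mathrm{cov}\left(\frac{-C u_{CC}}{u_C}\frac{dC}{C},\, r\right) + \frac{-z\,u_{Cz}}{u_C}\,\mathrm{cov}\left(\frac{dz}{z},\, r\right). \tag{21.346}$$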

The extra utility function arguments must enter non-separably. If $u(C, z) = f(C) + g(z)$,
then $u_{Cz} = 0$. All utility function modifications are of this sort: they add extra goods like
leisure, nonseparability over time in the form of habit persistence, or nonseparability across
states of nature so that consumption if it rains affects marginal utility if it shines.
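A quick numerical check of the separability point, as a sketch: a central finite-difference estimate of u_Cz is essentially zero for an additively separable utility and clearly nonzero for a habit-style u(C, z) = (C − z)^{1−γ}/(1 − γ). Both utility functions and γ = 2 are hypothetical illustrations; for the habit form the analytic value is γ(C − z)^{−γ−1}, which is 16 at C = 1, z = 0.5.

```python
import math

def cross_partial(u, C, z, h=1e-4):
    """Central finite-difference estimate of d^2 u / (dC dz)."""
    return (u(C + h, z + h) - u(C + h, z - h)
            - u(C - h, z + h) + u(C - h, z - h)) / (4 * h * h)


def separable(C, z):
    """Hypothetical u(C, z) = f(C) + g(z) with log f and g."""
    return math.log(C) + math.log(z)


def habit_like(C, z, gamma=2.0):
    """Hypothetical non-separable u(C, z) = (C - z)^(1 - gamma) / (1 - gamma)."""
    return (C - z) ** (1 - gamma) / (1 - gamma)


print(cross_partial(separable, 1.0, 0.5))   # approximately 0
print(cross_partial(habit_like, 1.0, 0.5))  # approximately 16
```

Only the second kind of utility function can deliver a nonzero second term in (21.346).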
The lesson of the equity premium literature is that the second term must account for
essentially all of the market premium. Since the cross-sectional work surveyed in Chapter 20
seemed to point to something like a recession factor as the primary determinant of
cross-sectional variation in expected returns, a gratifying unity seems close at hand, along with
a fundamental revision of the CAPM-i.i.d. view of the source of risk prices.
The predictability of returns, emphasized by the dramatic contrast between consumption
and wealth volatility at short horizons, suggests a natural source of state variables. Unfortu-
nately, the sign is wrong. The fact that stocks go up when their expected subsequent returns
are low means that stocks, like bonds, are good hedges for shocks to their own opportunity
sets. Therefore, adding the effects of predictability typically lowers expected returns. (The
“typically” in this sentence is important. The sign of this effect, the sign of $zV_{Wz}$, does
depend on the utility function and environment. For example, there is no risk premium for log
utility.)
Thus, we need an additional state variable, and one strong enough not only to explain
the equity premium, given that the first terms in (21.345) and (21.346) are not up to the job,
but stronger still to overcome the effects of predictability. Recessions are times of low
prices and high expected returns. We want a model in which recessions are bad times, so that
investors fear bad stock returns in recessions. But high expected returns are good times for
a pure Merton investor. Thus, the other state variable(s) that describe a recession (high risk
aversion, low labor income, high labor income uncertainty, liquidity, etc.) must overcome
the “good times” of high expected returns and indicate that times really are bad after all.

21.2.2 Habits

A natural explanation for the predictability of returns from price/dividend ratios is that people
get less risk averse as consumption and wealth increase in a boom, and more risk averse
as consumption and wealth decrease in a recession. We can't tie risk aversion to the level
of consumption and wealth, since that increases over time while equity premia have not
declined. Thus, to pursue this idea, we must specify a model in which risk aversion depends
on the level of consumption or wealth relative to some “trend” or the recent past.
Following this idea, Campbell and Cochrane (1999) specify that people slowly develop
habits for higher or lower consumption. Thus, the “habits” form the “trend” in consumption.


The idea is not implausible. Anyone who has had a large pizza dinner or smoked a cigarette
knows that what you consumed yesterday can have an impact on how you feel about more
consumption today. Might a similar mechanism apply for consumption in general and at a
longer time horizon? Perhaps we get used to an accustomed standard of living, so a fall in
consumption hurts after a few years of good times, even though the same level of consumption
might have seemed very pleasant if it arrived after years of bad times. This thought can at
least explain the perception that recessions are awful events, even though a recession year
may be just the second or third best year in human history rather than the absolute best. Law,
custom and social insurance also insure against falls in consumption as much as low levels of
consumption.
The Model
We model an endowment economy with i.i.d. consumption growth.

$$\Delta c_{t+1} = g + v_{t+1}; \qquad v_{t+1} \sim \text{i.i.d. } N(0, \sigma^2).$$

We replace the utility function $u(C)$ with $u(C - X)$, where $X$ denotes the level of habits:

$$E \sum_{t=0}^{\infty} \delta^t \frac{(C_t - X_t)^{1-\gamma} - 1}{1-\gamma}.$$
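One way to see how this utility function delivers risk aversion that rises in bad times: for u(C − X) of the power form above, local curvature is −C u_CC/u_C = γC/(C − X) = γ/S, where S = (C − X)/C. A minimal sketch that checks this numerically; γ = 2 and the habit levels are illustrative assumptions, and the function names are ours. The same γ produces much higher local risk aversion when consumption is close to habit.

```python
def local_curvature(C, X, gamma, h=1e-6):
    """Finite-difference estimate of -C u''(C) / u'(C) for u = (C - X)^(1 - gamma) / (1 - gamma)."""
    if C - h <= X:
        raise ValueError("consumption must exceed habit")

    def marginal_utility(c):
        return (c - X) ** (-gamma)

    u_cc = (marginal_utility(C + h) - marginal_utility(C - h)) / (2 * h)
    return -C * u_cc / marginal_utility(C)


def curvature_formula(C, X, gamma):
    """Closed form gamma / S with S = (C - X) / C."""
    return gamma / ((C - X) / C)


for X in (0.5, 0.9, 0.94):  # hypothetical habit levels with C = 1
    print(X, local_curvature(1.0, X, 2.0), curvature_formula(1.0, X, 2.0))
```

With C = 1 the curvature goes from about 4 to about 33 as habit rises from 0.5 to 0.94, even though γ never changes.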

Habits should move slowly in response to consumption, something like

$$x_t \approx \lambda \sum_{j=0}^{\infty} \phi^j c_{t-j} \tag{21.347}$$

or, equivalently,

$$x_t = \phi x_{t-1} + \lambda c_t.$$

(Small letters denote the logs of large letters throughout this section, ct = ln Ct , etc.)
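The equivalence between the distributed lag and the recursion is easy to check numerically when the recursion starts from x = 0 and the sum is truncated at the start of the sample. A sketch with hypothetical φ = 0.9, λ = 0.1 and simulated log consumption (g = 0.02 and σ = 0.015 are illustrative values, not a calibration); the function names are ours.

```python
import numpy as np

def habit_recursive(c, phi, lam, x_init=0.0):
    """x_t = phi * x_{t-1} + lam * c_t, starting from x_{-1} = x_init."""
    x = np.empty(len(c))
    prev = x_init
    for t, c_t in enumerate(c):
        prev = phi * prev + lam * c_t
        x[t] = prev
    return x


def habit_distributed_lag(c, phi, lam):
    """x_t = lam * sum_{j=0}^{t} phi^j c_{t-j}, truncated at the start of the sample."""
    c = np.asarray(c, dtype=float)
    return np.array([lam * sum(phi**j * c[t - j] for j in range(t + 1))
                     for t in range(len(c))])


rng = np.random.default_rng(0)
c = np.cumsum(0.02 + 0.015 * rng.standard_normal(100))  # log consumption, i.i.d. growth
x_rec = habit_recursive(c, phi=0.9, lam=0.1)
x_lag = habit_distributed_lag(c, phi=0.9, lam=0.1)
print(np.max(np.abs(x_rec - x_lag)))  # agreement up to rounding error
```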
Rather than letting habit itself follow an AR(1), we let the “surplus consumption ratio” of
consumption to habit follow an AR(1):

$$S_t = \frac{C_t - X_t}{C_t},$$

$$s_{t+1} = (1 - \phi)\bar{s} + \phi s_t + \lambda(s_t)\,(c_{t+1} - c_t - g).$$
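The surplus recursion can be simulated directly, and the implied log habit recovered from X = C(1 − S), that is x = c + ln(1 − e^s). In this sketch the sensitivity function is passed in as an argument; the usage lines plug in a constant placeholder, which is an assumption of the sketch and not the model's λ(s_t). The values s̄ = ln 0.057, φ = 0.87, g = 0.02, σ = 0.015 and λ = 10 are illustrative, and the function names are ours.

```python
import numpy as np

def simulate_surplus(v, s_bar, phi, lam, s0=None):
    """Iterate s_{t+1} = (1 - phi) s_bar + phi s_t + lam(s_t) v_{t+1}.

    v   : consumption shocks v_{t+1} = c_{t+1} - c_t - g
    lam : callable sensitivity function lam(s)
    """
    s = np.empty(len(v) + 1)
    s[0] = s_bar if s0 is None else s0
    for t, shock in enumerate(v):
        s[t + 1] = (1 - phi) * s_bar + phi * s[t] + lam(s[t]) * shock
    return s


def implied_log_habit(c, s):
    """S = (C - X)/C implies x = c + log(1 - exp(s)); requires s < 0."""
    return np.asarray(c) + np.log1p(-np.exp(np.asarray(s)))


rng = np.random.default_rng(1)
g, sigma = 0.02, 0.015
v = sigma * rng.standard_normal(200)
s = simulate_surplus(v, np.log(0.057), 0.87, lambda s_t: 10.0)  # placeholder constant lam
c = np.concatenate([[0.0], np.cumsum(g + v)])                   # log consumption
x = implied_log_habit(c, s)
print(np.all(x < c))  # habit stays below consumption while s < 0
```

A constant λ makes s a linear AR(1) in the consumption shocks; the state-dependent λ(s_t) specified in the text changes how strongly habit responds in different states.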

Since $s$ contains $c$ and $x$, this equation also specifies how $x$ responds to $c$, and it is locally the
same as (21.347). We also allow consumption to affect habit differently in different states by
specifying a square-root-type process rather than a simple AR(1),

