

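$$
\begin{aligned}
\operatorname{cov}(R^i, R^{mv}) &= \operatorname{cov}\left[(R^* + wR^{e*}),\ (R^* + w^i R^{e*})\right] \\
&= \operatorname{var}(R^*) + w w^i \operatorname{var}(R^{e*}) - (w + w^i)E(R^*)E(R^{e*}) \\
&= \operatorname{var}(R^*) - wE(R^*)E(R^{e*}) + w^i\left[w \operatorname{var}(R^{e*}) - E(R^*)E(R^{e*})\right]
\end{aligned}
$$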

Thus, $\operatorname{cov}(R^i, R^{mv})$ and $E(R^i)$ are both linear functions of $w^i$. We can solve
$\operatorname{cov}(R^i, R^{mv})$ for $w^i$, plug into the expression for $E(R^i)$, and we're done.
To do this, of course, we must be able to solve $\operatorname{cov}(R^i, R^{mv})$ for $w^i$. This requires

$$
w \neq \frac{E(R^*)E(R^{e*})}{\operatorname{var}(R^{e*})} = \frac{E(R^*)E(R^{e*})}{E(R^{e*2}) - E(R^{e*})^2} = \frac{E(R^*)}{1 - E(R^{e*})}
$$
that is, $w$ must differ from the weight that defines the minimum variance return.

6.7 Problems

1. In the argument that $R^{mv}$ on the mean-variance frontier, $R^{mv} = R^* + wR^{e*}$, implies a
discount factor $m = a + bR^{mv}$, do we have to rule out the case of risk neutrality? (Hint:
What is $R^{e*}$ when the economy is risk-neutral?)
2. If you use factor mimicking portfolios as in (6.93), you know that the predictions for
expected returns are the same as they are if you use the factors themselves. Are the $\alpha^*$,
$\lambda^*$, and $\beta^*$ for the factor mimicking portfolio representation the same as the original
$\alpha$, $\lambda$, and $\beta$ of the factor pricing model?
3. Suppose the CAPM is true, $m = a - bR^m$ prices a set of assets, and there is a risk-free
rate $R^f$. Find $R^*$ in terms of the moments of $R^m$, $R^f$.
4. If you express the mean-variance frontier as a linear combination of factor-mimicking
portfolios from a factor model, do the relative weights of the various factor portfolios in
the mean-variance efficient return change as you sweep out the frontier, or do they stay
the same? (Start with the risk-free rate case.)
5. For an arbitrary mean-variance efficient return of the form $R^* + wR^{e*}$, find its zero-beta
return and zero-beta rate. Show that your rate reduces to the risk-free rate when there is one.
6. When the economy is risk neutral, and if there is no risk-free rate, show that the
zero-beta, minimum-variance, and constant-mimicking portfolio returns are again all
equivalent, though not equal to the risk-free rate. (In this case, the mean-variance frontier
is just the minimum-variance point.)

Chapter 7. Implications of existence and
equivalence theorems

Existence of a discount factor means p = E(mx) is innocuous, and all content flows from
the discount factor model.
The theorems apply to sample moments too; the dangers of fishing up ex-post or sample
mean-variance efficient portfolios.
Sources of discipline in factor fishing expeditions.
The joint hypothesis problem. How efficiency tests are the same as tests of economic
discount factor models.
Factors vs. their mimicking portfolios.
Testing the number of factors.
Plotting contingent claims on the axis vs. mean and variance.

The theorems on the existence of a discount factor, and the equivalence between the p =
E(mx), expected return - beta, and mean-variance views of asset pricing have important
implications for how we approach and evaluate empirical work.
The equivalence theorems are obviously important, especially to the theme of this book,
to show that the choice of discount factor language versus expected return-beta language
or mean-variance frontier is entirely one of convenience. Nothing in the more traditional
statements is lost.

p = E(mx) is innocuous
Before Roll (1976), expected return-beta representations had been derived in the context
of special and explicit economic models, especially the CAPM. In empirical work, the
success of any expected return-beta model seemed like a vindication of the whole structure.
The fact that, for example, one might use the NYSE value-weighted index portfolio in place
of the return on total wealth predicted by the CAPM seemed like a minor issue of empirical
implementation.
When Roll showed that mean-variance efficiency implies a single beta representation,
all that changed. Some single beta representation always exists, since there is some
mean-variance efficient return. The asset pricing model only serves to predict that a particular
return (say, the “market return”) will be mean-variance efficient. Thus, if one wants to “test
the CAPM” it becomes much more important to be choosy about the reference portfolio, to
guard against stumbling on something that happens to be mean-variance efficient and hence
prices assets by construction.

This insight led naturally to the use of broader wealth indices (Stambaugh 1982) in the
reference portfolio to provide a more grounded test of the CAPM. However, this approach
has not caught on. Stocks are priced with stock factors, bonds with bond factors, and so on.
More recently, stocks sorted on size, book/market, and past performance characteristics are
priced by portfolios sorted on those characteristics. Part of the reason for this is that the betas
are small; stocks and bonds are not highly correlated so risk premia from one source of betas
have small impacts on another set of average returns. Larger measures of wealth including
human capital and real estate do not come with high frequency price data, so adding them to
a wealth portfolio has little effect on betas.
The good news in this existence theorem is that you can always start by writing an
expected return-beta model, knowing that you have imposed almost no structure in doing so.
The bad news is that you haven't gotten very far. All the economic, statistical and predictive
content comes in picking the factors.
The theorem that, from the law of one price, there exists some discount factor m such
that p = E(mx) is just an updated restatement of Roll's theorem. The content is all in
m = f(data), not in p = E(mx). Again, an asset pricing framework that initially seemed to
require a lot of completely unbelievable structure (the representative consumer
consumption-based model in complete frictionless markets) turns out to require (almost) no
structure at all. Again, the good news is that you can always start by writing p = E(mx), and
need not suffer criticism about hidden contingent claim or representative consumer assumptions
in so doing. The bad news is that you haven't gotten very far by writing p = E(mx), as all the
economic, statistical and predictive content comes in picking the discount factor model
m = f(data).

Ex-ante and ex-post.
I have been deliberately vague about the probabilities underlying expectations and other
moments in the theorems. The fact is, the theorems hold for any set of probabilities (see
footnote 4). Thus, the existence and equivalence theorems work equally well ex-ante as ex-post:
E(mx), β, E(R) and so forth can refer to agents' subjective probability distributions, objective
population probabilities, or to the moments realized in a given sample.
Thus, if the law of one price holds in a sample, one may form an $x^*$ from sample moments
that satisfies $p(x) = E(x^* x)$, exactly, in that sample, where $p(x)$ refers to observed prices
and $E(x^* x)$ refers to the sample average. Equivalently, if the sample covariance matrix of
a set of returns is nonsingular, there exists an ex-post mean-variance efficient portfolio for
which sample average returns line up exactly with sample regression betas.
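
The in-sample construction can be checked numerically. The following is a minimal sketch, not a procedure from the text: the payoffs are simulated, the price vector is arbitrary, and all variable names are illustrative assumptions. It forms $x^* = p'E(xx')^{-1}x$ from the sample second-moment matrix and confirms that the sample average of $x^* x$ reproduces the prices.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 60, 4                                   # hypothetical sample: 60 periods, 4 assets
x = 1.0 + 0.1 * rng.standard_normal((T, N))    # simulated payoffs (illustrative data only)
p = np.array([0.95, 1.00, 0.90, 1.05])         # arbitrary "observed" prices

Exx = x.T @ x / T                              # sample second-moment matrix E(xx')
weights = np.linalg.solve(Exx, p)              # E(xx')^{-1} p
x_star = x @ weights                           # x*_t = p' E(xx')^{-1} x_t, period by period

implied_prices = (x_star[:, None] * x).mean(axis=0)   # sample E(x* x)
print("largest in-sample pricing error:", np.max(np.abs(implied_prices - p)))
```

The error is zero up to floating-point rounding whatever prices are supplied, provided the sample second-moment matrix is nonsingular. That is the sense in which in-sample success is guaranteed.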
This observation points to a great danger in the widespread exercise of searching for and
statistically evaluating ad-hoc asset pricing models. Such models are guaranteed empirical
success in a sample if one places little enough structure on what is included in the discount
factor function. The only reason the model doesn't work perfectly is the restrictions the
researcher has imposed on the number or identity of the factors included in m, or the parameters
of the function relating the factors to m. Since these restrictions are the entire content of the
model, they had better be interesting, carefully described and well motivated!

Footnote 4: Precisely, any set of probabilities that agree on impossible (zero-probability) events.

Obviously, this is typically not the case or I wouldn't be making such a fuss about it. Most
empirical asset pricing research posits an ad-hoc pond of factors, fishes around a bit in that
set, and reports statistical measures that show “success,” in that the model is not statistically
rejected in pricing an ad-hoc set of portfolios. The set of discount factors is usually not large
enough to give the zero pricing errors we know are possible, yet the boundaries are not clearly
defined.

What is wrong, you might ask, with finding an ex-post efficient portfolio or $x^*$ that prices
assets by construction? Perhaps the lesson we should learn from the existence theorems is to
forget about economics, the CAPM, marginal utility and all that, and simply price assets with
ex-post mean-variance efficient portfolios that we know set pricing errors to zero!
The mistake is that a portfolio that is ex-post efficient in one sample, and hence prices
all assets in that sample, is unlikely to be mean-variance efficient, ex-ante or ex-post, in the
next sample, and hence is likely to do a poor job of pricing assets in the future. Similarly,
the portfolio $x^* = p'E(xx')^{-1}x$ (using the sample second moment matrix) that is a discount
factor by construction in one sample is unlikely to be a discount factor in the next sample;
the required portfolio weights $p'E(xx')^{-1}$ change, often drastically, from sample to sample.
For example, suppose the CAPM is true, the market portfolio is ex-ante mean-variance
efficient, and sets pricing errors to zero if you use true or subjective probabilities. Nonetheless,
the market portfolio is unlikely to be ex-post mean-variance efficient in any given sample. In
any sample, there will be lucky winners and unlucky losers. An ex-post mean-variance
efficient portfolio will be a Monday-morning quarterback; it will tell you to put large weights
on assets that happened to be lucky in a given sample, but are no more likely than indicated
by their betas to generate high returns in the future. “Oh, if I had only bought Microsoft in
1982...” is not a useful guide to forming a mean-variance efficient portfolio today. (In fact,
mean-reversion in the market and book/market effects in individual stocks suggest that if
anything, assets with unusually good returns in the past are likely to do poorly in the future!)
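
A small simulation illustrates the sample-to-sample instability. This is a sketch under stated assumptions: two samples are drawn from the same hypothetical i.i.d. normal return distribution, the payoffs are treated as gross returns with price 1, and the helper names `discount_factor_weights` and `pricing_errors` are made up for this illustration.

```python
import numpy as np

def discount_factor_weights(x, p):
    """Weights E(xx')^{-1} p computed from the sample second-moment matrix of x."""
    return np.linalg.solve(x.T @ x / len(x), p)

def pricing_errors(x, p, w):
    """Sample E(x* x) - p, where x* = x w."""
    return ((x @ w)[:, None] * x).mean(axis=0) - p

rng = np.random.default_rng(1)
p = np.ones(5)                                       # gross returns have price 1
sample_1 = 1.05 + 0.2 * rng.standard_normal((40, 5)) # first simulated sample
sample_2 = 1.05 + 0.2 * rng.standard_normal((40, 5)) # second sample, same distribution

w1 = discount_factor_weights(sample_1, p)            # weights fitted in sample 1
w2 = discount_factor_weights(sample_2, p)            # weights fitted in sample 2

err_in = pricing_errors(sample_1, p, w1)             # zero by construction
err_out = pricing_errors(sample_2, p, w1)            # sample-1 weights used in sample 2

print("in-sample max error     :", np.max(np.abs(err_in)))
print("out-of-sample max error :", np.max(np.abs(err_out)))
print("change in weights       :", w2 - w1)
```

The weights that price perfectly in the first sample leave nonzero pricing errors in the second, even though nothing about the underlying distribution has changed.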
The only solution is to impose some kind of discipline in order to avoid dredging up
spuriously good in-sample pricing.
The situation is the same as in traditional regression analysis. Regressions are used to
forecast or to explain a variable y by other variables x in a regression $y = x'\beta + \varepsilon$. By
blindly including right hand variables, one can produce models with arbitrarily good
statistical measures of fit. But this kind of model is typically unstable out of sample or otherwise
useless for explanation or forecasting. One has to carefully and thoughtfully limit the search
for right hand variables x to produce good models.
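
The regression analogy is easy to reproduce. In the sketch below (an illustration with simulated data, not an example from the text), the left hand variable is pure noise and the candidate regressors are unrelated noise as well; the function name `r_squared` and the sample size are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 30
y = rng.standard_normal(T)                  # variable to "explain": pure noise
noise = rng.standard_normal((T, T - 1))     # unrelated candidate right hand variables

def r_squared(y, X):
    """In-sample R^2 of an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    demeaned = y - y.mean()
    return 1.0 - (resid @ resid) / (demeaned @ demeaned)

results = []
for k in (1, 10, 20, 29):
    X = np.column_stack([np.ones(T), noise[:, :k]])   # constant plus k noise regressors
    results.append(r_squared(y, X))
    print(f"{k:2d} regressors: in-sample R^2 = {results[-1]:.3f}")
```

With as many right hand variables (including the constant) as observations, the in-sample fit is perfect, although by construction none of the regressors has any explanatory power.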
What makes for an interesting set of restrictions? Econometricians wrestling with
$y = x'\beta + \varepsilon$ have been thinking about this question for about 50 years, and the best answers
are 1) use economic theory to carefully specify the right hand side and 2) use a battery of
cross-sample and out-of-sample stability checks.
Alas, this advice is hard to follow. Economic theory is usually either silent on what
variables to put on the right hand side of a regression, or allows a huge range of variables.
The same is true in finance. “What are the fundamental risk factors?” is still an unanswered
question. At the same time one can appeal to the APT and ICAPM to justify the inclusion of
just about any desirable factor (Fama 1991 calls these theories a “fishing license”). Thus, you
will grow old waiting for theorists to provide useful answers to this kind of question.
Following the purely statistical advice, the battery of cross-sample and out-of-sample
tests often reveals the model is unstable, and needs to be changed. Once it is changed, there
is no more out-of-sample left to check it. Furthermore, even if one researcher is pure enough
to follow the methodology of classical statistics, and wait 50 years for another fresh sample
to be available before contemplating another model, his competitors and journal editors are
unlikely to be so patient. In practice, then, out of sample validation is not a strong guard
against fishing.
Nonetheless, these are the only standards we have to guard against fishing. In my opinion,
the best hope for finding pricing factors that are robust out of sample and across different
markets is to try to understand the fundamental macroeconomic sources of risk. By this I
mean tying asset prices to macroeconomic events, in the way the ill-fated consumption-based
model does via $m_{t+1} = \beta u'(c_{t+1})/u'(c_t)$. The difficulties of the consumption-based model
have made this approach lose favor in recent years. However, the alternative approach is also
running into trouble, in that the number and identity of empirically determined risk factors do
not seem stable. Every time a new anomaly or data set pops up, a new set of ad-hoc factors
gets created to explain it!
In any case, one should always ask of a factor model, “what is the compelling economic
story that restricts the range of factors used?” and/or “what statistical restraints are used
to keep from discovering ex-post mean-variance efficient portfolios, or to ensure that the
results will be robust across samples?” The existence theorems tell us that the answers to these
questions are the only content of the exercise. If the purpose of the model is not just to predict
asset prices but also to explain them, this puts an additional burden on economic motivation
of the risk factors.
There is a natural resistance to such discipline built in to our current statistical
methodology for evaluating models (and papers). When the last author fished around and produced
an ad-hoc factor pricing model that generates 1% average pricing errors, it is awfully hard
to persuade readers, referees, journal editors, and clients that your economically motivated
factor pricing model is better despite 2% average pricing errors. Your model may really be
better and will therefore continue to do well out of sample when the fished model falls by
the wayside of financial fashion, but it is hard to get past statistical measures of in-sample fit.
One hungers for a formal measurement of the number of hurdles imposed on a factor fishing
expedition, like the degrees of freedom correction in $R^2$. Absent a numerical correction, we
have to use judgment to scale back apparent statistical successes by the amount of economic
and statistical fishing that produced them.


Mimicking portfolios
The theorem $x^* = \operatorname{proj}(m \mid \underline{X})$ also has interesting implications for empirical work. The
pricing implications of any model can be equivalently represented by its factor-mimicking
portfolio. If there is any measurement error in a set of economic variables driving m, the
factor-mimicking portfolios for the true m will price assets better than an estimate of m that
uses the measured macroeconomic variables.
Thus, it is probably not a good idea to evaluate economically interesting models with
statistical horse races against models that use portfolio returns as factors. Economically
interesting models, even if true and perfectly measured, will just equal the performance of their
own factor-mimicking portfolios, even in large samples. They will always lose in sample
against ad-hoc factor models that find nearly ex-post efficient portfolios.
This said, there is an important place for models that use returns as factors. After we
have found the underlying true macro factors, practitioners will be well advised to look at
the factor-mimicking portfolio on a day-by-day basis. Good data on the factor-mimicking
portfolios will be available on a minute-by-minute basis. For many purposes, one does not
have to understand the economic content of a model.
But this fact does not tell us to circumvent the process of understanding the true
macroeconomic factors by simply fishing for factor-mimicking portfolios. The experience of
practitioners who use factor models seems to bear out this advice. Large commercial factor models
resulting from extensive statistical analysis (otherwise known as fishing) perform poorly out
of sample, as revealed by the fact that the factors and loadings (β) change all the time.
Also, models specified with economic fundamentals will always seem to do poorly in
a given sample against ad-hoc variables (especially if one fishes an ex-post mean-variance
efficient portfolio out of the latter!). But what other source of discipline do we have?

Irrationality and Joint Hypothesis
Finance contains a long history of fighting about “rationality” vs. “irrationality” and
“efficiency” vs. “inefficiency” of asset markets. The results of many empirical asset pricing
papers are sold as evidence that markets are “inefficient” or that investors are “irrational.” For
example, the crash of October 1987, and various puzzles such as the small-firm, book/market,
seasonal effects or long-term predictability have all been sold this way.
However, none of these puzzles documents an arbitrage opportunity (see footnote 5). Therefore,
we know that there is a “rational model” (a stochastic discount factor, an efficient portfolio to
use in a single-beta representation) that rationalizes them all. And we can confidently predict
this situation to continue; real arbitrage opportunities do not last long! Fama (1970) contains
a famous statement of the same point. Fama emphasized that any test of “efficiency” is a joint
test of efficiency and a “model of market equilibrium.” Translated, that means an asset pricing
model, or a model of m.

Footnote 5: The closed-end fund puzzle comes closest, since it documents an apparent violation
of the law of one price. However, you can't costlessly short closed-end funds, and we have
ignored short sales constraints so far.

But surely markets can be “irrational” or “inefficient” without requiring arbitrage
opportunities? Yes, they can, if (and only if) the discount factors that generate asset prices are
disconnected from marginal rates of substitution or transformation in the real economy. But
now we are right back to specifying and testing economic models of the discount factor! At
best, an asset pricing puzzle might be so severe that we can show that the required discount
factors are completely “unreasonable” (by some standard) measures of real marginal rates of
substitution and/or transformation, but we still have to say something about what a reasonable
marginal rate looks like.
In sum, the existence theorems mean that there are no quick proofs of “rationality” or
“irrationality.” The only game in town for the purpose of explaining asset prices is thinking
about economic models of the discount factor.

The number of factors.
Many asset pricing tests focus on the number of factors required to price a cross-section
of assets. The equivalence theorems imply that this is a silly question. A linear factor model
$m = b'f$ or its equivalent expected return-beta model $E(R^i) = \alpha + \beta'\lambda_f$ is not a unique
representation. In particular, given any multiple-factor or multiple-beta representation we
can easily find a single-beta representation. The single factor $m = b'f$ will price assets
just as well as the original factors $f$, as will $x^* = \operatorname{proj}(b'f \mid \underline{X})$ or the corresponding
$R^*$. All three options give rise to single-beta models with exactly the same pricing ability as
the multiple factor model. We can also easily find equivalent representations with different
numbers (greater than one) of factors. For example, write
$$
m = a + b_1 f_1 + b_2 f_2 + b_3 f_3 = a + b_1 f_1 + b_2\left(f_2 + \frac{b_3}{b_2} f_3\right) = a + b_1 f_1 + b_2 \hat{f}_2
$$
to reduce a “three factor” model to a “two factor” model. In the ICAPM language,
consumption itself could serve as a single state variable, in place of the S state variables
presumed to drive it.
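
Both reductions can be verified with a few lines of linear algebra. The sketch below works with population moments and makes several assumptions that are not in the text: the discount factor is normalized as $m = 1 - b'(f - E f)$ so that $E(m) = 1$, the assets are excess returns, and the factor covariance matrix, loadings and betas are arbitrary illustrative numbers.

```python
import numpy as np

rng = np.random.default_rng(3)
K, N = 3, 6                               # 3 factors, 6 excess returns (illustrative)
A = rng.standard_normal((K, K))
sigma_f = A @ A.T + np.eye(K)             # factor covariance matrix, positive definite
b = np.array([2.0, -1.0, 0.5])            # discount factor loadings (made-up values)
B = rng.standard_normal((N, K))           # multiple-regression betas of returns on f

# Multi-factor model: with m = 1 - b'(f - E f), 0 = E(m Re) gives
# E(Re) = cov(Re, f') b = B sigma_f b = B lambda.
lam = sigma_f @ b                         # factor risk premia
expected_multi = B @ lam

# Single-factor model built from g = b'f.
var_g = b @ sigma_f @ b                   # var(g)
beta_g = (B @ sigma_f @ b) / var_g        # cov(Re, g) / var(g), one beta per asset
expected_single = beta_g * var_g          # single-beta model with lambda_g = var(g)

print("max difference in expected returns:",
      np.max(np.abs(expected_multi - expected_single)))

# Three factors rewritten as two: f2_hat = f2 + (b3 / b2) f3.
a = 1.0
f = rng.standard_normal((1000, K))        # simulated factor realizations
m_three = a + f @ b
f2_hat = f[:, 1] + (b[2] / b[1]) * f[:, 2]
m_two = a + b[0] * f[:, 0] + b[1] * f2_hat
print("max difference in discount factors:", np.max(np.abs(m_three - m_two)))
```

The single-beta model and the three-beta model deliver identical expected returns, and the two-factor and three-factor discount factors coincide realization by realization, so counting factors does not distinguish the models.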
There are times when one is interested in a multiple factor representation. Sometimes the
factors have an economic interpretation that is lost on taking a linear combination. But the
pure number of pricing factors is not a meaningful question.

Discount factors vs. mean, variance and beta.
The point of the previous chapter was to show how the discount factor, mean-variance,
and expected return-beta models are all equivalent representations of asset pricing. It seems
a good moment to contrast them as well; to understand why the mean-variance and beta
language developed first, and to think about why the discount factor language seems to be
taking over.
Asset pricing started by putting mean and variance of returns on the axes, rather than
payoff in state 1, payoff in state 2, etc. as we do now. The early asset pricing theorists posed
the question just right: they wanted to treat assets in the apples-and-oranges, indifference
curve and budget set framework of microeconomics. The problem was, what labels to put
on the axes? Clearly, “IBM stock” and “GM stock” is not a good idea; investors do not
value securities per se, but value some aspects of the stream of random cash flows that those
securities give rise to.
Their brilliant insight was to put the mean and variance of the portfolio return on the axes;
to treat these as “hedonics” by which investors valued their portfolios. Investors plausibly
want more mean and less variance. They gave investors “utility functions” defined over this
mean and variance, just as standard utility functions are defined over apples and oranges. The
mean-variance frontier is the “budget set.”
With this focus on portfolio mean and variance, the next step was to realize that each
security's mean return measures its contribution to the portfolio mean, and that regression
betas on the overall portfolio give each security's contribution to the portfolio variance. The
mean-return vs. beta description for each security followed naturally.
In a deep sense, the transition from mean-variance frontiers and beta models to discount
factors represents the realization that putting consumption in state 1 and consumption in
state 2 on the axes (specifying preferences and budget constraints over state-contingent
consumption) is a much more natural mapping of standard microeconomics into finance
than putting mean, variance, etc. on the axes. If for no other reason, the contingent claim
budget constraints are linear, while the mean-variance frontier is not. Thus, I think, the focus
on means and variance, the mean-variance frontier and expected return/beta models is all
due to an accident of history, that the early asset pricing theorists happened to put mean and
variance on the axes rather than state-contingent consumption.
Well, here we are, why prefer one language over another? The discount factor language
has an advantage for its simplicity, generality, mathematical convenience, and elegance.
These virtues are to some extent in the eye of the beholder, but to this beholder, it is
inspiring to be able to start every asset pricing calculation with one equation, p = E(mx).
This equation covers all assets, including bonds, options, and real investment opportunities,
while the expected return/beta formulation is not useful or very cumbersome in the latter
applications. Thus, it has seemed that there are several different asset pricing theories: expected
return/beta for stocks, yield-curve models for bonds, arbitrage models for options. In fact all
three are just cases of p = E(mx). As a particular example, arbitrage, in the precise sense
of positive payoffs with negative prices, has not entered the equivalence discussion at all. I
don't know of any way to cleanly graft absence of arbitrage on to expected return/beta
models. You have to tack it on after the fact: “by the way, make sure that every portfolio with
positive payoffs has a positive price.” It is trivially easy to graft it on to a discount factor
model: just add m > 0.
The discount factor and state space language also makes it easier to think about different
horizons and the present value statement of models. $p = E(mx)$ generalizes quickly to
$p_t = E_t \sum_j m_{t,t+j} x_{t+j}$, while returns have to be chained together to think about multiperiod
models. Papers are still written arguing about geometric vs. arithmetic average returns for
multiperiod discounting.

The choice of language is not about normality or return distributions. There is a lot of
confusion about where return distribution assumptions show up in finance. I have made no
distributional assumptions in any of the discussion so far. Second moments as in betas and
the variance of the mean-variance frontier show up because p = E(mx) involves a second
moment. One does not need to assume normality to talk about the mean-variance frontier.
Returns on the mean-variance frontier price other assets even when returns are not normally
distributed.

Chapter 8. Conditioning information
The asset pricing theory I have sketched so far really describes prices at time t in terms of
conditional moments. The investor's first order conditions are

$$
p_t u'(c_t) = \beta E_t\left[u'(c_{t+1}) x_{t+1}\right]
$$

where $E_t$ means expectation conditional on the investor's time t information. Sensibly, the
price at time t should be higher if there is information at time t that the discounted payoff is
likely to be higher than usual at time t + 1. The basic asset pricing equation should be

$$
p_t = E_t(m_{t+1} x_{t+1}).
$$

(Conditional expectation can also be written

$$
p_t = E\left[m_{t+1} x_{t+1} \mid I_t\right]
$$

when it is important to specify the information set $I_t$.)
If payoffs and discount factors were independent and identically distributed (i.i.d.) over
time, then conditional expectations would be the same as unconditional expectations and
we would not have to worry about the distinction between the two concepts. But stock
price/dividend ratios, bond and option prices all change over time, which must reflect
changing conditional moments of something on the right hand side.
One approach is to specify and estimate explicit statistical models of conditional
distributions of asset payoffs and discount factor variables (e.g. consumption growth). This approach
is sometimes used, and is useful in some applications, but it is usually cumbersome. As we
make the conditional mean, variance, covariance, and other parameters of the distribution of
(say) N returns depend flexibly on M information variables, the number of required
parameters can quickly exceed the number of observations.
More importantly, this explicit approach typically requires us to assume that investors use
the same model of conditioning information that we do. We obviously don't even observe all
the conditioning information used by economic agents, and we can't include even a fraction
of observed conditioning information in our models. The basic feature and beauty of asset
prices (like all prices) is that they summarize an enormous amount of information that only
individuals see. The events that make the price of IBM stock change by a dollar, like the
events that make the price of tomatoes change by 10 cents, are inherently unobservable to
economists or would-be social planners (Hayek 1945). Whenever possible, our treatment of
conditioning information should allow agents to see more than we do.
If we don't want to model conditional distributions explicitly, and if we want to avoid
assuming that investors only see the variables that we include in an empirical investigation, we
eventually have to think about unconditional moments, or at least moments conditioned on
less information than agents see. Unconditional implications are also interesting in and of
themselves. For example, we may be interested in finding out why the unconditional mean
returns on some stock portfolios are higher than others, even if every agent fundamentally
seeks high conditional mean returns. Most statistical estimation essentially amounts to
characterizing unconditional means, as we will see in the chapter on GMM. Thus, rather than
model conditional distributions, this chapter focuses on what implications for unconditional
moments we can derive from the conditional theory.

8.1 Scaled payoffs

$$
p_t = E_t(m_{t+1} x_{t+1}) \ \Rightarrow\ E(p_t z_t) = E(m_{t+1} x_{t+1} z_t)
$$

One can incorporate conditioning information by adding scaled payoffs and doing everything
unconditionally. I interpret scaled returns as payoffs to managed portfolios.

8.1.1 Conditioning down

The unconditional implications of any pricing model are pretty easy to state. From

$$
p_t = E_t(m_{t+1} x_{t+1})
$$

we can take unconditional expectations to obtain (see footnote 6)

$$
E(p_t) = E(m_{t+1} x_{t+1}).
$$

Thus, if we just interpret p to stand for $E(p_t)$, everything we have done above applies
to unconditional moments. In the same way, we can also condition down from agents' fine
information sets to coarser sets that we observe,

$$
\begin{aligned}
p_t = E(m_{t+1} x_{t+1} \mid \Omega) &\ \Rightarrow\ E(p_t \mid I \subset \Omega) = E(m_{t+1} x_{t+1} \mid I \subset \Omega) \\
&\ \Rightarrow\ p_t = E(m_{t+1} x_{t+1} \mid I_t \subset \Omega_t) \ \text{ if } p_t \in I_t.
\end{aligned}
$$

Footnote 6: We need a small technical assumption that the unconditional moment or moment
conditioned on a coarser information set exists. For example, if X and Y are independent
normal (0, 1), then $E(X/Y \mid Y) = 0$ but $E(X/Y)$ does not exist ($E|X/Y|$ is infinite).

In making the above statements I used the law of iterated expectations, which is important
enough to highlight. This law states that if you take an expected value using less
information of an expected value that is formed on more information, you get back the expected value
using less information. Your best forecast today of your best forecast tomorrow is the same
as your best forecast today. In various useful guises,

$$
E(E_t(x)) = E(x),
$$

$$
E_{t-1}(E_t(x_{t+1})) = E_{t-1}(x_{t+1}),
$$

$$
E\left[E(x \mid \Omega) \mid I \subset \Omega\right] = E[x \mid I].
$$
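
The law of iterated expectations can be checked exactly on a small discrete example. The sketch below is illustrative only: the six states, their probabilities, the payoff values, and the helper `cond_exp` are all assumptions. Information sets are represented as partitions of the states, and each coarse cell is a union of fine cells, which is what $I \subset \Omega$ means here.

```python
import numpy as np

prob = np.array([0.10, 0.20, 0.15, 0.25, 0.05, 0.25])  # state probabilities (sum to 1)
x = np.array([1.0, 3.0, -2.0, 4.0, 0.5, 2.5])          # random variable x, state by state
fine = np.array([0, 0, 1, 1, 2, 3])      # fine information: which cell each state is in
coarse = np.array([0, 0, 0, 0, 1, 1])    # coarse information: unions of fine cells

def cond_exp(values, labels, prob):
    """E(values | cell of the partition given by labels), returned state by state."""
    out = np.empty_like(values)
    for cell in np.unique(labels):
        mask = labels == cell
        out[mask] = (prob[mask] * values[mask]).sum() / prob[mask].sum()
    return out

inner = cond_exp(x, fine, prob)          # E(x | fine information)
lhs = cond_exp(inner, coarse, prob)      # E[ E(x | fine) | coarse ]
rhs = cond_exp(x, coarse, prob)          # E(x | coarse)

print("E[E(x|fine)|coarse]:", lhs)
print("E(x|coarse)        :", rhs)
print("E[E(x|fine)] and E(x):", (prob * inner).sum(), (prob * x).sum())
```

Conditioning down in two steps gives the same answer as conditioning down in one step, both for the coarse partition and for the unconditional mean.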

8.1.2 Instruments and managed portfolios

We can do more than just condition down. Suppose we multiply the payoff and price by an
instrument $z_t$ observed at time t. Then,

$$
z_t p_t = E_t(m_{t+1} x_{t+1} z_t)
$$

and, taking unconditional expectations,

$$
E(p_t z_t) = E(m_{t+1} x_{t+1} z_t). \tag{8.109}
$$

This is an additional implication of the conditional model, not captured by just
conditioning down as in (8.108). This trick originates from the GMM method of estimating asset
pricing models, discussed below. The word instruments for the z variables comes from the
instrumental variables estimation heritage of GMM.
To think about equation (8.109), group $(x_{t+1} z_t)$. Call this product a payoff $x = x_{t+1} z_t$,
with price $p = E(p_t z_t)$. Then (8.109) reads

$$
p = E(mx)
$$

once again. Rather than thinking about (8.109) as an instrumental variables estimate of a
conditional model, we can think of it as a price and a payoff, and apply all the asset pricing
theory directly.
This interpretation is not as artificial as it sounds. $z_t x_{t+1}$ are the payoffs to managed
portfolios. An investor who observes $z_t$ can, rather than “buy and hold,” invest in an asset
according to the value of $z_t$. For example, if a high value of $z_t$ forecasts that asset returns are
likely to be high the next period, the investor might buy more of the asset when $z_t$ is high and
vice-versa. If the investor follows a linear rule, he puts $z_t p_t$ dollars into the asset each period
and receives $z_t x_{t+1}$ dollars the next period.
This all sounds new and different, but practically every test uses managed portfolios.
For example, the size, beta, industry, book/market and so forth portfolios of stocks are all
managed portfolios, since their composition changes every year in response to conditioning
information: the size, beta, etc. of the individual stocks. This idea is also closely related
to the deep idea of dynamic spanning. Markets that are apparently very incomplete can in
reality provide many more state-contingencies through dynamic (conditioned on information)
trading strategies.
Equation (8.109) offers a very simple view of how to incorporate the extra information
in conditioning information: Add managed portfolio payoffs, and proceed with unconditional
moments as if conditioning information didn't exist!
Linearity is not important. If the investor wanted to place, say, $2 + 3z^2$ dollars in the
asset, we could capture this desire with an instrument $z_2 = 2 + 3z^2$. Nonlinear (measurable)
transformations of time-t random variables are again random variables.
We can thus incorporate conditioning information while still looking at unconditional
moments instead of conditional moments, without any of the statistical machinery of explicit
models with time-varying moments. The only subtleties are 1) The set of asset payoffs
expands dramatically, since we can consider all managed portfolios as well as basic assets,
potentially multiplying every asset return by every information variable. 2) Expected prices
of managed portfolios show up for p instead of just p = 0 and p = 1 if we started with basic
asset returns and excess returns.

8.2 Sufficiency of adding scaled returns

Checking the expected price of all managed portfolios is, in principle, sufficient to check
all the implications of conditioning information.

$$E(z_t) = E(m_{t+1} R_{t+1} z_t) \;\; \forall z_t \in I_t \;\Rightarrow\; 1 = E(m_{t+1} R_{t+1} \mid I_t)$$

$$E(p_t) = E(m_{t+1} x_{t+1}) \;\; \forall x_{t+1} \in X_{t+1} \;\Rightarrow\; p_t = E(m_{t+1} x_{t+1} \mid I_t)$$

We have shown that we can derive some extra implications from the presence of con-
ditioning information by adding scaled returns. But does this exhaust the implications of
conditioning information? Are we missing something important by relying on this trick?
The answer is, in principle, no.
I rely on the following mathematical fact: The conditional expectation of a variable $y_{t+1}$
given an information set $I_t$, $E(y_{t+1} \mid I_t)$, is equal to a regression forecast of $y_{t+1}$ using every
variable $z_t \in I_t$. Now, “every random variable” means every variable and every nonlinear
(measurable) transformation of every variable, so there are a lot of variables in this regression!
(The word projection and the notation $\mathrm{proj}(y_{t+1} \mid z_t)$ are used to distinguish the best forecast of $y_{t+1}$ using
only linear combinations of $z_t$ from the conditional expectation.) Applying this fact to our
case, let $y_{t+1} = m_{t+1} R_{t+1} - 1$. Then $E[(m_{t+1} R_{t+1} - 1) z_t] = 0$ for every $z_t \in I_t$ implies
$1 = E(m_{t+1} R_{t+1} \mid I_t)$. Thus, no implications are lost in principle by looking at scaled
returns.
Another way of looking at the same idea is that $R_{t+1} z_t$ is the return on a payoff available
at time $t+1$. Thus, the space of all payoffs $X_{t+1}$ should be understood to include the
time-$(t+1)$ payoff you can generate with a basis set of assets $R_{t+1}$ and all dynamic strategies
that use information in the set $I_t$. With that definition of the space $X_{t+1}$ we can write the
sufficiency of scaled returns with the more general second equality above.
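A small numerical sketch may make the point concrete. It assumes a hypothetical two-state economy (all numbers, and the names `rw`, `rf`, `z`, are made up for illustration): a fixed-weight discount factor $m = a - bR^W$ is chosen to price the unscaled returns unconditionally, and we then ask whether it also prices the scaled return $R^W_{t+1} z_t$, where $z_t$ is an indicator for the second information state.

```python
import numpy as np

# Hypothetical economy: two information states at t, two outcomes at t+1.
state_prob = np.array([0.5, 0.5])       # P(information state)
outcome_prob = np.array([0.5, 0.5])     # P(outcome | state)
rw = np.array([[1.25, 0.95],            # market return, state 0
               [1.50, 0.80]])           # market return, state 1
rf = np.array([1.02, 1.05])             # risk-free rate, known at time t
z = np.array([0.0, 1.0])                # instrument: indicator of state 1

joint = state_prob[:, None] * outcome_prob[None, :]
rf_grid = np.repeat(rf[:, None], 2, axis=1)
z_grid = np.repeat(z[:, None], 2, axis=1)


def expect(x):
    """Unconditional expectation over (state, outcome) pairs."""
    return float(np.sum(joint * x))


# Pick constants a, b so that m = a - b*RW prices RW and Rf unconditionally:
#   E(RW) a - E(RW^2) b  = 1
#   E(Rf) a - E(Rf RW) b = 1
moments = np.array([[expect(rw), -expect(rw * rw)],
                    [expect(rf_grid), -expect(rf_grid * rw)]])
a, b = np.linalg.solve(moments, np.ones(2))
m = a - b * rw

err_rw = expect(m * rw) - 1.0                          # unscaled market return
err_rf = expect(m * rf_grid) - 1.0                     # unscaled risk-free return
err_scaled = expect(m * rw * z_grid) - expect(z_grid)  # E(m R z) - E(z)

print(err_rw, err_rf, err_scaled)
```

In this example the two unscaled pricing errors are zero by construction, while the scaled-return error is not: the managed portfolio carries information about conditional pricing that the unscaled moments miss.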
“All linear and nonlinear transformations of all variables observed at time t” sounds like a
lot of instruments, and it is. But there is a practical limit to the number of instruments zt one
needs to scale by, since only variables that forecast returns or m (or their higher moments
and co-moments) add any information.
Since adding instruments is the same thing as including potential managed portfolios,
thoughtfully choosing a few instruments is the same thing as the thoughtful choice of a few
assets or portfolios that one makes in any test of an asset pricing model. Even when evaluating
completely unconditional asset pricing models, one always forms portfolios and omits many
possible assets from analysis. Few studies, in fact, go beyond checking whether a model
correctly prices 10-25 stock portfolios and a few bond portfolios. Implicitly, one feels that
the chosen payoffs do a pretty good job of spanning the set of available risk-loadings (mean
returns) and hence that adding additional assets will not affect the results. Nonetheless, since
data are easily available on all 2000 or so NYSE stocks, plus AMEX and NASDAQ stocks, to
say nothing of government and corporate bonds, returns of mutual funds, foreign exchange,
foreign equities, real investment opportunities, etc., the use of a few portfolios means that a
tremendous number of potential asset payoffs are left out in an ad-hoc manner.
In a similar manner, if one had a small set of instruments that capture all the predictability
of discounted returns mt+1 Rt+1 , then there would be no need to add more instruments.
Thus, we carefully but arbitrarily select a few instruments that we think do a good job of
characterizing the conditional distribution of returns. Exclusion of potential instruments is
exactly the same thing as exclusion of assets. It is no better founded, but the fact that it is a
common sin may lead one to worry less about it.
There is nothing special about unscaled returns, and no economic reason to place them
above scaled returns. A mutual fund might come into being that follows the managed portfolio
strategy and then its unscaled returns would be the same as an original scaled return.
Models that cannot price scaled returns are no more interesting than models that can only
price (say) stocks with first letter A through L. (There may be econometric reasons to trust
results for nonscaled returns a bit more, but we haven't gotten to statistical issues yet.)
Of course, the other way to incorporate conditioning information is by constructing explicit
parametric models of conditional distributions. With this procedure one can in fact
check all of a model's implications about conditional moments. However, the parametric
model may be incorrect, or may not reflect some variable used by investors. Including
instruments may not be as efficient, but it is still consistent if the parametric model is incorrect.
The wrong parametric model of conditional distributions may lead to inconsistent estimates.
In addition, one avoids estimating nuisance parameters of the parametric distribution model.


8.3 Conditional and unconditional models

A conditional factor model does not imply a fixed-weight or unconditional factor model:
$m_{t+1} = b_t' f_{t+1}$, $p_t = E_t(m_{t+1} x_{t+1})$ does not imply that $\exists\, b$ s.t. $m_{t+1} = b' f_{t+1}$, $E(p_t) = E(m_{t+1} x_{t+1})$.
$E_t(R_{t+1}) = \beta_t' \lambda_t$ does not imply $E(R_{t+1}) = \beta' \lambda$.
Conditional mean-variance efficiency does not imply unconditional mean-variance efficiency.
The converse statements are true, if managed portfolios are included.

For explicit discount factor models (models whose parameters are constant over time),
the fact that one looks at conditional vs. unconditional implications makes no difference to
the statement of the model.

$$p_t = E_t(m_{t+1} x_{t+1}) \;\Rightarrow\; E(p_t) = E(m_{t+1} x_{t+1})$$

and that's it. Examples include the consumption-based model with power utility, $m_{t+1} =
\beta (c_{t+1}/c_t)^{-\gamma}$, and the log utility CAPM, $m_{t+1} = 1/R^W_{t+1}$.

However, linear factor models include parameters that may vary over time and as func-
tions of conditioning information. In these cases the transition from conditional to uncondi-
tional moments is much more subtle. We cannot easily condition down the model at the same
time as the prices and payoffs.

8.3.1 Conditional vs. unconditional factor models in discount factor language

As an example, consider the CAPM

$$m = a - b R^W$$

where $R^W$ is the return on the market or wealth portfolio. We can find $a$ and $b$ from the
condition that this model correctly price any two returns, for example $R^W$ itself and a
risk-free rate:

$$\left\{ \begin{array}{l} 1 = E_t(m_{t+1} R^W_{t+1}) \\ 1 = E_t(m_{t+1}) R^f_t \end{array} \right. \;\Rightarrow\; \left\{ \begin{array}{l} a = \dfrac{1}{R^f_t} + b E_t(R^W_{t+1}) \\ b = \dfrac{E_t(R^W_{t+1}) - R^f_t}{R^f_t \, \sigma^2_t(R^W_{t+1})} \end{array} \right. .$$

As you can see, $b > 0$ and $a > 0$: to make a payoff proportional to the minimum second-moment
return (on the inefficient part of the mean-variance frontier) we need a portfolio long
the risk free rate and short the market $R^W$.
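As a quick check of these two formulas, here is a minimal sketch that computes $a$ and $b$ for a hypothetical two-outcome market return and verifies that the resulting $m = a - bR^W$ prices both the market and the risk-free rate. The function name and the numbers are illustrative assumptions, not part of the text.

```python
import numpy as np


def capm_discount_factor_coeffs(mean_rw, var_rw, rf):
    """Return (a, b) in m = a - b*RW from 1 = E(m RW) and 1 = E(m) Rf."""
    b = (mean_rw - rf) / (rf * var_rw)
    a = 1.0 / rf + b * mean_rw
    return a, b


# Hypothetical one-period distribution of the market return.
probs = np.array([0.5, 0.5])
rw = np.array([1.25, 0.95])
rf = 1.02

mean_rw = probs @ rw
var_rw = probs @ (rw - mean_rw) ** 2

a, b = capm_discount_factor_coeffs(mean_rw, var_rw, rf)
m = a - b * rw

price_rf = (probs @ m) * rf     # should equal 1
price_rw = probs @ (m * rw)     # should equal 1
print(a, b, price_rf, price_rw)
```

With an expected market return above the risk-free rate, both coefficients come out positive, as the text says.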


More importantly for our current purposes, $a$ and $b$ vary over time, as $E_t(R^W_{t+1})$, $\sigma^2_t(R^W_{t+1})$,
and $R^f_t$ vary over time. If it is to price assets conditionally, the CAPM must be a linear factor
model with time-varying weights, of the form

$$m_{t+1} = a_t - b_t R^W_{t+1}.$$

This fact means that we can no longer transparently condition down. The statement that

$$1 = E_t\left[(a_t + b_t R^W_{t+1}) R_{t+1}\right]$$

does not imply that we can find constants $a$ and $b$ so that

$$1 = E\left[(a + b R^W_{t+1}) R_{t+1}\right].$$

Just try it. Taking unconditional expectations,

$$1 = E\left[(a_t + b_t R^W_{t+1}) R_{t+1}\right] = E\left[a_t R_{t+1} + b_t R^W_{t+1} R_{t+1}\right]$$

$$= E(a_t)E(R_{t+1}) + E(b_t)E(R^W_{t+1} R_{t+1}) + \mathrm{cov}(a_t, R_{t+1}) + \mathrm{cov}(b_t, R^W_{t+1} R_{t+1})$$

Thus, the unconditional model

$$1 = E\left[\left(E(a_t) + E(b_t) R^W_{t+1}\right) R_{t+1}\right]$$

only holds if the covariance terms above happen to be zero. Since $a_t$ and $b_t$ are formed from
conditional moments of returns, the covariances will not, in general, be zero.
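The failure is easy to reproduce numerically. The sketch below assumes a hypothetical economy with two equally likely information states and uses the sign convention $m_{t+1} = a_t - b_t R^W_{t+1}$ from the CAPM example; all numbers are invented for illustration. The conditional model prices the market return in each state, but replacing $a_t$, $b_t$ by their unconditional means does not price it unconditionally.

```python
import numpy as np

# Hypothetical economy: two information states at t, two outcomes at t+1.
state_prob = np.array([0.5, 0.5])
outcome_prob = np.array([0.5, 0.5])
rw = np.array([[1.25, 0.95],     # market return outcomes in state 0
               [1.50, 0.80]])    # market return outcomes in state 1
rf = np.array([1.02, 1.05])      # conditional risk-free rate in each state

# Conditional moments and the implied time-varying CAPM coefficients.
cond_mean = rw @ outcome_prob
cond_var = (rw - cond_mean[:, None]) ** 2 @ outcome_prob
b_t = (cond_mean - rf) / (rf * cond_var)
a_t = 1.0 / rf + b_t * cond_mean

# Conditional model: m = a_t - b_t * RW prices RW in every state.
m_cond = a_t[:, None] - b_t[:, None] * rw
cond_price_rw = (m_cond * rw) @ outcome_prob

# Unconditional expectations over (state, outcome) pairs.
joint = state_prob[:, None] * outcome_prob[None, :]
uncond_price_timevarying = float(np.sum(joint * m_cond * rw))

# Fixed-weight candidate built from E(a_t) and E(b_t).
a_bar = state_prob @ a_t
b_bar = state_prob @ b_t
m_fixed = a_bar - b_bar * rw
uncond_price_fixed = float(np.sum(joint * m_fixed * rw))

print(cond_price_rw, uncond_price_timevarying, uncond_price_fixed)
```

The time-varying model conditions down without trouble (law of iterated expectations), while the fixed-weight candidate misprices the market return because the covariance terms in the expansion above are not zero in this example.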
On the other hand, suppose it is true that $a_t$ and $b_t$ are constant over time. Then

$$1 = E_t\left[(a + b R^W_{t+1}) R_{t+1}\right]$$

does imply

$$1 = E\left[(a + b R^W_{t+1}) R_{t+1}\right],$$

just like any other constant-parameter factor pricing model. Furthermore, the latter unconditional
model implies the former conditional model, if the latter holds for all managed portfolios.

8.3.2 Conditional vs. unconditional in an expected return / beta model

To put the same observation in beta-pricing language,

$$E_t(R^i) = R^f_t + \beta_t \lambda_t \qquad (8.111)$$

does not imply that

$$E(R^i) = \alpha + \beta \lambda \qquad (8.112)$$

The reason is that $\beta_t$ and $\beta$ represent conditional and unconditional regression coefficients,
respectively.
Again, if returns and factors are i.i.d., the unconditional model can go through. In that
case, $\mathrm{cov}(\cdot) = \mathrm{cov}_t(\cdot)$, $\mathrm{var}(\cdot) = \mathrm{var}_t(\cdot)$, so the unconditional regression beta is the same
as the conditional regression beta, $\beta = \beta_t$. Then, we can take expectations of (8.111) to get
(8.112), with $\lambda = E(\lambda_t)$. But to condition down in this way, the covariance and variance must
each be constant over time. It is not enough that their ratio, or conditional betas, are constant.
If $\mathrm{cov}_t$ and $\mathrm{var}_t$ change over time, then the unconditional regression beta, $\beta = \mathrm{cov}/\mathrm{var}$, is
not equal to the average conditional regression beta, $E(\beta_t)$ or $E(\mathrm{cov}_t/\mathrm{var}_t)$. Some models
specify that $\mathrm{cov}_t$ and $\mathrm{var}_t$ vary over time, but $\mathrm{cov}_t/\mathrm{var}_t$ is a constant. This specification still
does not imply that the unconditional regression beta $\beta \equiv \mathrm{cov}/\mathrm{var}$ is equal to the constant
$\mathrm{cov}_t/\mathrm{var}_t$. Similarly, it is not enough that $\lambda$ be constant, since $E(\beta_t) \neq \beta$. The betas must
be regression coefficients, not just numbers.
If the betas do not vary over time, the $\lambda_t$ may still vary and $\lambda = E(\lambda_t)$.
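To illustrate the point about a constant ratio $\mathrm{cov}_t/\mathrm{var}_t$, here is a sketch that uses the laws of total covariance and total variance in a hypothetical two-regime setting. The regime probabilities, conditional means, and conditional variances are assumptions chosen only for illustration; the conditional beta is held at one in both regimes.

```python
import numpy as np

p = np.array([0.5, 0.5])          # regime probabilities (hypothetical)
mu_f = np.array([0.02, 0.10])     # conditional factor means E_t(f)
var_f = np.array([0.01, 0.04])    # conditional factor variances var_t(f)
mean_r = np.array([0.03, 0.05])   # conditional expected returns E_t(R)
beta_c = 1.0                      # constant conditional beta cov_t / var_t

cov_t = beta_c * var_f            # conditional covariances implied by beta_c

# Law of total covariance: cov = E[cov_t] + cov(E_t R, E_t f).
cov_uncond = p @ cov_t + p @ ((mean_r - p @ mean_r) * (mu_f - p @ mu_f))
# Law of total variance: var = E[var_t] + var(E_t f).
var_uncond = p @ var_f + p @ (mu_f - p @ mu_f) ** 2

beta_uncond = cov_uncond / var_uncond
print(beta_c, beta_uncond)
```

Even though the conditional beta equals one in every regime, the unconditional regression beta differs from one, because movements in conditional means enter the unconditional covariance and variance.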

8.3.3 A precise statement

Let's formalize these observations somewhat. Let $X$ denote the space of all portfolios of the
primitive assets, including managed portfolios in which the weights may depend on conditioning
information, i.e. scaled returns.
A conditional factor pricing model is a model $m_{t+1} = a_t + b_t' f_{t+1}$ that satisfies $p_t =
E_t(m_{t+1} x_{t+1})$ for all $x_{t+1} \in X$.
An unconditional factor pricing model is a model $m_{t+1} = a + b' f_{t+1}$ that satisfies $E(p_t) =
E(m_{t+1} x_{t+1})$ for all $x_{t+1} \in X$. It might be more appropriately called a fixed-weight factor
pricing model.
Given these definitions it's almost trivial that the unconditional model is just a special
case of the conditional model, one that happens to have fixed weights. Thus, a conditional
factor model does not imply an unconditional factor model (because the weights may vary)
but an unconditional factor model does imply a conditional factor model.
There is one important subtlety. The payoff space $X$ is common, and contains all managed
portfolios in both cases. The payoff space for the unconditional factor pricing model is not
just fixed combinations of a set of basis assets. For example, we might simply check that
the static (constant $a$, $b$) CAPM captures the unconditional mean returns of a set of assets. If
this model does not also price those assets scaled by instruments, then it is not a conditional
model, or, as I argued above, really a valid factor pricing model at all.
Of course, everything applies for the relation between a conditional factor pricing model
using a fine information set (like investors' information sets) and conditional factor pricing
models using coarser information sets (like ours). If you think a set of factors prices assets
with respect to investors' information, that does not mean the same set of factors prices assets
with respect to our, coarser, information sets.

8.3.4 Mean-variance frontiers

Define the conditional mean-variance frontier as the set of returns that minimize $\mathrm{var}_t(R_{t+1})$
given $E_t(R_{t+1})$. (This definition includes the lower segment as usual.) Define the unconditional
mean-variance frontier as the set of returns including managed portfolio returns that
minimize $\mathrm{var}(R_{t+1})$ given $E(R_{t+1})$. These two frontiers are related by:
If a return is on the unconditional mean-variance frontier, it is on the conditional
mean-variance frontier.

If a return is on the conditional mean-variance frontier, it need not be on the unconditional
mean-variance frontier.

These statements are exactly the opposite of what you first expect from the language. The
law of iterated expectations E(Et (x)) = E(x) leads you to expect that “conditional” should
imply “unconditional.” But we are studying the conditional vs. unconditional mean-variance
frontier, not raw conditional and unconditional expectations, and it turns out that exactly the
opposite words apply. Of course “unconditional” can also mean “conditional on a coarser
information set.”
Again, keep in mind that the unconditional mean variance frontier includes returns on
managed portfolios. This definition is eminently reasonable. If you're trying to minimize
variance for given mean, why tie your hands to fixed weight portfolios? Equivalently, why
not allow yourself to include in your portfolio the returns of mutual funds whose advisers
promise the ability to adjust portfolios based on conditioning information?
You could form a mean-variance frontier of fixed-weight portfolios of a basis set of assets,
and this is what many people often mean by “unconditional mean-variance frontier.” The return
on the true unconditional mean-variance frontier will, in general, include some managed
portfolio returns, and so will lie outside this mean-variance frontier of fixed-weight portfolios.
Conversely, a return on the fixed-weight portfolio MVF is, in general, not on the unconditional
or conditional mean-variance frontier. All we know is that the fixed-weight frontier lies
inside the other two. It may touch, but it need not. This is not to say the fixed-weight unconditional
frontier is uninteresting. For example, returns on this frontier will price fixed-weight
portfolios of the basis assets. The point is that this frontier has no connection to the other two
frontiers. In particular, a conditionally mean-variance efficient return (conditional CAPM)
need not unconditionally price the fixed weight portfolios.


I offer several ways to see this important statement.

Using the connection to factor models

We have seen that the conditional CAPM $m_{t+1} = a_t - b_t R^W_{t+1}$ does not imply an unconditional
CAPM $m_{t+1} = a - b R^W_{t+1}$. We have seen that the existence of such a conditional
factor model is equivalent to the statement that the return $R^W_{t+1}$ lies on the conditional
mean-variance frontier, and the existence of an unconditional factor model $m_{t+1} = a - b R^W_{t+1}$ is
equivalent to the statement that $R^W$ is on the unconditional mean-variance frontier. Then,
from the “trivial” fact that an unconditional factor model is a special case of a conditional
one, we know that $R^W$ on the unconditional frontier implies $R^W$ on the conditional frontier
but not vice-versa.

Using the orthogonal decomposition

We can see the relation between conditional and unconditional mean-variance frontiers
using the orthogonal decomposition characterization of mean-variance efficiency given above.
This beautiful proof is the main point of Hansen and Richard (1987).
By the law of iterated expectations, $x^*$ and $R^*$ generate expected prices and $R^{e*}$ generates
unconditional means as well as conditional means:

$$E\left[p = E_t(x^* x)\right] \;\Rightarrow\; E(p) = E(x^* x)$$

$$E\left[E_t(R^{*2}) = E_t(R^* R)\right] \;\Rightarrow\; E(R^{*2}) = E(R^* R)$$

$$E\left[E_t(R^{e*} R^e) = E_t(R^e)\right] \;\Rightarrow\; E(R^{e*} R^e) = E(R^e)$$

This fact is subtle and important. For example, starting with $x^* = p_t' E_t(x_{t+1} x_{t+1}')^{-1} x_{t+1}$,
you might think we need a different $x^*$, $R^*$, $R^{e*}$ to represent expected prices and unconditional
means, using unconditional probabilities to define inner products. The three lines
above show that this is not the case. The same old $x^*$, $R^*$, $R^{e*}$ represent conditional as well
as unconditional prices and means.
Recall that a return is mean-variance efficient if and only if it is of the form

$$R^{mv} = R^* + w R^{e*}.$$

Thus, $R^{mv}$ is conditionally mean-variance efficient if $w$ is any number in the time $t$ information
set.

$$\text{conditional frontier: } R^{mv}_{t+1} = R^*_{t+1} + w_t R^{e*}_{t+1},$$

and $R^{mv}$ is unconditionally mean-variance efficient if $w$ is any constant.

$$\text{unconditional frontier: } R^{mv}_{t+1} = R^*_{t+1} + w R^{e*}_{t+1}.$$

Constants are in the $t$ information set; time $t$ random variables are not necessarily constant.
Thus unconditional efficiency (including managed portfolios) implies conditional efficiency
but not vice versa. As with the factor models, once you see the decomposition, it is a trivial
argument about whether a weight is constant or time-varying.

Brute force and examples.

If you're still puzzled, an additional argument by brute force may be helpful.
If a return is on the unconditional MVF it must be on the conditional MVF at each date.
If not, you could improve the unconditional mean-variance trade-off by moving to the conditional
MVF at each date. Minimizing unconditional variance given mean is the same as
minimizing unconditional second moment given mean,

$$\min E(R^2) \quad \text{s.t.} \quad E(R) = \mu$$

Writing the unconditional moment in terms of conditional moments, the problem is

$$\min E\left[E_t(R^2)\right] \quad \text{s.t.} \quad E\left[E_t(R)\right] = \mu$$

Now, suppose you could lower $E_t(R^2)$ at one date $t$ without affecting $E_t(R)$ at that date.
This change would lower the objective, without changing the constraint. Thus, you should
have done it: you should have picked returns on the conditional mean variance frontiers.
It almost seems that reversing the argument we can show that conditional efficiency implies
unconditional efficiency, but it doesn't. Just because you have minimized $E_t(R^2)$ for
given value of $E_t(R)$ at each date $t$ does not imply that you have minimized $E(R^2)$ for a
given value of $E(R)$. In showing that unconditional efficiency implies conditional efficiency
we held fixed $E_t(R)$ at each date at $\mu$, and showed it is a good idea to minimize $\sigma_t(R)$. In
trying to go backwards, the problem is that a given value of $E(R)$ does not specify what
$E_t(R)$ should be at each date. We can increase $E_t(R)$ in one conditioning information set
and decrease it in another, leaving the return on the conditional MVF.
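One way to see where the reverse argument breaks down is the standard variance decomposition (the law of total variance), written here as a sketch in the notation of this section:

```latex
\mathrm{var}(R) = E\left[\mathrm{var}_t(R)\right] + \mathrm{var}\left[E_t(R)\right]
```

Conditional efficiency makes $\mathrm{var}_t(R)$ as small as possible for whatever $E_t(R)$ is chosen at each date, which controls the first term. It places no restriction on how $E_t(R)$ is spread across dates, and that spread is exactly the second term. A conditionally efficient return whose conditional mean moves around can therefore carry unconditional variance that another return with the same $E(R)$ avoids.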
Figure 22 presents an example. Return B is conditionally mean-variance efficient. It also
has zero unconditional variance, so it is the unconditionally mean-variance efficient return at
the expected return shown. Return A is on the conditional mean-variance frontiers, and has
the same unconditional expected return as B. But return A has some unconditional variance,
and so is inside the unconditional mean-variance frontier.
As a second example, the riskfree rate is only on the unconditional mean-variance frontier
if it is a constant. Remember the expression (6.95) for the risk free rate,

$$R^f = R^* + R^f R^{e*}.$$

The unconditional mean-variance frontier is $R^* + w R^{e*}$ with $w$ a constant. Thus, the riskfree
rate is only unconditionally mean-variance efficient if it is a constant.


[Figure 22: conditional mean-variance frontiers for information sets 1 and 2, vertical axis $E_t(R)$, with return A marked.]

Figure 22. Return A is on the conditional mean-variance frontiers but not on the unconditional
mean variance frontier.

8.3.5 Implications: Hansen-Richard Critique.

Many models, such as the CAPM, imply a conditional linear factor model $m_{t+1} = a_t +
b_t' f_{t+1}$. These theorems show that such a model does not imply an unconditional model.
Equivalently, if the model predicts that the market portfolio is conditionally mean-variance
efficient, this does not imply that the market is unconditionally mean-variance efficient. We
often test the CAPM by seeing if it explains the average returns of some portfolios or (equivalently)
if the market is on the unconditional mean-variance frontier. The CAPM may quite
well be true (conditionally) and fail these tests; many assets may do better in terms of unconditional
mean vs. unconditional variance.
The situation is even worse than these comments seem, and is not repaired by simple
inclusion of some conditioning information. Models such as the CAPM imply a conditional
linear factor model with respect to investors' information sets. However, the best we can hope
to do is to test implications conditioned down on variables that we can observe and include
in a test. Thus, a conditional linear factor model is not testable!
I like to call this observation the “Hansen-Richard critique” by analogy to the “Roll Critique.”
Roll pointed out, among other things, that the wealth portfolio might not be observable,
making tests of the CAPM impossible. Hansen and Richard point out that the conditioning
information of agents might not be observable, and that one cannot omit it in testing a
conditional model. Thus, even if the wealth portfolio were observable, the fact that we cannot
observe agents' information sets dooms tests of the CAPM.


8.4 Scaled factors: a partial solution

You can expand the set of factors to test conditional factor pricing models

$$\text{factors} = f_{t+1} \otimes z_t$$

The problem is that the parameters of the factor pricing model $m_{t+1} = a_t + b_t f_{t+1}$ may
vary over time. A partial solution is to model the dependence of parameters $a_t$ and $b_t$ on
variables in the time-$t$ information set; let $a_t = a(z_t)$, $b_t = b(z_t)$ where $z_t$ is a vector of
variables observed at time $t$ (including a constant). In particular, why not try linear models

$$a_t = a' z_t, \quad b_t = b' z_t$$

Linearity is not restrictive: a nonlinear transformation of $z_t$ is just another instrument. The only
criticism one can make is that some instrument $z_{jt}$ is important for capturing the variation
in $a_t$ and $b_t$, and was omitted. For instruments on which we have data, we can meet this
objection by trying $z_{jt}$ and seeing whether it does, in fact, enter significantly. However, for
instruments $z_t$ that are observed by agents but not by us, this criticism remains valid.
Linear discount factor models lead to a nice interpretation as scaled factors, in the same
way that linearly managed portfolios are scaled returns. With a single factor and instrument,

$$m_{t+1} = a(z_t) + b(z_t) f_{t+1} \qquad (8.113)$$

$$= a_0 + a_1 z_t + (b_0 + b_1 z_t) f_{t+1}$$

$$= a_0 + a_1 z_t + b_0 f_{t+1} + b_1 (z_t f_{t+1}). \qquad (8.114)$$

Thus, in place of the one-factor model with time-varying coefficients (8.113), we have a
three-factor model $(z_t, f_{t+1}, z_t f_{t+1})$ with fixed coefficients, (8.114).
Since the coefficients are now fixed, we can use the scaled-factor model with unconditional
moments.

$$p_t = E_t\left[(a_0 + a_1 z_t + b_0 f_{t+1} + b_1 (z_t f_{t+1})) x_{t+1}\right] \;\Rightarrow\;$$

$$E(p_t) = E\left[(a_0 + a_1 z_t + b_0 f_{t+1} + b_1 (z_t f_{t+1})) x_{t+1}\right]$$
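The following sketch checks this logic in a hypothetical two-state economy (all numbers are invented). The factor is the market return, the instrument $z_t$ is an indicator of the second information state, and the payoffs are the market and risk-free returns plus their scaled versions. Solving the unconditional moment conditions $E(p_t) = E(m_{t+1} x_{t+1})$ for fixed coefficients on $(1, z_t, f_{t+1}, z_t f_{t+1})$ should recover the state-by-state conditional CAPM coefficients, under the sign convention $m_{t+1} = a_t - b_t R^W_{t+1}$.

```python
import numpy as np

# Hypothetical economy: two information states at t, two outcomes at t+1.
state_prob = np.array([0.5, 0.5])
outcome_prob = np.array([0.5, 0.5])
rw = np.array([[1.25, 0.95],
               [1.50, 0.80]])            # factor: market return
rf = np.array([1.02, 1.05])              # risk-free rate known at t
z = np.array([0.0, 1.0])                 # instrument: indicator of state 1

joint = state_prob[:, None] * outcome_prob[None, :]
ones = np.ones_like(rw)
Z = z[:, None] * ones
RF = rf[:, None] * ones

# Scaled factors (1, z, f, z*f); payoffs and prices including managed portfolios.
factors = np.stack([ones, Z, rw, Z * rw])
payoffs = np.stack([rw, RF, Z * rw, Z * RF])
prices = np.stack([ones, ones, Z, Z])

# Unconditional moment conditions E(p) = E(x F') c, solved for fixed c.
exf = np.einsum('sk,isk,jsk->ij', joint, payoffs, factors)
ep = np.einsum('sk,isk->i', joint, prices)
c = np.linalg.solve(exf, ep)

# Conditional CAPM coefficients state by state, m = a_t - b_t * RW.
cond_mean = rw @ outcome_prob
cond_var = (rw - cond_mean[:, None]) ** 2 @ outcome_prob
b_t = (cond_mean - rf) / (rf * cond_var)
a_t = 1.0 / rf + b_t * cond_mean

# With a 0/1 instrument: a_t = a0 + a1*z and -b_t = b0 + b1*z.
expected_c = np.array([a_t[0], a_t[1] - a_t[0], -b_t[0], -(b_t[1] - b_t[0])])
print(c, expected_c)
```

Because the instrument here spans the whole information set, the fixed-coefficient scaled-factor model reproduces the conditional model exactly. With a coarser instrument set it would only approximate it.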

For example, in standard derivations of CAPM, the market (wealth portfolio) return is
conditionally mean-variance efficient; investors want to hold portfolios on the conditional
mean-variance frontier; conditionally expected returns follow a conditional single-beta representation,
or the discount factor $m$ follows a conditional linear factor model

$$m_{t+1} = a_t - b_t R^W_{t+1}$$

as we saw above.
But none of these statements mean that we can use the CAPM unconditionally. Rather
than throw up our hands, we can add some scaled factors. Thus, if, say, the dividend/price ratio
and term premium do a pretty good job of summarizing variation in conditional moments,
the conditional CAPM implies an unconditional, five-factor (plus constant) model. The factors
are a constant, the market return, the dividend/price ratio, the term premium, and the
market return times the dividend-price ratio and the term premium.
The unconditional pricing implications of such a five-factor model could, of course, be
summarized by a single-$\beta$ representation. (See the caustic comments in the section on implications
and equivalence.) The reference portfolio would not be the market portfolio, of
course, but a mimicking portfolio of the five factors. However, the single mimicking portfolio
would not be easily interpretable in terms of a single factor conditional model and two
instruments. In this case, it might be more interesting to look at a multiple-$\beta$ or multiple-factor
representation.
If we have many factors $f$ and many instruments $z$, we should in principle multiply every
factor by every instrument,

$$m = b_1 f_1 + b_2 f_1 z_1 + b_3 f_1 z_2 + \dots + b_{N+1} f_2 + b_{N+2} f_2 z_1 + b_{N+3} f_2 z_2 + \dots$$

This operation can be compactly summarized with the Kronecker product notation, $a \otimes b$,
which means “multiply every element in vector $a$ by every element in vector $b$,” or

$$m_{t+1} = b'(f_{t+1} \otimes z_t).$$
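As a small illustration of the notation, NumPy's `np.kron` applied to two one-dimensional arrays produces exactly this "every factor times every instrument" vector, in the same order as the expansion above when the first instrument is a constant. The factor values, instrument values, and coefficients below are hypothetical.

```python
import numpy as np

# Hypothetical realizations: two factors at t+1, and a constant plus two
# instruments observed at t.
f_next = np.array([1.10, 0.03])
z_now = np.array([1.0, 0.04, 0.02])

# Scaled factors: f1*1, f1*z1, f1*z2, f2*1, f2*z1, f2*z2.
scaled_factors = np.kron(f_next, z_now)

# A fixed coefficient vector b (made up) gives m_{t+1} = b'(f_{t+1} kron z_t).
b = np.array([0.9, -2.0, 1.5, -0.5, 0.3, 0.1])
m_next = b @ scaled_factors

print(scaled_factors, m_next)
```

The number of scaled factors is the product of the number of factors and the number of instruments, which is why a few well-chosen instruments matter in practice.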

8.5 Summary

When you first think about it, conditioning information sounds scary: how do we account for
time-varying expected returns, betas, factor risk premia, variances, covariances, etc.? However,
the methods outlined in this chapter allow a very simple and beautiful solution to the
problems raised by conditioning information. To express the conditional implications of a
given model, all you have to do is include some scaled or managed portfolio returns, and then
pretend you never heard about conditioning information.
Some factor models are conditional models, and have coefficients that are functions of
investors' information sets. In general, there is no way to test such models, but if you are
willing to assume that the relevant conditioning information is well summarized by a few
variables, then you can just add new factors, equal to the old factors scaled by the conditioning
variables, and again forget that you ever heard about conditioning information.
You may want to remember conditioning information as a diagnostic and in economic
interpretation of the results. It may be interesting to take estimates of a many factor model,
$m_{t+1} = a_0 + a_1 z_t + b_0 f_{t+1} + b_1 z_t f_{t+1}$, and see what they say about the implied conditional
model, $m_{t+1} = (a_0 + a_1 z_t) + (b_0 + b_1 z_t) f_{t+1}$. You may want to make plots of conditional
$b$s, betas, factor risk premia, expected returns, etc. But you don't have to worry about it in
estimation and testing.

8.6 Problems

1. If there is a risk free asset, is it on the a) conditional b) unconditional c) both
mean-variance frontier?
2. If there is a conditionally riskfree asset (a claim to 1 is traded at each date), does this
mean that there is an unconditionally risk free asset? (Define the latter first!) How about
vice versa?
3. Suppose you took the unconditional population moments $E(R)$, $E(RR')$ of asset
returns and constructed the mean-variance frontier. Does this frontier correspond to the
conditional or the unconditional MV frontier, or neither? What is the key assumption
underlying your answer?

Chapter 9. Factor pricing models
In Chapter 2, I noted that the consumption-based model, while a complete answer to most
asset pricing questions in principle, does not (yet) work well in practice. This observation
motivates efforts to tie the discount factor m to other data. Linear factor pricing models are
the most popular models of this sort in finance. They dominate discrete time empirical work.
Factor pricing models replace the consumption-based expression for marginal utility
growth with a linear model of the form

$$m_{t+1} = a + b' f_{t+1}$$

$a$ and $b$ are free parameters. This specification is equivalent to a multiple-beta model

$$E(R_{t+1}) = \alpha + \beta' \lambda$$

where $\beta$ are multiple regression coefficients of returns $R$ on the factors $f$. Here, $\alpha$ and $\lambda$ are
the free parameters.
The big question is, what should one use for factors $f_{t+1}$? Factor pricing models look for
variables that are good proxies for aggregate marginal utility growth, i.e., variables for which

$$\frac{u'(c_{t+1})}{u'(c_t)} \approx a + b' f_{t+1} \qquad (9.115)$$

is a sensible and economically interpretable approximation.
More directly and interpretably, the essence of asset pricing is that there are special states
of the world in which investors are especially concerned that their portfolios not do badly.
They are willing to trade off some overall performance (average return) to make sure that
portfolios do not do badly in these particular states of nature. The factors are variables that
indicate that these “bad states” have occurred.
The factors that result from this search are and should be intuitively sensible. In any
sensible economic model, as well as in the data, consumption is related to returns on broad-
based portfolios, to interest rates, to growth in GNP, investment, or other macroeconomic
variables, and to returns on production processes. All of these variables measure “wealth”
or the state of the economy. Consumption is and should be high in “good times” and low in
“bad times.”
Furthermore, consumption and marginal utility respond to news: if a change in some
variable today signals high income in the future, then consumption rises now, by permanent
income logic. This fact opens the door to forecasting variables: any variable that forecasts
asset returns (“changes in the investment opportunity set”) or macroeconomic variables is a
candidate factor. Variables such as the term premium, dividend/price ratio, stock returns, etc.
can be defended as pricing factors on this logic. Though they themselves are not measures of
aggregate good or bad times, they forecast such times.


Should factors be independent over time? The answer is, sort of. If there is a constant
real interest rate, then marginal utility growth should be unpredictable. (“Consumption is a
random walk” in the quadratic utility permanent income model.) To see this, just look at the
first order condition with a constant interest rate,

$$u'(c_t) = \beta R^f E_t\left[u'(c_{t+1})\right]$$

or in a more time-series notation,

$$\frac{u'(c_{t+1})}{u'(c_t)} = \frac{1}{\beta R^f} + \mu_{t+1}; \quad E_t(\mu_{t+1}) = 0.$$

The real risk free rate is not constant, but it does not vary a lot, especially compared to as-
set returns. Measured consumption growth is not exactly unpredictable but it is the least
predictable macroeconomic time series, especially if one accounts properly for temporal ag-
gregation (consumption data are quarterly averages). Thus, factors that proxy for marginal
utility growth, though they don't have to be totally unpredictable, should not be highly
predictable. If one chooses highly predictable factors, the model will counterfactually predict
large interest rate variation.
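To see the link, here is a one-line sketch. It assumes that the linear factor model $m_{t+1} = a + b' f_{t+1}$ of this chapter holds conditionally with fixed $a$ and $b$, and that a conditionally risk-free rate $R^f_t$ is traded:

```latex
\frac{1}{R^f_t} = E_t(m_{t+1}) = a + b' E_t(f_{t+1})
```

Any predictable movement in $E_t(f_{t+1})$ therefore shows up directly in the inverse risk-free rate, so highly forecastable factors imply a risk-free rate that moves a lot.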
In practice, this consideration means that one should choose the right units: Use GNP
growth rather than level, portfolio returns rather than prices or price/dividend ratios, etc.
However, unless one wants to impose an exactly constant risk free rate, one does not have to
filter or prewhiten factors to make them exactly unpredictable.
This view of factors as intuitively motivated proxies for marginal utility growth is sufficient
to carry the reader through current empirical tests of factor models. The extra constraints
of a formal exposition of theory in this part have not yet constrained the factor-fishing expedition.
The precise derivations all proceed in the way I have motivated factor models: One writes
down a general equilibrium model, in particular a specification of the production technology
by which real investment today results in real output tomorrow. This general equilibrium
produces relations that express the determinants of consumption from exogenous variables,
and relations linking consumption and other endogenous variables; equations of the form
$c_t = g(f_t)$. One then uses this kind of equation to substitute out for consumption in the basic
first order conditions.
The formal derivations accomplish two things: they determine one particular list of factors
that can proxy for marginal utility growth, and they prove that the relation should be linear.
Some assumptions can often be substituted for others in the quest for these two features of a
factor pricing model.
This is a point worth remembering: all factor models are derived as specializations of the
consumption-based model. Many authors of factor model papers disparage the consumption-
based model, forgetting that their factor model is the consumption-based model plus extra
assumptions that allow one to proxy for marginal utility growth from some other variables.


My presentation follows Constantinides™ (1989) derivation of traditional models as instances
of the consumption-based model in this regard.
Above, I argued that clear economic foundation was important for factor models, since it
is the only guard against fishing. Alas, we discover here that the current state of factor pricing
models is not a particularly good guard against fishing. One can call for better theories or
derivations, more carefully aimed at limiting the list of potential factors and describing the
fundamental macroeconomic sources of risk, and thus providing more discipline for empirical
work. The best minds in finance have been working on this problem for 40 years though, so
a ready solution is not immediately in sight. On the other hand, we will see that even current
theory can provide much more discipline than is commonly imposed in empirical work. For
example, the derivations of the CAPM and ICAPM do leave predictions for the risk free rate
and for factor risk premia that are often ignored. The ICAPM gives tighter restrictions on
state variables than are commonly checked: “State variables” do have to forecast something!
We also see how special and unrealistic are the general equilibrium setups necessary to derive
popular specifications such as CAPM and ICAPM. This observation motivates a more serious
look at real general equilibrium models below.

9.1 Capital Asset Pricing Model (CAPM)

The CAPM is the model $m = a + bR^W$; $R^W$ = wealth portfolio return. I derive it from
the consumption-based model by 1) Two-period quadratic utility; 2) Two periods, exponential
utility and normal returns; 3) Infinite horizon, quadratic utility and i.i.d. returns; 4) Log utility
and normally distributed returns.

The CAPM is the first, most famous and (so far) most widely used model in asset pricing.
It ties the discount factor $m$ to the return on the “wealth portfolio.” The function is linear,

$$ m_{t+1} = a + bR^W_{t+1}. $$

$a$ and $b$ are free parameters. One can find theoretical values for the parameters $a$ and $b$ by
requiring the discount factor $m$ to price any two assets, such as the wealth portfolio return
and risk-free rate, $1 = E(mR^W)$ and $1 = E(m)R^f$. (As an example, we did this in equation
(8.110) above.) In empirical applications, we can also pick $a$ and $b$ to “best” price larger
cross-sections of assets. We do not have good data on, or even a good empirical definition
for, the return on total wealth. It is conventional to proxy $R^W$ by the return on a broad-based
stock portfolio such as the value- or equally-weighted NYSE, S&P500, etc.
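
As a rough illustration of the two-asset calculation, the sketch below solves $1 = E(mR^W)$ and $1 = E(m)R^f$ for $a$ and $b$ when $m = a + bR^W$. The function name `solve_capm_discount_factor` and the moment values are hypothetical, chosen only for illustration; they are not estimates from data.

```python
def solve_capm_discount_factor(mean_rw, var_rw, rf):
    """Solve 1 = E(m R^W) and 1 = E(m) R^f for m = a + b R^W.

    The two conditions form a linear system in (a, b):
        a          + b * E(R^W)       = 1 / R^f
        a * E(R^W) + b * E((R^W)^2)   = 1
    Subtracting E(R^W) times the first equation from the second
    leaves b * var(R^W) = 1 - E(R^W) / R^f.
    """
    b = (1.0 - mean_rw / rf) / var_rw
    a = 1.0 / rf - b * mean_rw
    return a, b


# Hypothetical gross-return moments (illustrative assumptions only).
mean_rw, var_rw, rf = 1.08, 0.04, 1.02
a, b = solve_capm_discount_factor(mean_rw, var_rw, rf)

# Check that the resulting discount factor prices both assets.
price_of_rf = (a + b * mean_rw) * rf
price_of_rw = a * mean_rw + b * (var_rw + mean_rw ** 2)
print(a, b, price_of_rf, price_of_rw)
```

With these inputs $b$ comes out negative, as it must whenever the expected wealth return exceeds the risk-free rate: the discount factor is low in states where the wealth portfolio does well.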
The CAPM is of course most frequently stated in equivalent expected return / beta language,

$$ E(R^i) = \alpha + \beta_{i,R^W}\left[E(R^W) - \alpha\right]. $$


This section briefly describes some classic derivations of the CAPM. Again, we need
to find assumptions that defend which factors proxy for marginal utility ($R^W$ here), and
assumptions to defend the linearity between $m$ and the factor.
I present several derivations of the same model. Many of these derivations use classic
modeling assumptions which are important in their own right. This is also an interesting place
in which to see that various sets of assumptions can often be used to get to the same place.
The CAPM is often criticized for one or another assumption. By seeing several derivations,
we can see how one assumption can be traded for another. For example, the CAPM does not
in fact require normal distributions, if one is willing to swallow quadratic utility instead.

9.1.1 Two-period quadratic utility

Two period investors with no labor income and quadratic utility imply the CAPM.

Investors have quadratic preferences and only live two periods,

$$ U(c_t, c_{t+1}) = -\frac{1}{2}(c_t - c^*)^2 - \frac{1}{2}\beta E\left[(c_{t+1} - c^*)^2\right]. \tag{116} $$

Their marginal rate of substitution is thus

$$ m_{t+1} = \beta \frac{u'(c_{t+1})}{u'(c_t)} = \beta \frac{(c_{t+1} - c^*)}{(c_t - c^*)}. $$

The quadratic utility assumption means marginal utility is linear in consumption. This achieves
the first target of the derivation, linearity.
Investors are born with wealth $W_t$ in the first period and earn no labor income. They
can invest in lots of assets with prices $p^i_t$ and payoffs $x^i_{t+1}$, or, to keep the notation simple,
returns $R^i_{t+1}$. They choose how much to consume at the two dates, $c_t$ and $c_{t+1}$, and the
portfolio weights $\alpha_i$ for their investment portfolio. Thus, the budget constraint is

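$$
\begin{aligned}
c_{t+1} &= W_{t+1} \\
W_{t+1} &= R^W_{t+1}(W_t - c_t) \\
R^W &= \sum_{i=1}^{N} \alpha_i R^i; \quad \sum_{i=1}^{N} \alpha_i = 1.
\end{aligned}
\tag{117}
$$

$R^W$ is the rate of return on total wealth.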


The two-period assumption means that investors consume everything in the second period,
by constraint (9.117). This fact allows us to substitute wealth and the return on wealth
for consumption, achieving the second goal of the derivation, naming the factor that proxies
for consumption or marginal utility:

$$ m_{t+1} = \beta \frac{R^W_{t+1}(W_t - c_t) - c^*}{c_t - c^*} = \frac{-\beta c^*}{c_t - c^*} + \frac{\beta (W_t - c_t)}{c_t - c^*} R^W_{t+1}, $$

i.e.

$$ m_{t+1} = a_t + b_t R^W_{t+1}. $$
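
The substitution above can be checked numerically. The following is a minimal sketch with hypothetical parameter values (`beta`, `c_star`, `W_t`, `c_t` are illustrative assumptions, not calibrated quantities): it computes the discount factor once from second-period consumption and once from the linear form in the wealth return, and confirms that the two agree.

```python
# Hypothetical parameter values, for illustration only.
beta = 0.95    # subjective discount factor
c_star = 10.0  # bliss point c* of quadratic utility
W_t = 5.0      # initial wealth
c_t = 2.0      # first-period consumption

# Intercept and slope of the linear discount factor m = a_t + b_t * R^W.
a_t = -beta * c_star / (c_t - c_star)
b_t = beta * (W_t - c_t) / (c_t - c_star)


def m_from_consumption(r_w):
    """Marginal rate of substitution, using c_{t+1} = R^W (W_t - c_t)."""
    c_next = r_w * (W_t - c_t)
    return beta * (c_next - c_star) / (c_t - c_star)


def m_linear(r_w):
    """The same discount factor written as a linear function of R^W."""
    return a_t + b_t * r_w


returns = [0.8, 1.0, 1.1, 1.5]
max_gap = max(abs(m_from_consumption(r) - m_linear(r)) for r in returns)
print(a_t, b_t, max_gap)
```

Below the bliss point ($c_t < c^*$) with positive saving, $b_t$ is negative: high wealth returns mean high second-period consumption and low marginal utility.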

9.1.2 Exponential utility, normal distributions

$u(c) = -e^{-\alpha c}$ and a normally distributed set of returns also produces the CAPM.

The combination of exponential utility and normal distributions is another set of assump-
tions that deliver the CAPM in a one or two period model. This structure has a particularly
convenient analytical form. Since it gives rise to linear demand curves, it is very widely
used in models that complicate the trading structure, by introducing incomplete markets or
asymmetric information.
I present a model with consumption only in the last period. (You can do the quadratic
utility model of the last section this way as well.) Utility is
$$ E[u(c)] = E\left[-e^{-\alpha c}\right]. $$

$\alpha$ is known as the coefficient of absolute risk aversion. If consumption is normally distributed,
we have

$$ E u(c) = -e^{-\alpha E(c) + \frac{\alpha^2}{2}\sigma^2(c)}. $$
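
The closed form for expected exponential utility under normality can be verified by direct numerical integration. This is a minimal sketch: the function names and the parameter values are hypothetical, and the trapezoid rule over plus or minus ten standard deviations is simply one convenient way to approximate the integral.

```python
import math


def expected_exp_utility_numeric(alpha, mean_c, sd_c, n=20001, width=10.0):
    """Trapezoid-rule approximation of E[-exp(-alpha * c)], c ~ N(mean_c, sd_c^2)."""
    lo = mean_c - width * sd_c
    hi = mean_c + width * sd_c
    step = (hi - lo) / (n - 1)
    norm_const = sd_c * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(n):
        c = lo + k * step
        density = math.exp(-0.5 * ((c - mean_c) / sd_c) ** 2) / norm_const
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * (-math.exp(-alpha * c)) * density * step
    return total


def expected_exp_utility_closed_form(alpha, mean_c, sd_c):
    """-exp(-alpha * E(c) + (alpha^2 / 2) * sigma^2(c))."""
    return -math.exp(-alpha * mean_c + 0.5 * alpha ** 2 * sd_c ** 2)


# Hypothetical values, for illustration only.
alpha, mean_c, sd_c = 2.0, 1.0, 0.3
numeric = expected_exp_utility_numeric(alpha, mean_c, sd_c)
closed = expected_exp_utility_closed_form(alpha, mean_c, sd_c)
print(numeric, closed)
```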

Suppose this investor has initial wealth $W$ which can be split between a riskfree asset
paying $R^f$ and a set of risky assets paying return $R$. Let $y$ denote the amount of this wealth
$W$ (amount, not fraction) invested in each security. Then, the budget constraint is

$$
\begin{aligned}
c &= y^f R^f + y'R \\
W &= y^f + y'1
\end{aligned}
$$

Plugging the first constraint into the utility function we obtain

$$ E u(c) = -e^{-\alpha\left[y^f R^f + y'E(R)\right] + \frac{\alpha^2}{2} y'\Sigma y}. \tag{118} $$


As with quadratic utility, the two-period model is what allows us to set consumption to wealth
and then substitute the return on the wealth portfolio for consumption growth in the discount
factor.
Maximizing (9.118) with respect to $y$, $y^f$, we obtain the first order condition describing
the optimal amount to be invested in the risky asset,

$$ y = \Sigma^{-1} \frac{E(R) - R^f}{\alpha}. $$

Sensibly, the investor invests more in risky assets if their expected return is higher, less if his
risk aversion coefficient is higher, and less if the assets are riskier. Notice that total wealth
does not appear in this expression. With this setup, the amount invested in risky assets is
independent of the level of wealth. This is why we say that this investor has absolute rather
than relative (to wealth) risk aversion. Note also that these “demands” for the
risky assets are linear in expected returns, which is a very convenient property.
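
To make the demand formula concrete, here is a minimal two-asset sketch of $y = \Sigma^{-1}[E(R) - R^f]/\alpha$. The helper names and all numerical inputs are hypothetical, chosen only for illustration. Wealth never enters the calculation, doubling $\alpha$ halves every position, and multiplying the demands back by $\alpha\Sigma$ recovers the expected excess returns.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def optimal_risky_amounts(expected_returns, rf, cov, alpha):
    """Amounts y = Sigma^{-1} (E(R) - R^f) / alpha for two risky assets."""
    inv = invert_2x2(cov)
    excess = [er - rf for er in expected_returns]
    return [sum(inv[i][j] * excess[j] for j in range(2)) / alpha
            for i in range(2)]


# Hypothetical inputs, for illustration only.
expected_returns = [1.08, 1.12]      # E(R), gross returns
rf = 1.02                            # riskfree rate R^f
cov = [[0.04, 0.01], [0.01, 0.09]]   # covariance matrix Sigma
alpha = 2.0                          # coefficient of absolute risk aversion

y = optimal_risky_amounts(expected_returns, rf, cov, alpha)

# alpha * Sigma * y should reproduce the expected excess returns.
implied_excess = [alpha * sum(cov[i][j] * y[j] for j in range(2))
                  for i in range(2)]

# A more risk-averse investor holds proportionally smaller risky positions.
y_more_averse = optimal_risky_amounts(expected_returns, rf, cov, 2 * alpha)
print(y, implied_excess, y_more_averse)
```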
Inverting the first order conditions, we obtain

$$ E(R) - R^f = \alpha \Sigma y = \alpha\, \mathrm{cov}(R, R^m). \tag{119} $$

