$$
\begin{aligned}
\mathrm{cov}(R^i, R^{mv}) &= \mathrm{cov}\left[(R^* + wR^{e*}),\ (R^* + w^i R^{e*})\right] \\
&= \mathrm{var}(R^*) + w w^i\, \mathrm{var}(R^{e*}) - (w + w^i)E(R^*)E(R^{e*}) \\
&= \mathrm{var}(R^*) - wE(R^*)E(R^{e*}) + w^i\left[w\, \mathrm{var}(R^{e*}) - E(R^*)E(R^{e*})\right]
\end{aligned}
$$

Thus, $\mathrm{cov}(R^i, R^{mv})$ and $E(R^i)$ are both linear functions of $w^i$. We can solve $\mathrm{cov}(R^i, R^{mv})$ for $w^i$, plug into the expression for $E(R^i)$ and we're done.

To do this, of course, we must be able to solve $\mathrm{cov}(R^i, R^{mv})$ for $w^i$. This requires

$$
w \neq \frac{E(R^*)E(R^{e*})}{\mathrm{var}(R^{e*})} = \frac{E(R^*)E(R^{e*})}{E(R^{e*2}) - E(R^{e*})^2} = \frac{E(R^*)}{1 - E(R^{e*})} \tag{107}
$$

which is the condition for the minimum variance return. ∎

6.7 Problems

1. In the argument that $R^{mv}$ on the mean-variance frontier, $R^{mv} = R^* + wR^{e*}$, implies a discount factor $m = a + bR^{mv}$, do we have to rule out the case of risk neutrality? (Hint: What is $R^{e*}$ when the economy is risk-neutral?)

2. If you use factor-mimicking portfolios as in (6.93), you know that the predictions for expected returns are the same as they are if you use the factors themselves. Are the $\alpha^*$, $\lambda^*$, and $\beta^*$ for the factor-mimicking portfolio representation the same as the original $\alpha$, $\lambda$, and $\beta$ of the factor pricing model?

3. Suppose the CAPM is true, $m = a - bR^m$ prices a set of assets, and there is a risk-free rate $R^f$. Find $R^*$ in terms of the moments of $R^m$, $R^f$.

4. If you express the mean-variance frontier as a linear combination of factor-mimicking portfolios from a factor model, do the relative weights of the various factor portfolios in the mean-variance efficient return change as you sweep out the frontier, or do they stay the same? (Start with the risk-free rate case.)

5. For an arbitrary mean-variance efficient return of the form $R^* + wR^{e*}$, find its zero-beta return and zero-beta rate. Show that your rate reduces to the risk-free rate when there is one.

6. When the economy is risk neutral, and if there is no risk-free rate, show that the

zero-beta, minimum-variance, and constant-mimicking portfolio returns are again all

equivalent, though not equal to the risk-free rate. (In this case, the mean-variance frontier

is just the minimum-variance point.)


Chapter 7. Implications of existence and equivalence theorems

Existence of a discount factor means p = E(mx) is innocuous, and all content flows from the discount factor model.

The theorems apply to sample moments too; the dangers of fishing up ex-post or sample mean-variance efficient portfolios.

Sources of discipline in factor fishing expeditions.

The joint hypothesis problem. How efficiency tests are the same as tests of economic discount factor models.

Factors vs. their mimicking portfolios.

Testing the number of factors.

Plotting contingent claims on the axis vs. mean and variance.

The theorems on the existence of a discount factor, and the equivalence between the p =

E(mx), expected return - beta, and mean-variance views of asset pricing have important

implications for how we approach and evaluate empirical work.

The equivalence theorems are obviously important, especially to the theme of this book,

to show that the choice of discount factor language versus expected return-beta language

or mean-variance frontier is entirely one of convenience. Nothing in the more traditional

statements is lost.

p = E(mx) is innocuous

Before Roll (1976), expected return-beta representations had been derived in the context of special and explicit economic models, especially the CAPM. In empirical work, the success of any expected return-beta model seemed like a vindication of the whole structure. The fact that, for example, one might use the NYSE value-weighted index portfolio in place of the return on total wealth predicted by the CAPM seemed like a minor issue of empirical implementation.

When Roll showed that mean-variance efficiency implies a single-beta representation, all that changed. Some single-beta representation always exists, since there is some mean-variance efficient return. The asset pricing model only serves to predict that a particular return (say, the “market return”) will be mean-variance efficient. Thus, if one wants to “test the CAPM” it becomes much more important to be choosy about the reference portfolio, to guard against stumbling on something that happens to be mean-variance efficient and hence prices assets by construction.


This insight led naturally to the use of broader wealth indices (Stambaugh 1982) in the

reference portfolio to provide a more grounded test of the CAPM. However, this approach

has not caught on. Stocks are priced with stock factors, bonds with bond factors, and so on.

More recently, stocks sorted on size, book/market, and past performance characteristics are

priced by portfolios sorted on those characteristics. Part of the reason for this is that the betas

are small; stocks and bonds are not highly correlated so risk premia from one source of betas

have small impacts on another set of average returns. Larger measures of wealth including

human capital and real estate do not come with high frequency price data, so adding them to

a wealth portfolio has little effect on betas.

The good news in this existence theorem is that you can always start by writing an ex-

pected return-beta model, knowing that you have imposed almost no structure in doing so.

The bad news is that you haven't gotten very far. All the economic, statistical and predictive

content comes in picking the factors.

The theorem that, from the law of one price, there exists some discount factor m such that p = E(mx) is just an updated restatement of Roll's theorem. The content is all in m = f(data), not in p = E(mx). Again, an asset pricing framework that initially seemed to require a lot of completely unbelievable structure (the representative consumer consumption-based model in complete frictionless markets) turns out to require (almost) no structure at all. Again, the good news is that you can always start by writing p = E(mx), and need not suffer criticism about hidden contingent claim or representative consumer assumptions in so doing. The bad news is that you haven't gotten very far by writing p = E(mx), as all the economic, statistical and predictive content comes in picking the discount factor model m = f(data).

Ex-ante and ex-post.

I have been deliberately vague about the probabilities underlying expectations and other moments in the theorems. The fact is, the theorems hold for any set of probabilities.[4] Thus, the existence and equivalence theorems work equally well ex-ante as ex-post: E(mx), β, E(R) and so forth can refer to agents' subjective probability distributions, objective population probabilities, or to the moments realized in a given sample.

Thus, if the law of one price holds in a sample, one may form an $x^*$ from sample moments that satisfies $p(x) = E(x^* x)$, exactly, in that sample, where $p(x)$ refers to observed prices and $E(x^* x)$ refers to the sample average. Equivalently, if the sample covariance matrix of a set of returns is nonsingular, there exists an ex-post mean-variance efficient portfolio for which sample average returns line up exactly with sample regression betas.
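A minimal numerical sketch of this point, using simulated data. The payoffs, prices, sample size, and variable names are illustrative assumptions, not taken from the text. The sketch forms $x^* = p'E(xx')^{-1}x$ from sample second moments and checks that the sample average of $x^* x$ reproduces the prices.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 5                       # hypothetical sample length and number of assets

# Simulated gross returns: one column per asset, each with price 1.
x = 1.0 + 0.05 * rng.standard_normal((T, N))
p = np.ones(N)

# Sample second-moment matrix E(xx') and the portfolio weights p'E(xx')^{-1}.
Exx = x.T @ x / T
weights = np.linalg.solve(Exx, p)   # Exx is symmetric, so this is p'E(xx')^{-1}

# x* is a portfolio payoff with one observation per date.
x_star = x @ weights

# Sample pricing: the sample mean of x* times x should equal p for every asset.
implied_prices = (x_star[:, None] * x).mean(axis=0)
pricing_errors = implied_prices - p
print(np.max(np.abs(pricing_errors)))   # zero up to floating-point rounding
```

The pricing errors vanish by construction, whatever process generated the data, which is why in-sample success of this kind carries no information by itself.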

This observation points to a great danger in the widespread exercise of searching for and statistically evaluating ad-hoc asset pricing models. Such models are guaranteed empirical success in a sample if one places little enough structure on what is included in the discount factor function. The only reason the model doesn't work perfectly is the restrictions the researcher has imposed on the number or identity of the factors included in m, or the parameters of the function relating the factors to m. Since these restrictions are the entire content of the model, they had better be interesting, carefully described and well motivated!

[4] Precisely, any set of probabilities that agree on impossible (zero-probability) events.

Obviously, this is typically not the case or I wouldn't be making such a fuss about it. Most empirical asset pricing research posits an ad-hoc pond of factors, fishes around a bit in that set, and reports statistical measures that show “success,” in that the model is not statistically rejected in pricing an ad-hoc set of portfolios. The set of discount factors is usually not large enough to give the zero pricing errors we know are possible, yet the boundaries are not clearly defined.

Discipline

What is wrong, you might ask, with finding an ex-post efficient portfolio or $x^*$ that prices assets by construction? Perhaps the lesson we should learn from the existence theorems is to forget about economics, the CAPM, marginal utility and all that, and simply price assets with ex-post mean-variance efficient portfolios that we know set pricing errors to zero!

The mistake is that a portfolio that is ex-post efficient in one sample, and hence prices all assets in that sample, is unlikely to be mean-variance efficient, ex-ante or ex-post, in the next sample, and hence is likely to do a poor job of pricing assets in the future. Similarly, the portfolio $x^* = p'E(xx')^{-1}x$ (using the sample second moment matrix) that is a discount factor by construction in one sample is unlikely to be a discount factor in the next sample; the required portfolio weights $p'E(xx')^{-1}$ change, often drastically, from sample to sample.
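The sample-to-sample instability can be illustrated with two simulated samples drawn from the same distribution. This is a sketch under assumed parameters (sample size, number of assets, volatility), and the helper names `x_star_weights` and `pricing_errors` are hypothetical. Weights estimated in the first sample price that sample exactly but leave pricing errors in the second.

```python
import numpy as np

def x_star_weights(x, p):
    """Weights p'E(xx')^{-1}, built from the sample second-moment matrix of x."""
    Exx = x.T @ x / x.shape[0]
    return np.linalg.solve(Exx, p)

def pricing_errors(x, p, weights):
    """Sample E(x* x) - p, where x* = x @ weights."""
    x_star = x @ weights
    return (x_star[:, None] * x).mean(axis=0) - p

rng = np.random.default_rng(1)
T, N = 120, 10                      # hypothetical sample length and number of assets
p = np.ones(N)                      # gross returns, so every price is 1
sample_1 = 1.0 + 0.05 * rng.standard_normal((T, N))
sample_2 = 1.0 + 0.05 * rng.standard_normal((T, N))

w1 = x_star_weights(sample_1, p)
w2 = x_star_weights(sample_2, p)

in_sample = pricing_errors(sample_1, p, w1)       # zero up to rounding
out_of_sample = pricing_errors(sample_2, p, w1)   # generally not zero

print(np.max(np.abs(in_sample)), np.max(np.abs(out_of_sample)))
print(np.max(np.abs(w1 - w2)))      # how far the weights move between samples
```

Both samples share the same population moments here, so any difference between `w1` and `w2` is pure sampling variation.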

For example, suppose the CAPM is true, the market portfolio is ex-ante mean-variance efficient, and sets pricing errors to zero if you use true or subjective probabilities. Nonetheless, the market portfolio is unlikely to be ex-post mean-variance efficient in any given sample. In any sample, there will be lucky winners and unlucky losers. An ex-post mean-variance efficient portfolio will be a Monday-morning quarterback; it will tell you to put large weights on assets that happened to be lucky in a given sample, but are no more likely than indicated by their betas to generate high returns in the future. “Oh, if I had only bought Microsoft in 1982...” is not a useful guide to forming a mean-variance efficient portfolio today. (In fact, mean-reversion in the market and book/market effects in individual stocks suggest that if anything, assets with unusually good returns in the past are likely to do poorly in the future!)

The only solution is to impose some kind of discipline in order to avoid dredging up

spuriously good in-sample pricing.

The situation is the same as in traditional regression analysis. Regressions are used to forecast or to explain a variable y by other variables x in a regression $y = x'\beta + \varepsilon$. By blindly including right-hand variables, one can produce models with arbitrarily good statistical measures of fit. But this kind of model is typically unstable out of sample or otherwise useless for explanation or forecasting. One has to carefully and thoughtfully limit the search for right-hand variables x to produce good models.
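The regression analogy is easy to reproduce. In the sketch below (simulated data; the sample size and the helper name `r_squared` are assumptions made for illustration), the left-hand variable is pure noise, yet $R^2$ never falls as unrelated regressors are added and reaches one when there are as many right-hand variables as observations.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS regression of y on X (X must contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
T = 30                                            # hypothetical number of observations
y = rng.standard_normal(T)                        # pure noise: nothing to explain
noise_regressors = rng.standard_normal((T, T - 1))

r2_path = []
for k in range(T):                                # k = number of noise regressors used
    X = np.column_stack([np.ones(T), noise_regressors[:, :k]])
    r2_path.append(r_squared(y, X))

print(r2_path[0], r2_path[T // 2], r2_path[-1])   # rises from 0 to 1
```

None of these regressors has any forecasting power for a fresh draw of `y`, which is the sense in which a perfect in-sample fit can be useless.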

What makes for an interesting set of restrictions? Econometricians wrestling with $y = x'\beta + \varepsilon$ have been thinking about this question for about 50 years, and the best answers are 1) use economic theory to carefully specify the right-hand side and 2) use a battery of cross-sample and out-of-sample stability checks.

Alas, this advice is hard to follow. Economic theory is usually either silent on what variables to put on the right-hand side of a regression, or allows a huge range of variables. The same is true in finance. “What are the fundamental risk factors?” is still an unanswered question. At the same time one can appeal to the APT and ICAPM to justify the inclusion of just about any desirable factor. (Fama 1991 calls these theories a “fishing license.”) Thus, you will grow old waiting for theorists to provide useful answers to this kind of question.

Following the purely statistical advice, the battery of cross-sample and out-of-sample tests often reveals the model is unstable, and needs to be changed. Once it is changed, there is no more out-of-sample left to check it. Furthermore, even if one researcher is pure enough to follow the methodology of classical statistics, and wait 50 years for another fresh sample to be available before contemplating another model, his competitors and journal editors are unlikely to be so patient. In practice, then, out-of-sample validation is not a strong guard against fishing.

Nonetheless, these are the only standards we have to guard against fishing. In my opinion, the best hope for finding pricing factors that are robust out of sample and across different markets is to try to understand the fundamental macroeconomic sources of risk. By this I mean tying asset prices to macroeconomic events, in the way the ill-fated consumption-based model does via $m_{t+1} = \beta u'(c_{t+1})/u'(c_t)$. The difficulties of the consumption-based model have made this approach lose favor in recent years. However, the alternative approach is also running into trouble: the number and identity of empirically determined risk factors does not seem stable. Every time a new anomaly or data set pops up, a new set of ad-hoc factors gets created to explain it!

In any case, one should always ask of a factor model, “what is the compelling economic story that restricts the range of factors used?” and/or “what statistical restraints are used to keep from discovering ex-post mean-variance efficient portfolios, or to ensure that the results will be robust across samples?” The existence theorems tell us that the answers to these questions are the only content of the exercise. If the purpose of the model is not just to predict asset prices but also to explain them, this puts an additional burden on economic motivation of the risk factors.

There is a natural resistance to such discipline built in to our current statistical methodology for evaluating models (and papers). When the last author fished around and produced an ad-hoc factor pricing model that generates 1% average pricing errors, it is awfully hard to persuade readers, referees, journal editors, and clients that your economically motivated factor pricing model is better despite 2% average pricing errors. Your model may really be better and will therefore continue to do well out of sample when the fished model falls by the wayside of financial fashion, but it is hard to get past statistical measures of in-sample fit. One hungers for a formal measurement of the number of hurdles imposed on a factor fishing expedition, like the degrees of freedom correction in $\bar{R}^2$. Absent a numerical correction, we have to use judgment to scale back apparent statistical successes by the amount of economic and statistical fishing that produced them.


Mimicking portfolios

The theorem $x^* = \mathrm{proj}(m|X)$ also has interesting implications for empirical work. The

pricing implications of any model can be equivalently represented by its factor-mimicking

portfolio. If there is any measurement error in a set of economic variables driving m, the

factor-mimicking portfolios for the true m will price assets better than an estimate of m that

uses the measured macroeconomic variables.
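The equivalence between a discount factor and its mimicking portfolio can be checked numerically. The sketch below uses simulated payoffs and a made-up discount factor (all numbers and names are assumptions for illustration). It projects $m$ on the payoffs without a constant and confirms that the projection assigns the same prices as $m$ itself, even though the two series differ date by date.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 500, 4                                     # hypothetical sample size and assets
x = 1.0 + 0.1 * rng.standard_normal((T, N))       # hypothetical asset payoffs

# A hypothetical discount factor: related to payoffs, plus noise outside the payoff space.
m = 0.95 - 0.5 * (x.mean(axis=1) - 1.0) + 0.2 * rng.standard_normal(T)

# Projection of m on the payoffs: x* = E(mx)'E(xx')^{-1} x.
Exx = x.T @ x / T
Emx = (m[:, None] * x).mean(axis=0)
x_star = x @ np.linalg.solve(Exx, Emx)

prices_from_m = Emx                                    # p = E(mx)
prices_from_x_star = (x_star[:, None] * x).mean(axis=0)   # p = E(x* x)

print(np.max(np.abs(prices_from_m - prices_from_x_star)))  # zero up to rounding
print(np.max(np.abs(m - x_star)))                          # m and x* are not the same series
```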

Thus, it is probably not a good idea to evaluate economically interesting models with

statistical horse races against models that use portfolio returns as factors. Economically in-

teresting models, even if true and perfectly measured, will just equal the performance of their

own factor-mimicking portfolios, even in large samples. They will always lose in sample

against ad-hoc factor models that find nearly ex-post efficient portfolios.

This said, there is an important place for models that use returns as factors. After we

have found the underlying true macro factors, practitioners will be well advised to look at

the factor-mimicking portfolio on a day-by-day basis. Good data on the factor-mimicking

portfolios will be available on a minute-by-minute basis. For many purposes, one does not

have to understand the economic content of a model.

But this fact does not tell us to circumvent the process of understanding the true macroeconomic factors by simply fishing for factor-mimicking portfolios. The experience of practitioners who use factor models seems to bear out this advice. Large commercial factor models resulting from extensive statistical analysis (otherwise known as fishing) perform poorly out of sample, as revealed by the fact that the factors and loadings (β) change all the time.

Also, models specified with economic fundamentals will always seem to do poorly in a given sample against ad-hoc variables (especially if one fishes an ex-post mean-variance efficient portfolio out of the latter!). But what other source of discipline do we have?

Irrationality and Joint Hypothesis

Finance contains a long history of fighting about “rationality” vs. “irrationality” and “efficiency” vs. “inefficiency” of asset markets. The results of many empirical asset pricing papers are sold as evidence that markets are “inefficient” or that investors are “irrational.” For example, the crash of October 1987, and various puzzles such as the small-firm, book/market, and seasonal effects or long-term predictability have all been sold this way.

However, none of these puzzles documents an arbitrage opportunity.[5] Therefore, we know that there is a “rational model” (a stochastic discount factor, an efficient portfolio to use in a single-beta representation) that rationalizes them all. And we can confidently predict this situation to continue; real arbitrage opportunities do not last long! Fama (1970) contains a famous statement of the same point. Fama emphasized that any test of “efficiency” is a joint test of efficiency and a “model of market equilibrium.” Translated, that is an asset pricing model, or a model of m.

[5] The closed-end fund puzzle comes closest since it documents an apparent violation of the law of one price. However, you can't costlessly short closed-end funds, and we have ignored short sales constraints so far.

But surely markets can be “irrational” or “inefficient” without requiring arbitrage opportunities? Yes, they can, if (and only if) the discount factors that generate asset prices are disconnected from marginal rates of substitution or transformation in the real economy. But now we are right back to specifying and testing economic models of the discount factor! At best, an asset pricing puzzle might be so severe that we can show that the required discount factors are completely “unreasonable” (by some standard) measures of real marginal rates of substitution and/or transformation, but we still have to say something about what a reasonable marginal rate looks like.

In sum, the existence theorems mean that there are no quick proofs of “rationality” or

“irrationality.” The only game in town for the purpose of explaining asset prices is thinking

about economic models of the discount factor.

The number of factors.

Many asset pricing tests focus on the number of factors required to price a cross-section of assets. The equivalence theorems imply that this is a silly question. A linear factor model $m = b'f$ or its equivalent expected return/beta model $E(R^i) = \alpha + \beta_{if}'\lambda_f$ is not a unique representation. In particular, given any multiple-factor or multiple-beta representation we can easily find a single-beta representation. The single factor $m = b'f$ will price assets just as well as the original factors $f$, as will $x^* = \mathrm{proj}(b'f \mid X)$ or the corresponding $R^*$. All three options give rise to single-beta models with exactly the same pricing ability as the multiple factor model. We can also easily find equivalent representations with different numbers (greater than one) of factors. For example, write

$$
m = a + b_1 f_1 + b_2 f_2 + b_3 f_3 = a + b_1 f_1 + b_2\left(f_2 + \frac{b_3}{b_2} f_3\right) = a + b_1 f_1 + b_2 \hat{f}_2
$$

to reduce a “three factor” model to a “two factor” model. In the ICAPM language, consump-

tion itself could serve as a single state variable, in place of the S state variables presumed to

drive it.
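One way to see the single-beta claim is to carry out the algebra for gross returns. The sketch below separates the constant from the factors, writing $m = a + b'f$, and assumes $1 = E(mR^i)$, $E(m) \neq 0$ and $\mathrm{var}(b'f) \neq 0$. The names $g$, $\beta_{i,g}$ and $\lambda_g$ are introduced here for illustration only.

$$
\begin{aligned}
g &\equiv b'f, \qquad m = a + g, \\
1 &= E(mR^i) = E(m)E(R^i) + \mathrm{cov}(g, R^i), \\
E(R^i) &= \frac{1}{E(m)} - \frac{\mathrm{cov}(g, R^i)}{E(m)} = \alpha + \beta_{i,g}\lambda_g, \\
\alpha &= \frac{1}{E(m)}, \qquad \beta_{i,g} = \frac{\mathrm{cov}(g, R^i)}{\mathrm{var}(g)}, \qquad \lambda_g = -\frac{\mathrm{var}(g)}{E(m)}.
\end{aligned}
$$

However many elements $f$ contains, only the single combination $g = b'f$ enters the pricing equation, so the expected returns implied by the single-beta model are those of the multiple-factor model.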

There are times when one is interested in a multiple factor representation. Sometimes the

factors have an economic interpretation that is lost on taking a linear combination. But the

pure number of pricing factors is not a meaningful question.

Discount factors vs. mean, variance and beta.

The point of the previous chapter was to show how the discount factor, mean-variance, and expected return-beta models are all equivalent representations of asset pricing. It seems a good moment to contrast them as well; to understand why the mean-variance and beta language developed first, and to think about why the discount factor language seems to be taking over.

Asset pricing started by putting mean and variance of returns on the axes, rather than payoff in state 1, payoff in state 2, etc. as we do now. The early asset pricing theorists posed the question just right: they wanted to treat assets in the apples-and-oranges, indifference curve and budget set framework of microeconomics. The problem was, what labels to put on the axes? Clearly, “IBM stock” and “GM stock” is not a good idea; investors do not value securities per se, but value some aspects of the stream of random cash flows that those securities give rise to.

Their brilliant insight was to put the mean and variance of the portfolio return on the axes; to treat these as “hedonics” by which investors valued their portfolios. Investors plausibly want more mean and less variance. They gave investors “utility functions” defined over this mean and variance, just as standard utility functions are defined over apples and oranges. The mean-variance frontier is the “budget set.”

With this focus on portfolio mean and variance, the next step was to realize that each security's mean return measures its contribution to the portfolio mean, and that regression betas on the overall portfolio give each security's contribution to the portfolio variance. The mean-return vs. beta description for each security followed naturally.

In a deep sense, the transition from mean-variance frontiers and beta models to discount factors represents the realization that putting consumption in state 1 and consumption in state 2 on the axes (specifying preferences and budget constraints over state-contingent consumption) is a much more natural mapping of standard microeconomics into finance than putting mean, variance, etc. on the axes. If for no other reason, the contingent claim budget constraints are linear, while the mean-variance frontier is not. Thus, I think, the focus on means and variance, the mean-variance frontier and expected return/beta models is all due to an accident of history, that the early asset pricing theorists happened to put mean and variance on the axes rather than state-contingent consumption.

Well, here we are, why prefer one language over another? The discount factor language has an advantage for its simplicity, generality, mathematical convenience, and elegance. These virtues are to some extent in the eye of the beholder, but to this beholder, it is inspiring to be able to start every asset pricing calculation with one equation, p = E(mx). This equation covers all assets, including bonds, options, and real investment opportunities, while the expected return/beta formulation is not useful or very cumbersome in the latter applications. Thus, it has seemed that there are several different asset pricing theories: expected return/beta for stocks, yield-curve models for bonds, arbitrage models for options. In fact all three are just cases of p = E(mx). As a particular example, arbitrage, in the precise sense of positive payoffs with negative prices, has not entered the equivalence discussion at all. I don't know of any way to cleanly graft absence of arbitrage on to expected return/beta models. You have to tack it on after the fact: “by the way, make sure that every portfolio with positive payoffs has a positive price.” It is trivially easy to graft it on to a discount factor model: just add m > 0.
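A three-state sketch shows what the m > 0 restriction buys. The probabilities, discount factors, and payoffs below are made-up numbers chosen for illustration. A strictly positive $m$ gives a positive price to every payoff that is never negative and sometimes positive, while an $m$ that is negative in one state assigns a negative price to a claim that pays off only in that state.

```python
import numpy as np

probs = np.array([0.3, 0.5, 0.2])          # hypothetical state probabilities
m_positive = np.array([1.2, 0.9, 0.7])     # strictly positive discount factor
m_mixed = np.array([1.5, 1.0, -0.4])       # discount factor that is negative in state 3

def price(m, x):
    """p = E(mx) over a finite set of states."""
    return float(np.sum(probs * m * x))

# A payoff that is never negative and pays 1 in state 3 only.
x = np.array([0.0, 0.0, 1.0])

print(price(m_positive, x))   # 0.2 * 0.7 = 0.14, a positive price
print(price(m_mixed, x))      # 0.2 * (-0.4) = -0.08, a negative price for a positive payoff

# With m > 0, every strictly positive payoff gets a positive price.
rng = np.random.default_rng(5)
random_payoffs = rng.uniform(0.01, 2.0, size=(1000, 3))
all_positive = all(price(m_positive, payoff) > 0 for payoff in random_payoffs)
print(all_positive)
```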

The discount factor and state space language also makes it easier to think about different horizons and the present value statement of models. p = E(mx) generalizes quickly to $p_t = E_t \sum_j m_{t,t+j}\, x_{t+j}$, while returns have to be chained together to think about multiperiod models. Papers are still written arguing about geometric vs. arithmetic average returns for multiperiod discounting.


The choice of language is not about normality or return distributions. There is a lot of

confusion about where return distribution assumptions show up in finance. I have made no

distributional assumptions in any of the discussion so far. Second moments as in betas and

the variance of the mean-variance frontier show up because p = E(mx) involves a second

moment. One does not need to assume normality to talk about the mean-variance frontier.

Returns on the mean-variance frontier price other assets even when returns are not normally

distributed.


Chapter 8. Conditioning information

The asset pricing theory I have sketched so far really describes prices at time t in terms of conditional moments. The investor's first order conditions are

$$p_t u'(c_t) = \beta E_t\left[u'(c_{t+1}) x_{t+1}\right]$$

where $E_t$ means expectation conditional on the investor's time t information. Sensibly, the price at time t should be higher if there is information at time t that the discounted payoff is likely to be higher than usual at time t + 1. The basic asset pricing equation should be

$$p_t = E_t(m_{t+1} x_{t+1}).$$

(Conditional expectation can also be written

$$p_t = E\left[m_{t+1} x_{t+1} \mid I_t\right]$$

when it is important to specify the information set $I_t$.)

If payoffs and discount factors were independent and identically distributed (i.i.d.) over time, then conditional expectations would be the same as unconditional expectations and we would not have to worry about the distinction between the two concepts. But stock price/dividend ratios, bond and option prices all change over time, which must reflect changing conditional moments of something on the right-hand side.

One approach is to specify and estimate explicit statistical models of conditional distributions of asset payoffs and discount factor variables (e.g. consumption growth). This approach is sometimes used, and is useful in some applications, but it is usually cumbersome. As we make the conditional mean, variance, covariance, and other parameters of the distribution of (say) N returns depend flexibly on M information variables, the number of required parameters can quickly exceed the number of observations.

More importantly, this explicit approach typically requires us to assume that investors use the same model of conditioning information that we do. We obviously don't even observe all the conditioning information used by economic agents, and we can't include even a fraction of observed conditioning information in our models. The basic feature and beauty of asset prices (like all prices) is that they summarize an enormous amount of information that only individuals see. The events that make the price of IBM stock change by a dollar, like the events that make the price of tomatoes change by 10 cents, are inherently unobservable to economists or would-be social planners (Hayek 1945). Whenever possible, our treatment of conditioning information should allow agents to see more than we do.

If we don't want to model conditional distributions explicitly, and if we want to avoid assuming that investors only see the variables that we include in an empirical investigation, we eventually have to think about unconditional moments, or at least moments conditioned on less information than agents see. Unconditional implications are also interesting in and of themselves. For example, we may be interested in finding out why the unconditional mean returns on some stock portfolios are higher than others, even if every agent fundamentally seeks high conditional mean returns. Most statistical estimation essentially amounts to characterizing unconditional means, as we will see in the chapter on GMM. Thus, rather than model conditional distributions, this chapter focuses on what implications for unconditional moments we can derive from the conditional theory.

8.1 Scaled payoffs

$$p_t = E_t(m_{t+1} x_{t+1}) \ \Rightarrow\ E(p_t z_t) = E(m_{t+1} x_{t+1} z_t)$$

One can incorporate conditioning information by adding scaled payoffs and doing everything

unconditionally. I interpret scaled returns as payoffs to managed portfolios.

8.1.1 Conditioning down

The unconditional implications of any pricing model are pretty easy to state. From

$$p_t = E_t(m_{t+1} x_{t+1})$$

we can take unconditional expectations to obtain[6]

$$E(p_t) = E(m_{t+1} x_{t+1}). \tag{108}$$

Thus, if we just interpret p to stand for $E(p_t)$, everything we have done above applies to unconditional moments. In the same way, we can also condition down from agents' fine information sets to coarser sets that we observe,

$$
\begin{aligned}
p_t = E(m_{t+1} x_{t+1} \mid \Omega) \ &\Rightarrow\ E(p_t \mid I \subset \Omega) = E(m_{t+1} x_{t+1} \mid I \subset \Omega) \\
&\Rightarrow\ p_t = E(m_{t+1} x_{t+1} \mid I_t \subset \Omega_t) \ \text{ if } p_t \in I_t.
\end{aligned}
$$

In making the above statements I used the law of iterated expectations, which is important enough to highlight. This law states that if you take an expected value using less information of an expected value that is formed on more information, you get back the expected value using less information. Your best forecast today of your best forecast tomorrow is the same as your best forecast today. In various useful guises,

$$
\begin{aligned}
E(E_t(x)) &= E(x), \\
E_{t-1}(E_t(x_{t+1})) &= E_{t-1}(x_{t+1}), \\
E\left[E(x \mid \Omega) \mid I \subset \Omega\right] &= E\left[x \mid I\right].
\end{aligned}
$$

[6] We need a small technical assumption that the unconditional moment or moment conditioned on a coarser information set exists. For example, if X and Y are normal (0, 1), then $E\left(\frac{X}{Y} \mid Y\right) = 0$ but $E\left(\frac{X}{Y}\right)$ is infinite.
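The law of iterated expectations can be verified exactly on a small finite state space. In the sketch below (the number of states, the probabilities, and the helper name `cond_exp` are assumptions for illustration), the fine information set is a partition into four cells and the coarse one merges those cells in pairs.

```python
import numpy as np

rng = np.random.default_rng(4)
n_states = 12
probs = rng.dirichlet(np.ones(n_states))   # hypothetical state probabilities
x = rng.standard_normal(n_states)          # a random variable defined on the states

fine = np.arange(n_states) // 3            # fine information: 4 cells of 3 states each
coarse = fine // 2                         # coarse information: 2 cells, unions of fine cells

def cond_exp(values, cells):
    """E(values | cells), returned state by state and constant within each cell."""
    out = np.empty_like(values)
    for c in np.unique(cells):
        idx = cells == c
        out[idx] = np.sum(probs[idx] * values[idx]) / np.sum(probs[idx])
    return out

lhs = cond_exp(cond_exp(x, fine), coarse)  # E[E(x | fine) | coarse]
rhs = cond_exp(x, coarse)                  # E[x | coarse]
uncond = np.sum(probs * cond_exp(x, fine)) # E[E(x | fine)]

print(np.max(np.abs(lhs - rhs)))           # zero up to rounding
print(uncond - np.sum(probs * x))          # zero up to rounding
```

The nesting matters: the identity relies on every coarse cell being a union of fine cells, which the integer division above guarantees.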

8.1.2 Instruments and managed portfolios

We can do more than just condition down. Suppose we multiply the payoff and price by an instrument $z_t$ observed at time t. Then,

$$z_t p_t = E_t(m_{t+1} x_{t+1} z_t)$$

and, taking unconditional expectations,

$$E(p_t z_t) = E(m_{t+1} x_{t+1} z_t). \tag{109}$$

This is an additional implication of the conditional model, not captured by just condition-

ing down as in (8.108). This trick originates from the GMM method of estimating asset

pricing models, discussed below. The word instruments for the z variables comes from the

instrumental variables estimation heritage of GMM.
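A small exact example shows why the scaled condition has bite beyond the plain unconditional one. Everything in the sketch is a made-up two-by-two economy (instrument values, probabilities, discount factors, payoffs). The discount factor `m` satisfies the conditional model, so it passes both unconditional checks, while `m_bad` is rescaled so that it still prices the return on average but fails once the return is scaled by $z_t$.

```python
import numpy as np

z_vals = np.array([0.5, 1.5])          # instrument z_t, known at time t
z_prob = np.array([0.5, 0.5])          # probability of each z value
pi = np.array([[0.5, 0.5],             # P(state at t+1 | z_t): row = z, column = state
               [0.2, 0.8]])
m = np.array([[1.1, 0.8],              # discount factor m_{t+1} by (z, state)
              [1.3, 0.9]])
x = np.array([0.9, 1.2])               # payoff x_{t+1} by state

p = (pi * m * x).sum(axis=1)           # conditional price p_t = E_t(m x), one per z
R = x[None, :] / p[:, None]            # gross return R_{t+1} = x_{t+1} / p_t

joint = z_prob[:, None] * pi           # unconditional probability of each (z, state)
z_grid = z_vals[:, None] * np.ones_like(pi)

def E(values):
    """Unconditional expectation over (z, state)."""
    return float((joint * values).sum())

# The conditional model implies both unconditional moment conditions.
print(E(m * R), E(m * R * z_grid), E(z_grid))    # 1, E(z), E(z)

# Rescale m by c(z) with E(c) = 1: unconditional pricing survives, scaled pricing fails.
c = np.array([1.2, 0.8])[:, None]
m_bad = m * c
print(E(m_bad * R), E(m_bad * R * z_grid))       # 1 and 0.9, while E(z) = 1.0
```

Here $E(z_t) = 1$, so the scaled condition requires $E(m_{t+1}R_{t+1}z_t) = 1$; `m_bad` delivers 0.9 instead, and the managed-portfolio moment detects a failure that the unscaled moment misses.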

To think about equation (8.109), group $(x_{t+1} z_t)$. Call this product a payoff $x = x_{t+1} z_t$, with price $p = E(p_t z_t)$. Then (8.109) reads

$$p = E(mx)$$

once again. Rather than thinking about (8.109) as an instrumental variables estimate of a conditional model, we can think of it as a price and a payoff, and apply all the asset pricing theory directly.

This interpretation is not as artificial as it sounds. $z_t x_{t+1}$ are the payoffs to managed portfolios. An investor who observes $z_t$ can, rather than “buy and hold,” invest in an asset according to the value of $z_t$. For example, if a high value of $z_t$ forecasts that asset returns are likely to be high the next period, the investor might buy more of the asset when $z_t$ is high and vice versa. If the investor follows a linear rule, he puts $z_t p_t$ dollars into the asset each period and receives $z_t x_{t+1}$ dollars the next period.

This all sounds new and different, but practically every test uses managed portfolios. For example, the size, beta, industry, book/market and so forth portfolios of stocks are all managed portfolios, since their composition changes every year in response to conditioning information: the size, beta, etc. of the individual stocks. This idea is also closely related to the deep idea of dynamic spanning. Markets that are apparently very incomplete can in


reality provide many more state-contingencies through dynamic (conditioned on information)

trading strategies.

Equation (8.109) offers a very simple view of how to incorporate the extra information in conditioning information: Add managed portfolio payoffs, and proceed with unconditional moments as if conditioning information didn't exist!

Linearity is not important. If the investor wanted to place, say, $2 + 3z^2$ dollars in the asset, we could capture this desire with an instrument $z_2 = 2 + 3z^2$. Nonlinear (measurable) transformations of time-$t$ random variables are again random variables.

We can thus incorporate conditioning information while still looking at unconditional moments instead of conditional moments, without any of the statistical machinery of explicit models with time-varying moments. The only subtleties are 1) The set of asset payoffs expands dramatically, since we can consider all managed portfolios as well as basic assets, potentially multiplying every asset return by every information variable. 2) Expected prices of managed portfolios show up for $p$ instead of just $p = 0$ and $p = 1$ if we started with basic asset returns and excess returns.

8.2 Sufficiency of adding scaled returns

Checking the expected price of all managed portfolios is, in principle, sufficient to check all the implications of conditioning information.

$$E(z_t) = E(m_{t+1} R_{t+1} z_t) \;\; \forall z_t \in I_t \;\Rightarrow\; 1 = E(m_{t+1} R_{t+1} \mid I_t)$$

$$E(p_t) = E(m_{t+1} x_{t+1}) \;\; \forall x_{t+1} \in X_{t+1} \;\Rightarrow\; p_t = E(m_{t+1} x_{t+1} \mid I_t)$$

We have shown that we can derive some extra implications from the presence of con-

ditioning information by adding scaled returns. But does this exhaust the implications of

conditioning information? Are we missing something important by relying on this trick?

The answer is, in principle, no.

I rely on the following mathematical fact: The conditional expectation of a variable $y_{t+1}$ given an information set $I_t$, $E(y_{t+1} \mid I_t)$, is equal to a regression forecast of $y_{t+1}$ using every variable $z_t \in I_t$. Now, “every random variable” means every variable and every nonlinear (measurable) transformation of every variable, so there are a lot of variables in this regression! (The word projection and the notation $\text{proj}(y_{t+1} \mid z_t)$ are used to distinguish the best forecast of $y_{t+1}$ using only linear combinations of $z_t$ from the conditional expectation.) Applying this fact to our case, let $y_{t+1} = m_{t+1} R_{t+1} - 1$. Then $E[(m_{t+1} R_{t+1} - 1) z_t] = 0$ for every $z_t \in I_t$ implies $1 = E(m_{t+1} R_{t+1} \mid I_t)$. Thus, no implications are lost in principle by looking at scaled returns.
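The role of “every nonlinear transformation” can be seen in a small sketch. The three-point distribution for $z_t$ and the conditional pricing error $E(m_{t+1}R_{t+1} - 1 \mid z_t) = z_t^2 - 2/3$ are assumptions invented for the example. The constant and the linear instrument both give a zero moment, so a test using only those would miss the conditional mispricing; the nonlinear instrument $z_t^2$ picks it up.

```python
from fractions import Fraction as F

# Hypothetical instrument distribution: z_t is -1, 0 or 1 with equal probability.
z_values = [F(-1), F(0), F(1)]
prob = F(1, 3)


def conditional_pricing_error(z):
    """Assumed conditional error E(m R - 1 | z); chosen to average to zero."""
    return z * z - F(2, 3)


def scaled_moment(instrument):
    """Unconditional moment E[(m R - 1) * instrument(z_t)]."""
    return sum(prob * conditional_pricing_error(z) * instrument(z) for z in z_values)


moment_constant = scaled_moment(lambda z: F(1))  # unscaled return
moment_linear = scaled_moment(lambda z: z)       # return scaled by z_t
moment_squared = scaled_moment(lambda z: z * z)  # return scaled by z_t^2

print(moment_constant, moment_linear, moment_squared)
```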


Another way of looking at the same idea is that $R_{t+1} z_t$ is a payoff available at time $t+1$. Thus, the space of all payoffs $X_{t+1}$ should be understood to include the time-$(t+1)$ payoff you can generate with a basis set of assets $R_{t+1}$ and all dynamic strategies that use information in the set $I_t$. With that definition of the space $X_{t+1}$ we can write the sufficiency of scaled returns with the more general second equality above.

“All linear and nonlinear transformations of all variables observed at time $t$” sounds like a lot of instruments, and it is. But there is a practical limit to the number of instruments $z_t$ one needs to scale by, since only variables that forecast returns or $m$ (or their higher moments and co-moments) add any information.

Since adding instruments is the same thing as including potential managed portfolios,

thoughtfully choosing a few instruments is the same thing as the thoughtful choice of a few

assets or portfolios that one makes in any test of an asset pricing model. Even when evaluating

completely unconditional asset pricing models, one always forms portfolios and omits many

possible assets from analysis. Few studies, in fact, go beyond checking whether a model

correctly prices 10-25 stock portfolios and a few bond portfolios. Implicitly, one feels that

the chosen payoffs do a pretty good job of spanning the set of available risk-loadings (mean

returns) and hence that adding additional assets will not affect the results. Nonetheless, since

data are easily available on all 2000 or so NYSE stocks, plus AMEX and NASDAQ stocks, to

say nothing of government and corporate bonds, returns of mutual funds, foreign exchange,

foreign equities, real investment opportunities, etc., the use of a few portfolios means that a

tremendous number of potential asset payoffs are left out in an ad-hoc manner.

In a similar manner, if one had a small set of instruments that capture all the predictability of discounted returns $m_{t+1} R_{t+1}$, then there would be no need to add more instruments.

Thus, we carefully but arbitrarily select a few instruments that we think do a good job of

characterizing the conditional distribution of returns. Exclusion of potential instruments is

exactly the same thing as exclusion of assets. It is no better founded, but the fact that it is a

common sin may lead one to worry less about it.

There is nothing special about unscaled returns, and no economic reason to place them above scaled returns. A mutual fund might come into being that follows the managed portfolio strategy and then its unscaled returns would be the same as an original scaled return. Models that cannot price scaled returns are no more interesting than models that can only price (say) stocks with first letter A through L. (There may be econometric reasons to trust results for nonscaled returns a bit more, but we haven't gotten to statistical issues yet.)

Of course, the other way to incorporate conditioning information is by constructing explicit parametric models of conditional distributions. With this procedure one can in fact check all of a model's implications about conditional moments. However, the parametric model may be incorrect, or may not reflect some variable used by investors. Including instruments may not be as efficient, but it is still consistent if the parametric model is incorrect. The wrong parametric model of conditional distributions may lead to inconsistent estimates. In addition, one avoids estimating nuisance parameters of the parametric distribution model.


8.3 Conditional and unconditional models

A conditional factor model does not imply a fixed-weight or unconditional factor model:

$m_{t+1} = b_t' f_{t+1}$, $p_t = E_t(m_{t+1} x_{t+1})$ does not imply that $\exists\, b$ s.t. $m_{t+1} = b' f_{t+1}$, $E(p_t) = E(m_{t+1} x_{t+1})$.

$E_t(R_{t+1}) = \beta_t' \lambda_t$ does not imply $E(R_{t+1}) = \beta' \lambda$.

Conditional mean-variance efficiency does not imply unconditional mean-variance efficiency.

The converse statements are true, if managed portfolios are included.

For explicit discount factor models (models whose parameters are constant over time), the fact that one looks at conditional vs. unconditional implications makes no difference to the statement of the model:

$$p_t = E_t(m_{t+1} x_{t+1}) \;\Rightarrow\; E(p_t) = E(m_{t+1} x_{t+1})$$

and that's it. Examples include the consumption-based model with power utility, $m_{t+1} = \beta (c_{t+1}/c_t)^{-\gamma}$, and the log utility CAPM, $m_{t+1} = 1/R^W_{t+1}$.

However, linear factor models include parameters that may vary over time and as functions of conditioning information. In these cases the transition from conditional to unconditional moments is much more subtle. We cannot easily condition down the model at the same time as the prices and payoffs.

8.3.1 Conditional vs. unconditional factor models in discount factor language

As an example, consider the CAPM

$$m = a - b R^W$$

where $R^W$ is the return on the market or wealth portfolio. We can find $a$ and $b$ from the condition that this model correctly price any two returns, for example $R^W$ itself and a risk-free rate:

$$\left\{ \begin{array}{l} 1 = E_t(m_{t+1} R^W_{t+1}) \\ 1 = E_t(m_{t+1}) R^f_t \end{array} \right. \;\Rightarrow\; \left\{ \begin{array}{l} a = \dfrac{1}{R^f_t} + b E_t(R^W_{t+1}) \\ b = \dfrac{E_t(R^W_{t+1}) - R^f_t}{R^f_t \, \sigma^2_t(R^W_{t+1})} \end{array} \right. . \tag{110}$$

As you can see, $b > 0$ and $a > 0$: to make a payoff proportional to the minimum second-moment return (on the inefficient part of the mean-variance frontier) we need a portfolio long the risk free rate and short the market $R^W$.
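As a quick check on the formulas for $a$ and $b$, the sketch below plugs in illustrative numbers (a 1% risk-free rate, an 8% expected market return and a 16% market standard deviation, all assumed for the example) and confirms that the resulting $m = a - bR^W$ prices both the market return and the risk-free rate. The function name `capm_weights` is hypothetical.

```python
from fractions import Fraction as F


def capm_weights(rf, mean_rw, var_rw):
    """Solve 1 = E_t(m R^W) and 1 = E_t(m) R^f for a, b in m = a - b R^W."""
    b = (mean_rw - rf) / (rf * var_rw)
    a = 1 / rf + b * mean_rw
    return a, b


# Assumed conditional moments: gross returns, variance = 0.16 ** 2.
rf, mean_rw, var_rw = F(101, 100), F(108, 100), F(256, 10000)
a, b = capm_weights(rf, mean_rw, var_rw)

# E_t((R^W)^2) = var_t(R^W) + E_t(R^W)^2
second_moment_rw = var_rw + mean_rw ** 2

price_of_market = a * mean_rw - b * second_moment_rw  # E_t(m R^W)
price_of_riskfree = (a - b * mean_rw) * rf            # E_t(m) R^f

print(float(a), float(b), price_of_market, price_of_riskfree)
```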


More importantly for our current purposes, $a$ and $b$ vary over time, as $E_t(R^W_{t+1})$, $\sigma^2_t(R^W_{t+1})$, and $R^f_t$ vary over time. If it is to price assets conditionally, the CAPM must be a linear factor model with time-varying weights, of the form

$$m_{t+1} = a_t - b_t R^W_{t+1}.$$

This fact means that we can no longer transparently condition down. The statement that

$$1 = E_t\left[(a_t - b_t R^W_{t+1}) R_{t+1}\right]$$

does not imply that we can find constants $a$ and $b$ so that

$$1 = E\left[(a - b R^W_{t+1}) R_{t+1}\right].$$

Just try it. Taking unconditional expectations,

$$1 = E\left[(a_t - b_t R^W_{t+1}) R_{t+1}\right] = E\left[a_t R_{t+1} - b_t R^W_{t+1} R_{t+1}\right]$$

$$= E(a_t) E(R_{t+1}) - E(b_t) E(R^W_{t+1} R_{t+1}) + \text{cov}(a_t, R_{t+1}) - \text{cov}(b_t, R^W_{t+1} R_{t+1}).$$

Thus, the unconditional model

$$1 = E\left[\left(E(a_t) - E(b_t) R^W_{t+1}\right) R_{t+1}\right]$$

only holds if the covariance terms above happen to be zero. Since $a_t$ and $b_t$ are formed from conditional moments of returns, the covariances will not, in general, be zero.

On the other hand, suppose it is true that $a_t$ and $b_t$ are constant over time. Then

$$1 = E_t\left[(a - b R^W_{t+1}) R_{t+1}\right]$$

does imply

$$1 = E\left[(a - b R^W_{t+1}) R_{t+1}\right],$$

just like any other constant-parameter factor pricing model. Furthermore, the latter unconditional model implies the former conditional model, if the latter holds for all managed portfolios.
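A small numerical sketch shows the covariance terms at work. It assumes a two-regime economy (the regime probabilities and the conditional moments in each regime are invented for illustration), builds $a_t$ and $b_t$ from the conditional CAPM formulas in each regime, and uses the market return itself as the test asset. The conditional model prices the asset exactly, while the fixed-weight model built from $E(a_t)$ and $E(b_t)$ does not.

```python
from fractions import Fraction as F

# Hypothetical regimes: (probability, R^f_t, E_t(R^W), var_t(R^W)).
regimes = [
    (F(1, 2), F(101, 100), F(106, 100), F(4, 100)),
    (F(1, 2), F(102, 100), F(112, 100), F(1, 100)),
]


def capm_weights(rf, mean_rw, var_rw):
    """Conditional a_t, b_t in m = a_t - b_t R^W."""
    b = (mean_rw - rf) / (rf * var_rw)
    a = 1 / rf + b * mean_rw
    return a, b


def expect(values):
    """Unconditional expectation of one value per regime."""
    return sum(p * v for (p, *_), v in zip(regimes, values))


weights = [capm_weights(rf, mu, var) for _, rf, mu, var in regimes]
a_t = [a for a, _ in weights]
b_t = [b for _, b in weights]
mean_t = [mu for _, _, mu, _ in regimes]                  # E_t(R^W)
second_t = [var + mu ** 2 for _, _, mu, var in regimes]   # E_t((R^W)^2)

# Covariances of the weights with the test asset R = R^W, using
# E(a_t R) = E(a_t E_t(R)) by the law of iterated expectations.
cov_a = expect([a * m1 for a, m1 in zip(a_t, mean_t)]) - expect(a_t) * expect(mean_t)
cov_b = expect([b * m2 for b, m2 in zip(b_t, second_t)]) - expect(b_t) * expect(second_t)

fixed_weight_price = expect(a_t) * expect(mean_t) - expect(b_t) * expect(second_t)
conditional_price = fixed_weight_price + cov_a - cov_b

print(float(fixed_weight_price), conditional_price)
```

In this example the fixed-weight price is off by roughly 0.07, entirely because the two covariance terms do not cancel.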

8.3.2 Conditional vs. unconditional in an expected return / beta model

To put the same observation in beta-pricing language,

$$E_t(R^i) = R^f_t + \beta_t \lambda_t \tag{111}$$

does not imply that

$$E(R^i) = \alpha + \beta \lambda. \tag{112}$$

The reason is that $\beta_t$ and $\beta$ represent conditional and unconditional regression coefficients respectively.

Again, if returns and factors are i.i.d., the unconditional model can go through. In that case, $\text{cov}(\cdot) = \text{cov}_t(\cdot)$, $\text{var}(\cdot) = \text{var}_t(\cdot)$, so the unconditional regression beta is the same as the conditional regression beta, $\beta = \beta_t$. Then, we can take expectations of (8.111) to get (8.112), with $\lambda = E(\lambda_t)$. But to condition down in this way, the covariance and variance must each be constant over time. It is not enough that their ratio, or conditional betas, are constant. If $\text{cov}_t$ and $\text{var}_t$ change over time, then the unconditional regression beta, $\beta = \text{cov}/\text{var}$, is not equal to the average conditional regression beta, $E(\beta_t)$ or $E(\text{cov}_t/\text{var}_t)$. Some models specify that $\text{cov}_t$ and $\text{var}_t$ vary over time, but $\text{cov}_t/\text{var}_t$ is a constant. This specification still does not imply that the unconditional regression beta $\beta \equiv \text{cov}/\text{var}$ is equal to the constant $\text{cov}_t/\text{var}_t$. Similarly, it is not enough that $\lambda$ be constant, since $E(\beta_t) \neq \beta$. The betas must be regression coefficients, not just numbers.

If the betas do not vary over time, the $\lambda_t$ may still vary and $\lambda = E(\lambda_t)$.
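Here is a sketch of the constant-ratio case. The two regimes and their conditional moments are assumptions made up for the example; the conditional beta $\text{cov}_t/\text{var}_t$ equals one in both regimes. The unconditional moments come from the laws of total covariance and total variance. In this particular example the gap between the two betas enters through the comovement of the conditional means $E_t(R)$ and $E_t(f)$ across regimes.

```python
from fractions import Fraction as F

# Hypothetical regimes: (probability, E_t(R), E_t(f), var_t(f)).
regimes = [
    (F(1, 2), F(5, 100), F(2, 100), F(4, 100)),
    (F(1, 2), F(3, 100), F(10, 100), F(1, 100)),
]
conditional_beta = F(1)  # cov_t(R, f) / var_t(f), the same in every regime


def expect(values):
    """Unconditional expectation of one value per regime."""
    return sum(p * v for (p, *_), v in zip(regimes, values))


mean_r = [r for _, r, _, _ in regimes]
mean_f = [f for _, _, f, _ in regimes]
var_f = [v for _, _, _, v in regimes]
cov_rf = [conditional_beta * v for v in var_f]  # cov_t moves with var_t

# cov(R, f) = E(cov_t) + cov(E_t R, E_t f);  var(f) = E(var_t) + var(E_t f)
total_cov = (expect(cov_rf)
             + expect([r * f for r, f in zip(mean_r, mean_f)])
             - expect(mean_r) * expect(mean_f))
total_var = expect(var_f) + expect([f * f for f in mean_f]) - expect(mean_f) ** 2

unconditional_beta = total_cov / total_var
print(conditional_beta, unconditional_beta)
```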

8.3.3 A precise statement

Let's formalize these observations somewhat. Let $X$ denote the space of all portfolios of the primitive assets, including managed portfolios in which the weights may depend on conditioning information, i.e. scaled returns.

A conditional factor pricing model is a model $m_{t+1} = a_t + b_t' f_{t+1}$ that satisfies $p_t = E_t(m_{t+1} x_{t+1})$ for all $x_{t+1} \in X$.

An unconditional factor pricing model is a model $m_{t+1} = a + b' f_{t+1}$ that satisfies $E(p_t) = E(m_{t+1} x_{t+1})$ for all $x_{t+1} \in X$. It might be more appropriately called a fixed-weight factor pricing model.

Given these definitions it's almost trivial that the unconditional model is just a special case of the conditional model, one that happens to have fixed weights. Thus, a conditional factor model does not imply an unconditional factor model (because the weights may vary) but an unconditional factor model does imply a conditional factor model.

There is one important subtlety. The payoff space $X$ is common, and contains all managed portfolios in both cases. The payoff space for the unconditional factor pricing model is not just fixed combinations of a set of basis assets. For example, we might simply check that the static (constant $a$, $b$) CAPM captures the unconditional mean returns of a set of assets. If this model does not also price those assets scaled by instruments, then it is not a conditional model, or, as I argued above, really a valid factor pricing model at all.

Of course, everything applies for the relation between a conditional factor pricing model using a fine information set (like investors' information sets) and conditional factor pricing models using coarser information sets (like ours). If you think a set of factors prices assets with respect to investors' information, that does not mean the same set of factors prices assets with respect to our, coarser, information sets.

8.3.4 Mean-variance frontiers

Define the conditional mean-variance frontier as the set of returns that minimize $\text{var}_t(R_{t+1})$ given $E_t(R_{t+1})$. (This definition includes the lower segment as usual.) Define the unconditional mean-variance frontier as the set of returns including managed portfolio returns that minimize $\text{var}(R_{t+1})$ given $E(R_{t+1})$. These two frontiers are related by:

If a return is on the unconditional mean-variance frontier, it is on the conditional mean-variance frontier.

However,

If a return is on the conditional mean-variance frontier, it need not be on the unconditional mean-variance frontier.

These statements are exactly the opposite of what you first expect from the language. The law of iterated expectations $E(E_t(x)) = E(x)$ leads you to expect that “conditional” should imply “unconditional.” But we are studying the conditional vs. unconditional mean-variance frontier, not raw conditional and unconditional expectations, and it turns out that exactly the opposite words apply. Of course “unconditional” can also mean “conditional on a coarser information set.”

Again, keep in mind that the unconditional mean-variance frontier includes returns on managed portfolios. This definition is eminently reasonable. If you're trying to minimize variance for given mean, why tie your hands to fixed-weight portfolios? Equivalently, why not allow yourself to include in your portfolio the returns of mutual funds whose advisers promise the ability to adjust portfolios based on conditioning information?

You could form a mean-variance frontier of fixed-weight portfolios of a basis set of assets, and this is what many people often mean by “unconditional mean-variance frontier.” The return on the true unconditional mean-variance frontier will, in general, include some managed portfolio returns, and so will lie outside this mean-variance frontier of fixed-weight portfolios. Conversely, a return on the fixed-weight portfolio MVF is, in general, not on the unconditional or conditional mean-variance frontier. All we know is that the fixed-weight frontier lies inside the other two. It may touch, but it need not. This is not to say the fixed-weight unconditional frontier is uninteresting. For example, returns on this frontier will price fixed-weight portfolios of the basis assets. The point is that this frontier has no connection to the other two frontiers. In particular, a conditionally mean-variance efficient return (conditional CAPM) need not unconditionally price the fixed-weight portfolios.


I offer several ways to see this important statement.

Using the connection to factor models

We have seen that the conditional CAPM $m_{t+1} = a_t - b_t R^W_{t+1}$ does not imply an unconditional CAPM $m_{t+1} = a - b R^W_{t+1}$. We have seen that the existence of such a conditional factor model is equivalent to the statement that the return $R^W_{t+1}$ lies on the conditional mean-variance frontier, and the existence of an unconditional factor model $m_{t+1} = a - b R^W_{t+1}$ is equivalent to the statement that $R^W$ is on the unconditional mean-variance frontier. Then, from the “trivial” fact that an unconditional factor model is a special case of a conditional one, we know that $R^W$ on the unconditional frontier implies $R^W$ on the conditional frontier but not vice versa.

Using the orthogonal decomposition

We can see the relation between conditional and unconditional mean-variance frontiers using the orthogonal decomposition characterization of mean-variance efficiency given above. This beautiful proof is the main point of Hansen and Richard (1987).

By the law of iterated expectations, $x^*$ and $R^*$ generate expected prices and $R^{e*}$ generates unconditional means as well as conditional means:

$$E\left[p = E_t(x^* x)\right] \;\Rightarrow\; E(p) = E(x^* x)$$

$$E\left[E_t(R^{*2}) = E_t(R^* R)\right] \;\Rightarrow\; E(R^{*2}) = E(R^* R)$$

$$E\left[E_t(R^{e*} R^e) = E_t(R^e)\right] \;\Rightarrow\; E(R^{e*} R^e) = E(R^e)$$

This fact is subtle and important. For example, starting with $x^* = p_t' E_t(x_{t+1} x_{t+1}')^{-1} x_{t+1}$, you might think we need a different $x^*$, $R^*$, $R^{e*}$ to represent expected prices and unconditional means, using unconditional probabilities to define inner products. The three lines above show that this is not the case. The same old $x^*$, $R^*$, $R^{e*}$ represent conditional as well as unconditional prices and means.

Recall that a return is mean-variance efficient if and only if it is of the form

$$R^{mv} = R^* + w R^{e*}.$$

Thus, $R^{mv}$ is conditionally mean-variance efficient if $w$ is any number in the time-$t$ information set,

$$\text{conditional frontier: } R^{mv}_{t+1} = R^*_{t+1} + w_t R^{e*}_{t+1},$$

and $R^{mv}$ is unconditionally mean-variance efficient if $w$ is any constant,

$$\text{unconditional frontier: } R^{mv}_{t+1} = R^*_{t+1} + w R^{e*}_{t+1}.$$

Constants are in the time-$t$ information set; time-$t$ random variables are not necessarily constant. Thus unconditional efficiency (including managed portfolios) implies conditional efficiency but not vice versa. As with the factor models, once you see the decomposition, it is a trivial argument about whether a weight is constant or time-varying.

Brute force and examples.

If you're still puzzled, an additional argument by brute force may be helpful.

If a return is on the unconditional MVF it must be on the conditional MVF at each date. If not, you could improve the unconditional mean-variance trade-off by moving to the conditional MVF at each date. Minimizing unconditional variance given mean is the same as minimizing unconditional second moment given mean,

$$\min E(R^2) \text{ s.t. } E(R) = \mu$$

Writing the unconditional moment in terms of conditional moments, the problem is

$$\min E\left[E_t(R^2)\right] \text{ s.t. } E\left[E_t(R)\right] = \mu$$

Now, suppose you could lower $E_t(R^2)$ at one date $t$ without affecting $E_t(R)$ at that date. This change would lower the objective, without changing the constraint. Thus, you should have done it: you should have picked returns on the conditional mean-variance frontiers.

It almost seems that reversing the argument we can show that conditional efficiency implies unconditional efficiency, but it doesn't. Just because you have minimized $E_t(R^2)$ for given value of $E_t(R)$ at each date $t$ does not imply that you have minimized $E(R^2)$ for a given value of $E(R)$. In showing that unconditional efficiency implies conditional efficiency we held fixed $E_t(R)$ at each date at $\mu$, and showed it is a good idea to minimize $\sigma_t(R)$. In trying to go backwards, the problem is that a given value of $E(R)$ does not specify what $E_t(R)$ should be at each date. We can increase $E_t(R)$ in one conditioning information set and decrease it in another, leaving the return on the conditional MVF.

Figure 22 presents an example. Return B is conditionally mean-variance efficient. It also has zero unconditional variance, so it is the unconditionally mean-variance efficient return at the expected return shown. Return A is on the conditional mean-variance frontiers, and has the same unconditional expected return as B. But return A has some unconditional variance, and so is inside the unconditional mean-variance frontier.
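The same point can be put in numbers. The sketch below is in the spirit of the A and B example but is not the figure's actual data: it assumes two equally likely information sets, a constant risk-free rate of 1.05, and a conditional frontier $\sigma_t(R) = |E_t(R) - R^f| / S$ with an assumed slope $S = 1/2$. Return B sits at the risk-free rate in both information sets; return A sits on the conditional frontier at a low mean in one set and a high mean in the other. Unconditional variance comes from the law of total variance.

```python
from fractions import Fraction as F

rf = F(105, 100)                # assumed constant gross risk-free rate
sharpe = F(1, 2)                # assumed slope of the conditional frontier
info_probs = [F(1, 2), F(1, 2)]  # two equally likely information sets


def frontier_variance(cond_mean):
    """Conditional variance of the frontier return with mean cond_mean."""
    return ((cond_mean - rf) / sharpe) ** 2


def unconditional_moments(cond_means):
    """Mean and variance via var(R) = E(var_t(R)) + var(E_t(R))."""
    mean = sum(p * m for p, m in zip(info_probs, cond_means))
    expected_cond_var = sum(
        p * frontier_variance(m) for p, m in zip(info_probs, cond_means)
    )
    var_of_cond_mean = sum(
        p * (m - mean) ** 2 for p, m in zip(info_probs, cond_means)
    )
    return mean, expected_cond_var + var_of_cond_mean


mean_b, var_b = unconditional_moments([rf, rf])
mean_a, var_a = unconditional_moments([F(102, 100), F(108, 100)])

print(mean_a, var_a, mean_b, var_b)
```

Both returns are conditionally efficient and have the same unconditional mean, but only B has zero unconditional variance.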

As a second example, the riskfree rate is only on the unconditional mean-variance frontier if it is a constant. Remember the expression (6.95) for the risk free rate,

$$R^f = R^* + R^f R^{e*}.$$

The unconditional mean-variance frontier is $R^* + w R^{e*}$ with $w$ a constant. Thus, the riskfree rate is only unconditionally mean-variance efficient if it is a constant.

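[Figure 22: conditional mean-variance frontiers for information set 1 and information set 2, plotted with $E_t(R)$ on the vertical axis and $\sigma_t(R)$ on the horizontal axis; returns A and B are marked.]

Figure 22. Return A is on the conditional mean-variance frontiers but not on the unconditional mean-variance frontier.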

8.3.5 Implications: Hansen-Richard Critique.

Many models, such as the CAPM, imply a conditional linear factor model $m_{t+1} = a_t + b_t' f_{t+1}$. These theorems show that such a model does not imply an unconditional model. Equivalently, if the model predicts that the market portfolio is conditionally mean-variance efficient, this does not imply that the market is unconditionally mean-variance efficient. We often test the CAPM by seeing if it explains the average returns of some portfolios or (equivalently) if the market is on the unconditional mean-variance frontier. The CAPM may quite well be true (conditionally) and fail these tests; many assets may do better in terms of unconditional mean vs. unconditional variance.

The situation is even worse than these comments seem, and is not repaired by simple inclusion of some conditioning information. Models such as the CAPM imply a conditional linear factor model with respect to investors' information sets. However, the best we can hope to do is to test implications conditioned down on variables that we can observe and include in a test. Thus, a conditional linear factor model is not testable!

I like to call this observation the “Hansen-Richard critique” by analogy to the “Roll Critique.” Roll pointed out, among other things, that the wealth portfolio might not be observable, making tests of the CAPM impossible. Hansen and Richard point out that the conditioning information of agents might not be observable, and that one cannot omit it in testing a conditional model. Thus, even if the wealth portfolio were observable, the fact that we cannot observe agents' information sets dooms tests of the CAPM.


8.4 Scaled factors: a partial solution

You can expand the set of factors to test conditional factor pricing models:

$$\text{factors} = f_{t+1} \otimes z_t$$

The problem is that the parameters of the factor pricing model $m_{t+1} = a_t + b_t f_{t+1}$ may vary over time. A partial solution is to model the dependence of parameters $a_t$ and $b_t$ on variables in the time-$t$ information set; let $a_t = a(z_t)$, $b_t = b(z_t)$, where $z_t$ is a vector of variables observed at time $t$ (including a constant). In particular, why not try linear models

$$a_t = a' z_t, \quad b_t = b' z_t$$

Linearity is not restrictive: $z_t^2$ is just another instrument. The only criticism one can make is that some instrument $z_{jt}$ is important for capturing the variation in $a_t$ and $b_t$, and was omitted. For instruments on which we have data, we can meet this objection by trying $z_{jt}$ and seeing whether it does, in fact, enter significantly. However, for instruments $z_t$ that are observed by agents but not by us, this criticism remains valid.

Linear discount factor models lead to a nice interpretation as scaled factors, in the same way that linearly managed portfolios are scaled returns. With a single factor and instrument, write

$$m_{t+1} = a(z_t) + b(z_t) f_{t+1} \tag{113}$$

$$= a_0 + a_1 z_t + (b_0 + b_1 z_t) f_{t+1}$$

$$= a_0 + a_1 z_t + b_0 f_{t+1} + b_1 (z_t f_{t+1}). \tag{114}$$

Thus, in place of the one-factor model with time-varying coefficients (8.113), we have a three-factor model $(z_t, f_{t+1}, z_t f_{t+1})$ with fixed coefficients, (8.114).

Since the coefficients are now fixed, we can use the scaled-factor model with unconditional moments:

$$p_t = E_t\left[(a_0 + a_1 z_t + b_0 f_{t+1} + b_1 (z_t f_{t+1})) x_{t+1}\right] \;\Rightarrow$$

$$E(p_t) = E\left[(a_0 + a_1 z_t + b_0 f_{t+1} + b_1 (z_t f_{t+1})) x_{t+1}\right]$$
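A sketch of the scaled-factor trick, under assumptions invented for the example: the instrument $z_t$ is a 0/1 regime indicator, the factor is the market return $R^W$, and the conditional moments in each regime are made-up numbers. With only two instrument values a linear rule for $a_t$ and $b_t$ is exact, so the fixed-coefficient model $m = a_0 + a_1 z_t + b_0 R^W + b_1 z_t R^W$ prices the market return both unscaled and scaled by $z_t$.

```python
from fractions import Fraction as F

# Hypothetical regimes keyed by z_t: (probability, R^f_t, E_t(R^W), var_t(R^W)).
regimes = {
    0: (F(1, 2), F(101, 100), F(106, 100), F(4, 100)),
    1: (F(1, 2), F(102, 100), F(112, 100), F(1, 100)),
}


def conditional_weights(rf, mean_rw, var_rw):
    """a_t, b_t in m = a_t + b_t R^W (so b_t < 0) that price R^f and R^W."""
    b = -(mean_rw - rf) / (rf * var_rw)
    a = 1 / rf - b * mean_rw
    return a, b


a_low, b_low = conditional_weights(*regimes[0][1:])
a_high, b_high = conditional_weights(*regimes[1][1:])

# Fixed coefficients of the scaled-factor model: a_t = a0 + a1 z, b_t = b0 + b1 z.
a0, a1 = a_low, a_high - a_low
b0, b1 = b_low, b_high - b_low


def scaled_moment(instrument):
    """E[m R^W instrument(z)] for m = a0 + a1 z + b0 R^W + b1 z R^W."""
    total = F(0)
    for z, (p, rf, mu, var) in regimes.items():
        second = var + mu ** 2  # E_t((R^W)^2)
        conditional = (a0 + a1 * z) * mu + (b0 + b1 * z) * second
        total += p * conditional * instrument(z)
    return total


price_unscaled = scaled_moment(lambda z: 1)  # should equal 1
price_scaled = scaled_moment(lambda z: z)    # should equal E(z)
expected_z = sum(p * z for z, (p, *_) in regimes.items())

print(price_unscaled, price_scaled, expected_z)
```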

For example, in standard derivations of the CAPM, the market (wealth portfolio) return is conditionally mean-variance efficient; investors want to hold portfolios on the conditional mean-variance frontier; conditionally expected returns follow a conditional single-beta representation, or the discount factor $m$ follows a conditional linear factor model

$$m_{t+1} = a_t - b_t R^W_{t+1}$$

as we saw above.

But none of these statements mean that we can use the CAPM unconditionally. Rather than throw up our hands, we can add some scaled factors. Thus, if, say, the dividend/price ratio and term premium do a pretty good job of summarizing variation in conditional moments, the conditional CAPM implies an unconditional, five-factor (plus constant) model. The factors are a constant, the market return, the dividend/price ratio, the term premium, and the market return times the dividend/price ratio and the term premium.

The unconditional pricing implications of such a five-factor model could, of course, be summarized by a single-$\beta$ representation. (See the caustic comments in the section on implications and equivalence.) The reference portfolio would not be the market portfolio, of course, but a mimicking portfolio of the five factors. However, the single mimicking portfolio would not be easily interpretable in terms of a single factor conditional model and two instruments. In this case, it might be more interesting to look at a multiple-$\beta$ or multiple-factor representation.

If we have many factors $f$ and many instruments $z$, we should in principle multiply every factor by every instrument,

$$m = b_1 f_1 + b_2 f_1 z_1 + b_3 f_1 z_2 + \ldots + b_{N+1} f_2 + b_{N+2} f_2 z_1 + b_{N+3} f_2 z_2 + \ldots$$

This operation can be compactly summarized with the Kronecker product notation, $a \otimes b$, which means “multiply every element in vector $a$ by every element in vector $b$,” or

$$m_{t+1} = b' (f_{t+1} \otimes z_t).$$
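For vectors, the Kronecker product is just the list of all pairwise products, which a few lines make concrete. The helper names `kron` and `discount_factor` and all the numbers are illustrative assumptions; the instrument vector starts with a constant 1 so that the unscaled factors appear among the scaled ones, in the same order as the expression for $m$ above.

```python
def kron(a, b):
    """Kronecker product of two vectors: every element of a times every element of b."""
    return [a_i * b_j for a_i in a for b_j in b]


def discount_factor(coefficients, factors, instruments):
    """m_{t+1} = b'(f_{t+1} kron z_t) for one date's factors and instruments."""
    scaled = kron(factors, instruments)
    if len(coefficients) != len(scaled):
        raise ValueError("need one coefficient per scaled factor")
    return sum(b_k * s_k for b_k, s_k in zip(coefficients, scaled))


factors = [2, 3]             # f_1, f_2 at time t+1 (made-up values)
instruments = [1, 10, 100]   # constant, z_1, z_2 at time t (made-up values)

scaled_factors = kron(factors, instruments)
m_value = discount_factor([1, 1, 1, 1, 1, 1], factors, instruments)

print(scaled_factors, m_value)
```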

8.5 Summary

When you first think about it, conditioning information sounds scary: how do we account for time-varying expected returns, betas, factor risk premia, variances, covariances, etc.? However, the methods outlined in this chapter allow a very simple and beautiful solution to the problems raised by conditioning information. To express the conditional implications of a given model, all you have to do is include some scaled or managed portfolio returns, and then pretend you never heard about conditioning information.

Some factor models are conditional models, and have coefficients that are functions of investors' information sets. In general, there is no way to test such models, but if you are willing to assume that the relevant conditioning information is well summarized by a few variables, then you can just add new factors, equal to the old factors scaled by the conditioning variables, and again forget that you ever heard about conditioning information.

You may want to remember conditioning information as a diagnostic and in economic interpretation of the results. It may be interesting to take estimates of a many-factor model, $m_{t+1} = a_0 + a_1 z_t + b_0 f_{t+1} + b_1 z_t f_{t+1}$, and see what they say about the implied conditional model, $m_{t+1} = (a_0 + a_1 z_t) + (b_0 + b_1 z_t) f_{t+1}$. You may want to make plots of conditional $b$s, betas, factor risk premia, expected returns, etc. But you don't have to worry about it in estimation and testing.

8.6 Problems

1. If there is a risk free asset, is it on the a) conditional b) unconditional c) both mean-variance frontier?

2. If there is a conditionally riskfree asset (a claim to 1 is traded at each date), does this mean that there is an unconditionally risk free asset? (Define the latter first!) How about vice versa?

3. Suppose you took the unconditional population moments $E(R)$, $E(RR')$ of asset returns and constructed the mean-variance frontier. Does this frontier correspond to the conditional or the unconditional MV frontier, or neither? What is the key assumption underlying your answer?


Chapter 9. Factor pricing models

In Chapter 2, I noted that the consumption-based model, while a complete answer to most asset pricing questions in principle, does not (yet) work well in practice. This observation motivates efforts to tie the discount factor $m$ to other data. Linear factor pricing models are the most popular models of this sort in finance. They dominate discrete time empirical work.

Factor pricing models replace the consumption-based expression for marginal utility growth with a linear model of the form

$$m_{t+1} = a + b' f_{t+1}$$

$a$ and $b$ are free parameters. This specification is equivalent to a multiple-beta model

$$E(R_{t+1}) = \alpha + \beta' \lambda$$

where $\beta$ are multiple regression coefficients of returns $R$ on the factors $f$. Here, $\alpha$ and $\lambda$ are the free parameters.

The big question is, what should one use for factors $f_{t+1}$? Factor pricing models look for variables that are good proxies for aggregate marginal utility growth, i.e., variables for which

$$\beta \frac{u'(c_{t+1})}{u'(c_t)} \approx a + b' f_{t+1} \tag{115}$$

is a sensible and economically interpretable approximation.

More directly and interpretably, the essence of asset pricing is that there are special states of the world in which investors are especially concerned that their portfolios not do badly. They are willing to trade off some overall performance (average return) to make sure that portfolios do not do badly in these particular states of nature. The factors are variables that indicate that these “bad states” have occurred.

The factors that result from this search are and should be intuitively sensible. In any

sensible economic model, as well as in the data, consumption is related to returns on broad-

based portfolios, to interest rates, to growth in GNP, investment, or other macroeconomic

variables, and to returns on production processes. All of these variables measure “wealth”

or the state of the economy. Consumption is and should be high in “good times” and low in

“bad times.”

Furthermore, consumption and marginal utility respond to news: if a change in some

variable today signals high income in the future, then consumption rises now, by permanent

income logic. This fact opens the door to forecasting variables: any variable that forecasts

asset returns (“changes in the investment opportunity set”) or macroeconomic variables is a

candidate factor. Variables such as the term premium, dividend/price ratio, stock returns, etc.

can be defended as pricing factors on this logic. Though they themselves are not measures of

aggregate good or bad times, they forecast such times.


Should factors be independent over time? The answer is, sort of. If there is a constant real interest rate, then marginal utility growth should be unpredictable. (“Consumption is a random walk” in the quadratic utility permanent income model.) To see this, just look at the first order condition with a constant interest rate,

$$u'(c_t) = \beta R^f E_t\left[u'(c_{t+1})\right]$$

or in a more time-series notation,

$$\frac{u'(c_{t+1})}{u'(c_t)} = \frac{1}{\beta R^f} + \mu_{t+1}; \quad E_t(\mu_{t+1}) = 0.$$

The real risk free rate is not constant, but it does not vary a lot, especially compared to asset returns. Measured consumption growth is not exactly unpredictable but it is the least predictable macroeconomic time series, especially if one accounts properly for temporal aggregation (consumption data are quarterly averages). Thus, factors that proxy for marginal utility growth, though they don't have to be totally unpredictable, should not be highly predictable. If one chooses highly predictable factors, the model will counterfactually predict large interest rate variation.

In practice, this consideration means that one should choose the right units: Use GNP

growth rather than level, portfolio returns rather than prices or price/dividend ratios, etc.

However, unless one wants to impose an exactly constant risk free rate, one does not have to

filter or prewhiten factors to make them exactly unpredictable.

This view of factors as intuitively motivated proxies for marginal utility growth is sufficient to carry the reader through current empirical tests of factor models. The extra constraints of a formal exposition of theory in this part have not yet constrained the factor-fishing expedition.

The precise derivations all proceed in the way I have motivated factor models: One writes down a general equilibrium model, in particular a specification of the production technology by which real investment today results in real output tomorrow. This general equilibrium produces relations that express the determinants of consumption from exogenous variables, and relations linking consumption and other endogenous variables; equations of the form $c_t = g(f_t)$. One then uses this kind of equation to substitute out for consumption in the basic first order conditions.

The formal derivations accomplish two things: they determine one particular list of factors

that can proxy for marginal utility growth, and they prove that the relation should be linear.

Some assumptions can often be substituted for others in the quest for these two features of a

factor pricing model.

This is a point worth remembering: all factor models are derived as specializations of the

consumption-based model. Many authors of factor model papers disparage the consumption-

based model, forgetting that their factor model is the consumption-based model plus extra

assumptions that allow one to proxy for marginal utility growth from some other variables.


SECTION 9.1 CAPITAL ASSET PRICING MODEL (CAPM)

My presentation follows Constantinides' (1989) derivation of traditional models as instances

of the consumption-based model in this regard.

Above, I argued that clear economic foundation was important for factor models, since it

is the only guard against fishing. Alas, we discover here that the current state of factor pricing models is not a particularly good guard against fishing. One can call for better theories or

derivations, more carefully aimed at limiting the list of potential factors and describing the

fundamental macroeconomic sources of risk, and thus providing more discipline for empirical

work. The best minds in finance have been working on this problem for 40 years though, so

a ready solution is not immediately in sight. On the other hand, we will see that even current

theory can provide much more discipline than is commonly imposed in empirical work. For

example, the derivations of the CAPM and ICAPM do leave predictions for the risk free rate

and for factor risk premia that are often ignored. The ICAPM gives tighter restrictions on

state variables than are commonly checked: “State variables” do have to forecast something!

We also see how special and unrealistic are the general equilibrium setups necessary to derive

popular specifications such as CAPM and ICAPM. This observation motivates a more serious

look at real general equilibrium models below.

9.1 Capital Asset Pricing Model (CAPM)

The CAPM is the model $m = a + bR^W$; $R^W$ = wealth portfolio return. I derive it from the consumption-based model by 1) two-period quadratic utility; 2) two periods, exponential utility and normal returns; 3) infinite horizon, quadratic utility and i.i.d. returns; 4) log utility and normally distributed returns.

The CAPM is the first, most famous and (so far) most widely used model in asset pricing. It ties the discount factor $m$ to the return on the “wealth portfolio.” The function is linear,

\[ m_{t+1} = a + bR^W_{t+1}. \]

$a$ and $b$ are free parameters. One can find theoretical values for the parameters $a$ and $b$ by requiring the discount factor $m$ to price any two assets, such as the wealth portfolio return and risk-free rate, $1 = E(mR^W)$ and $1 = E(m)R^f$. (As an example, we did this in equation (8.110) above.) In empirical applications, we can also pick $a$ and $b$ to “best” price larger cross-sections of assets. We do not have good data on, or even a good empirical definition for, the return on total wealth. It is conventional to proxy $R^W$ by the return on a broad-based stock portfolio such as the value- or equally-weighted NYSE, S&P500, etc.
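As a small illustration of pinning down $a$ and $b$ from two pricing conditions, the sketch below solves $1 = E(m)R^f$ and $1 = E(mR^W)$ with $m = a + bR^W$. Substituting gives $a + bE(R^W) = 1/R^f$ and $aE(R^W) + bE[(R^W)^2] = 1$, so $b = -[E(R^W) - R^f]/[R^f \mathrm{var}(R^W)]$ and $a = 1/R^f - bE(R^W)$. The function name and the numerical moments are hypothetical, chosen only for the example.

```python
def capm_discount_factor(mean_rw, var_rw, rf):
    """Solve 1 = E(m)*Rf and 1 = E(m*RW) for (a, b) in m = a + b*RW.

    mean_rw, var_rw: mean and variance of the gross wealth-portfolio return.
    rf: gross risk-free rate.
    """
    b = -(mean_rw - rf) / (rf * var_rw)
    a = 1.0 / rf - b * mean_rw
    return a, b

# Hypothetical moments: 8% mean return, 20% volatility, 1% risk-free rate.
mean_rw, var_rw, rf = 1.08, 0.20 ** 2, 1.01
a, b = capm_discount_factor(mean_rw, var_rw, rf)

# Verify both pricing conditions using E(RW^2) = var + mean^2.
price_of_rf = (a + b * mean_rw) * rf
price_of_rw = a * mean_rw + b * (var_rw + mean_rw ** 2)
print(f"a = {a:.4f}, b = {b:.4f}")
print(f"E(m)*Rf = {price_of_rf:.6f}, E(m*RW) = {price_of_rw:.6f}")
```

Note that $b < 0$ whenever the wealth portfolio earns a positive premium over the risk-free rate: the discount factor is low when the wealth return is high.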

The CAPM is of course most frequently stated in equivalent expected return / beta language,

\[ E(R^i) = \alpha + \beta_{i,R^W}\left[E(R^W) - \alpha\right]. \]


This section briefly describes some classic derivations of the CAPM. Again, we need to find assumptions that defend which factors proxy for marginal utility ($R^W$ here), and assumptions to defend the linearity between $m$ and the factor.

I present several derivations of the same model. Many of these derivations use classic

modeling assumptions that are important in their own right. This is also an interesting place

in which to see that various sets of assumptions can often be used to get to the same place.

The CAPM is often criticized for one or another assumption. By seeing several derivations,

we can see how one assumption can be traded for another. For example, the CAPM does not

in fact require normal distributions, if one is willing to swallow quadratic utility instead.

9.1.1 Two-period quadratic utility

Two period investors with no labor income and quadratic utility imply the CAPM.

Investors have quadratic preferences and only live two periods,

\[ U(c_t, c_{t+1}) = -\frac{1}{2}(c_t - c^*)^2 - \frac{1}{2}\beta E\left[(c_{t+1} - c^*)^2\right]. \tag{116} \]

Their marginal rate of substitution is thus

\[ m_{t+1} = \beta\frac{u'(c_{t+1})}{u'(c_t)} = \beta\frac{(c_{t+1} - c^*)}{(c_t - c^*)}. \]

The quadratic utility assumption means marginal utility is linear in consumption. Thus, we achieve the first target of the derivation, linearity.

Investors are born with wealth $W_t$ in the first period and earn no labor income. They can invest in lots of assets with prices $p^i_t$ and payoffs $x^i_{t+1}$, or, to keep the notation simple, returns $R^i_{t+1}$. They choose how much to consume at the two dates, $c_t$ and $c_{t+1}$, and the portfolio weights $\alpha_i$ for their investment portfolio. Thus, the budget constraint is

\[ \begin{aligned} c_{t+1} &= W_{t+1} \\ W_{t+1} &= R^W_{t+1}(W_t - c_t) \\ R^W &= \sum_{i=1}^{N} \alpha_i R^i; \quad \sum_{i=1}^{N} \alpha_i = 1. \end{aligned} \tag{117} \]

$R^W$ is the rate of return on total wealth.


The two-period assumption means that investors consume everything in the second period, by constraint (9.117). This fact allows us to substitute wealth and the return on wealth for consumption, achieving the second goal of the derivation, naming the factor that proxies for consumption or marginal utility:

\[ m_{t+1} = \beta\frac{R^W_{t+1}(W_t - c_t) - c^*}{c_t - c^*} = \frac{-\beta c^*}{c_t - c^*} + \frac{\beta(W_t - c_t)}{c_t - c^*}R^W_{t+1} \]

i.e.

\[ m_{t+1} = a_t + b_t R^W_{t+1}. \]
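The substitution can be checked numerically. The following is a minimal sketch, assuming hypothetical values for $\beta$, the bliss point $c^*$, initial wealth, and first-period consumption (none of these numbers come from the text). It computes the intercept and slope of the linear discount factor and confirms that they reproduce the marginal rate of substitution once second-period consumption is set to $R^W(W_t - c_t)$.

```python
def quadratic_capm_coefficients(beta, c_star, c_t, w_t):
    """Intercept a_t and slope b_t of m_{t+1} = a_t + b_t * RW_{t+1}."""
    denom = c_t - c_star
    a_t = -beta * c_star / denom
    b_t = beta * (w_t - c_t) / denom
    return a_t, b_t

def marginal_rate_of_substitution(beta, c_star, c_t, c_next):
    """m_{t+1} = beta * (c_{t+1} - c*) / (c_t - c*) under quadratic utility."""
    return beta * (c_next - c_star) / (c_t - c_star)

# Hypothetical values; consumption stays below the bliss point c*.
beta, c_star = 0.95, 10.0
w_t, c_t = 4.0, 2.0
a_t, b_t = quadratic_capm_coefficients(beta, c_star, c_t, w_t)

for r_w in (0.9, 1.0, 1.2):
    c_next = r_w * (w_t - c_t)      # two-period budget constraint
    m_direct = marginal_rate_of_substitution(beta, c_star, c_t, c_next)
    m_linear = a_t + b_t * r_w
    print(f"RW = {r_w:.1f}: direct m = {m_direct:.4f}, linear m = {m_linear:.4f}")
```

Because consumption is below the bliss point, the slope is negative: a high wealth return means high second-period consumption and low marginal utility.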

9.1.2 Exponential utility, normal distributions

$u(c) = -e^{-\alpha c}$ and a normally distributed set of returns also produces the CAPM.

The combination of exponential utility and normal distributions is another set of assumptions
that deliver the CAPM in a one or two period model. This structure has a particularly

convenient analytical form. Since it gives rise to linear demand curves, it is very widely

used in models that complicate the trading structure, by introducing incomplete markets or

asymmetric information.

I present a model with consumption only in the last period. (You can do the quadratic

utility model of the last section this way as well.) Utility is

\[ E[u(c)] = E\left[-e^{-\alpha c}\right]. \]

$\alpha$ is known as the coefficient of absolute risk aversion. If consumption is normally distributed, we have

\[ Eu(c) = -e^{-\alpha E(c) + \frac{\alpha^2}{2}\sigma^2(c)}. \]
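The closed form for expected exponential utility under normality can be verified by direct numerical integration. This is a rough sketch using a plain trapezoid rule against the normal density; the function names, grid settings, and parameter values are assumptions made for the illustration.

```python
import math

def expected_exp_utility_closed_form(alpha, mean_c, sd_c):
    """-exp(-alpha*E(c) + (alpha^2/2)*sigma^2(c)) for normally distributed c."""
    return -math.exp(-alpha * mean_c + 0.5 * alpha ** 2 * sd_c ** 2)

def expected_exp_utility_numeric(alpha, mean_c, sd_c, n=20001, width=12.0):
    """Trapezoid-rule integral of -exp(-alpha*c) times the normal density
    over the interval mean_c +/- width*sd_c."""
    lo = mean_c - width * sd_c
    hi = mean_c + width * sd_c
    step = (hi - lo) / (n - 1)
    norm_const = sd_c * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(n):
        c = lo + k * step
        density = math.exp(-0.5 * ((c - mean_c) / sd_c) ** 2) / norm_const
        weight = 0.5 if k in (0, n - 1) else 1.0   # trapezoid end weights
        total += weight * (-math.exp(-alpha * c)) * density
    return total * step

# Hypothetical values: risk aversion 2, mean consumption 1, std. dev. 0.3.
alpha, mean_c, sd_c = 2.0, 1.0, 0.3
closed = expected_exp_utility_closed_form(alpha, mean_c, sd_c)
numeric = expected_exp_utility_numeric(alpha, mean_c, sd_c)
print(f"closed form: {closed:.8f}")
print(f"numerical:   {numeric:.8f}")
```

The two values should agree to many decimal places, since the integrand is itself a scaled Gaussian.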

Suppose this investor has initial wealth $W$ which can be split between a riskfree asset paying $R^f$ and a set of risky assets paying return $R$. Let $y$ denote the amount of this wealth $W$ (amount, not fraction) invested in each security. Then, the budget constraint is

\[ \begin{aligned} c &= y^f R^f + y'R \\ W &= y^f + y'1 \end{aligned} \]

Plugging the first constraint into the utility function we obtain

\[ Eu(c) = -e^{-\alpha\left[y^f R^f + y'E(R)\right] + \frac{\alpha^2}{2}y'\Sigma y}. \tag{118} \]


As with quadratic utility, the two-period model is what allows us to set consumption to wealth

and then substitute the return on the wealth portfolio for consumption growth in the discount

factor.

Maximizing (9.118) with respect to $y$, $y^f$, we obtain the first order condition describing the optimal amount to be invested in the risky asset,

\[ y = \Sigma^{-1}\frac{E(R) - R^f}{\alpha} \]

Sensibly, the investor invests more in risky assets if their expected return is higher, less if his risk aversion coefficient is higher, and less if the assets are riskier. Notice that total wealth does not appear in this expression. With this setup, the amount invested in risky assets is independent of the level of wealth. This is why we say that this investor has absolute rather than relative (to wealth) risk aversion. Note also that these “demands” for the risky assets are linear in expected returns, which is a very convenient property.
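A two-asset sketch may help fix ideas. It computes the demand $y = \Sigma^{-1}[E(R) - R^f]/\alpha$ with a hand-written $2 \times 2$ inverse and evaluates the exponent of (9.118) with the sign flipped, so that maximizing it is equivalent to maximizing expected utility, after substituting $y^f = W - y'1$. All function names and numbers below are hypothetical, and a symmetric covariance matrix is assumed.

```python
def risky_demand(mean_returns, cov, rf, alpha):
    """y = Sigma^{-1} (E(R) - Rf) / alpha for two risky assets."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    e1 = mean_returns[0] - rf
    e2 = mean_returns[1] - rf
    y1 = (s22 * e1 - s12 * e2) / (det * alpha)
    y2 = (-s21 * e1 + s11 * e2) / (det * alpha)
    return y1, y2

def objective(y, wealth, mean_returns, cov, rf, alpha):
    """alpha*[yf*Rf + y'E(R)] - (alpha^2/2)*y'Sigma*y with yf = W - y'1.
    Expected utility is -exp(-objective), so a larger value is better."""
    y1, y2 = y
    (s11, s12), (_, s22) = cov          # symmetric covariance assumed
    y_f = wealth - y1 - y2
    mean_c = y_f * rf + y1 * mean_returns[0] + y2 * mean_returns[1]
    var_c = s11 * y1 ** 2 + 2.0 * s12 * y1 * y2 + s22 * y2 ** 2
    return alpha * mean_c - 0.5 * alpha ** 2 * var_c

# Hypothetical inputs.
means = (1.08, 1.05)
cov = ((0.04, 0.01), (0.01, 0.02))
rf, alpha = 1.01, 2.0

y = risky_demand(means, cov, rf, alpha)
print(f"amounts in risky assets: y1 = {y[0]:.4f}, y2 = {y[1]:.4f}")

# Wealth shifts the objective by a constant, so the same y is optimal.
for wealth in (1.0, 5.0):
    best = objective(y, wealth, means, cov, rf, alpha)
    nearby = objective((y[0] + 0.01, y[1]), wealth, means, cov, rf, alpha)
    print(f"W = {wealth:.1f}: loss from perturbing y1 = {best - nearby:.6f}")
```

The loss from moving away from $y$ is the same at both wealth levels, which is the wealth independence noted above; doubling $\alpha$ would halve both amounts.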

Inverting the first order conditions, we obtain

\[ E(R) - R^f = \alpha\Sigma y = \alpha\,\mathrm{cov}(R, R^m). \tag{119} \]