with $y'R$, and also with the investor's overall portfolio $y^f R^f + y'R$. If all investors are identical, then the market portfolio is the same as the individual's portfolio so $\Sigma y$ also gives the correlation of each return with $R^m = y^f R^f + y'R$. (If investors differ in risk aversion $\alpha$, the same thing goes through but with an aggregate risk aversion coefficient.)

Thus, we have the CAPM. This version is especially interesting because it ties the market price of risk to the risk aversion coefficient. Applying (9.119) to the market return itself, we have

$$\frac{E(R^m) - R^f}{\sigma^2(R^m)} = \alpha.$$

9.1.3 Quadratic value function, dynamic programming.

We can let investors live forever in the quadratic utility CAPM so long as we assume that the environment is independent over time. Then the value function is quadratic, taking the place of the quadratic second-period utility function. This case is a nice first introduction to dynamic programming.

The two-period structure given above is unpalatable, since (most) investors do in fact live

longer than two periods. It is natural to try to make the same basic ideas work with less


SECTION 9.1 CAPITAL ASSET PRICING MODEL (CAPM)

restrictive and more palatable assumptions.

We can derive the CAPM in a multi-period context by replacing the second-period quadratic utility function with a quadratic value function. However, the quadratic value function requires the additional assumption that returns are i.i.d. (no "shifts in the investment opportunity set"). This observation, due to Fama (1970), is also a nice introduction to dynamic programming, which is a powerful way to handle multiperiod problems by expressing them as two-period problems. Finally, I think this derivation makes the CAPM more realistic, transparent and intuitively compelling. Buying stocks amounts to taking bets over wealth; really the fundamental assumption driving the CAPM is that marginal utility of wealth is linear in wealth and does not depend on other state variables.

Let's start in a simple ad-hoc manner by just writing down a "utility function" defined over this period's consumption and next period's wealth,

$$U = u(c_t) + \beta E_t V(W_{t+1}).$$

This is a reasonable objective for an investor, and does not require us to make the very artificial assumption that he will die tomorrow. If an investor with this "utility function" can buy an asset at price $p_t$ with payoff $x_{t+1}$, his first order condition (buy a little more, then $x$ contributes to wealth next period) is

$$p_t u'(c_t) = \beta E_t\left[V'(W_{t+1})\, x_{t+1}\right].$$

Thus, the discount factor uses next period's marginal value of wealth in place of the more familiar marginal utility of consumption

$$m_{t+1} = \beta \frac{V'(W_{t+1})}{u'(c_t)}.$$

(The envelope condition states that, at the optimum, a penny saved has the same value as a penny consumed, $u'(c_t) = V'(W_t)$. We could use this condition to express the denominator in terms of wealth also.)

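Now, suppose the value function were quadratic,

$$V(W_{t+1}) = -\frac{\eta}{2}\left(W_{t+1} - W^*\right)^2.$$

Then, we would have

$$m_{t+1} = -\beta\eta\,\frac{W_{t+1} - W^*}{u'(c_t)} = -\beta\eta\,\frac{R^W_{t+1}(W_t - c_t) - W^*}{u'(c_t)} = \left[\frac{\beta\eta W^*}{u'(c_t)}\right] + \left[-\frac{\beta\eta (W_t - c_t)}{u'(c_t)}\right] R^W_{t+1},$$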


CHAPTER 9 FACTOR PRICING MODELS

or, once again,

$$m_{t+1} = a_t + b_t R^W_{t+1},$$

the CAPM!
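The algebra above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it evaluates $m_{t+1} = \beta V'(W_{t+1})/u'(c_t)$ for a quadratic value function and compares it with $a_t + b_t R^W_{t+1}$. All parameter values and function names are illustrative assumptions, and $u'(c_t)$ is simply treated as a given positive number.

```python
def discount_factor(R_w, W, c, beta, eta, W_star, u_prime):
    """m = beta * V'(W_next) / u'(c) with V(W) = -(eta / 2) * (W - W_star)**2."""
    W_next = R_w * (W - c)                 # budget constraint
    V_prime = -eta * (W_next - W_star)     # marginal value of wealth
    return beta * V_prime / u_prime


def linear_coefficients(W, c, beta, eta, W_star, u_prime):
    """Intercept a_t and slope b_t read off the bracketed terms in the text."""
    a = beta * eta * W_star / u_prime
    b = -beta * eta * (W - c) / u_prime
    return a, b


# Illustrative (assumed) numbers, chosen only so that m is close to one.
params = dict(W=10.0, c=1.0, beta=0.95, eta=0.1, W_star=50.0, u_prime=4.0)
a, b = linear_coefficients(**params)

# Gap between the direct formula and the linear form at several wealth returns.
gaps = [abs(discount_factor(R, **params) - (a + b * R))
        for R in (0.8, 1.0, 1.1, 1.3)]
print(a, b, max(gaps))
```

The slope is negative whenever savings $W_t - c_t$ are positive, so the discount factor is low in states where the wealth return is high.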

Let's be clear about the assumptions and what they do.

1) The value function only depends on wealth. If other variables entered the value function, then $\partial V/\partial W$ would depend on those other variables, and so would $m$. This assumption bought us the first objective of any derivation: the identity of the factors. The ICAPM, below, allows other variables in the value function, and obtains more factors. (Actually, other variables could enter so long as they don't affect the marginal value of wealth. The weather is an example: You, like me, might be happier on sunny days, but you do not value additional wealth more on sunny than on rainy days. Hence, covariance with weather does not affect how you value stocks.)

2) The value function is quadratic. We wanted the marginal value function $V'(W)$ to be linear, to buy us the second objective, showing $m$ is linear in the factor. Quadratic utility and value functions deliver a globally linear marginal value function $V'(W)$. By the usual Taylor series logic, linearity of $V'(W)$ is probably not a bad assumption for small perturbations, and not a good one for large perturbations.

Why is the value function quadratic?

You might think we are done. But economists are unhappy about a utility function that has wealth in it. Few of us are like Disney's Uncle Scrooge, who got pure enjoyment out of a daily swim in the coins in his vault. Wealth is valuable because it gives us access to more consumption. Utility functions should always be written over consumption. One of the few real rules in economics that keep our theories from being vacuous is that ad-hoc "utility functions" over other objects like wealth (or means and variances of portfolio returns, or "status" or "political power") should be defended as arising from a more fundamental desire for consumption.

More practically, being careful about the derivation makes clear that the superficially plausible assumption that the value function is only a function of wealth derives from the much less plausible, in fact certainly false, assumptions that interest rates are constant, that the distribution of returns is i.i.d., and that the investor has no risky labor income. So, let us see what it takes to defend the quadratic value function in terms of some utility function.

Suppose investors last forever, and have the standard sort of utility function

$$U = E_t \sum_{j=0}^{\infty} \beta^j u(c_{t+j}).$$

Again, investors start with wealth $W_0$ which earns a random return $R^W$ and they have no other source of income. In addition, suppose that interest rates are constant, and stock returns are i.i.d. over time.

Define the value function as the maximized value of the utility function in this environment. Thus, define $V(W)$ as (footnote 7)

$$V(W_t) \equiv \max_{\{c_t, c_{t+1}, c_{t+2}, \ldots, \alpha_t, \alpha_{t+1}, \ldots\}} E_t \sum_{j=0}^{\infty} \beta^j u(c_{t+j}) \tag{9.120}$$

$$\text{s.t. } W_{t+1} = R^W_{t+1}(W_t - c_t); \quad R^W_t = \alpha_t' R_t; \quad \alpha_t' 1 = 1.$$

(I used vector notation to simplify the statement of the portfolio problem; $R \equiv \begin{bmatrix} R^1 & R^2 & \ldots & R^N \end{bmatrix}'$, etc.) The value function is the total level of utility the investor can achieve, given how much wealth he has, and any other variables constraining him. This is where the assumptions of no labor income, a constant interest rate and i.i.d. returns come in. Without these assumptions, the value function as defined above might depend on these other characteristics of the investor's environment. For example, if there were some variable, say, "D/P" that indicated returns would be high or low for a while, then the investor would be happier, and have a high value, when D/P is high, for a given level of wealth. Thus, we would have to write $V(W_t, D/P_t)$.

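Value functions allow you to express an infinite period problem as a two period problem. Break up the maximization into the first period and all the remaining periods, as follows

$$V(W_t) = \max_{\{c_t, \alpha_t\}} \left\{ u(c_t) + \beta E_t \left[ \max_{\{c_{t+1}, c_{t+2}, \ldots, \alpha_{t+1}, \alpha_{t+2}, \ldots\}} E_{t+1} \sum_{j=0}^{\infty} \beta^j u(c_{t+1+j}) \right] \right\} \quad \text{s.t. } \ldots$$

or

$$V(W_t) = \max_{\{c_t, \alpha_t\}} \left\{ u(c_t) + \beta E_t V(W_{t+1}) \right\} \quad \text{s.t. } \ldots \tag{9.121}$$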

Thus, we have defended the existence of a value function. Writing down a two period "utility function" over this period's consumption and next period's wealth is not as crazy as it might seem.

The value function is also an attractive view of how people actually make decisions. You don't think "If I buy a sandwich today, I won't be able to go out to dinner one night 20 years from now," trading off goods directly as expressed by the utility function. You think "I can't afford a new car," meaning that the decline in the value of wealth is not worth the increase in the marginal utility of consumption. Thus, the maximization in (9.121) describes your psychological approach to utility maximization.

Footnote 7: There is also a transversality condition or a lower limit on wealth in the budget constraints. This keeps the consumer from consuming a bit more and rolling over more and more debt, and it means we can write the budget constraint in present value form.

The remaining question is, can the value function be quadratic? What utility function assumption leads to a quadratic value function? Here is the fun fact: A quadratic utility function leads to a quadratic value function in this environment. This is not a law of nature; it is not true that for any $u(c)$, $V(W)$ has the same functional form. But it is true here and in a few other special cases. The "in this environment" clause is not innocuous. The value function (the achieved level of expected utility) is a result of the utility function and the constraints.

How could we show this fact? One way would be to try to calculate the value function by brute force from its definition, equation (9.120). This approach is not fun, and it does not exploit the beauty of dynamic programming, which is the reduction of an infinite period problem to a two period problem.

Instead solve (9.121) as a functional equation. Guess that the value function $V(W_{t+1})$ is quadratic, with some unknown parameters. Then use the recursive definition of $V(W_t)$ in (9.121), and solve a two period problem: find the optimal consumption choice, plug it into (9.121) and calculate the value function $V(W_t)$. If the guess was right, you obtain a quadratic function for $V(W_t)$, and determine any free parameters.

Let's do it. Specify

$$u(c_t) = -\frac{1}{2}(c_t - c^*)^2.$$

Guess

$$V(W_{t+1}) = -\frac{\gamma}{2}(W_{t+1} - W^*)^2$$

with $\gamma$ and $W^*$ parameters to be determined later. Then the problem (9.121) is (I don't write the portfolio choice $\alpha$ part for simplicity; it doesn't change anything)

$$V(W_t) = \max_{\{c_t\}} \left[ -\frac{1}{2}(c_t - c^*)^2 - \beta\frac{\gamma}{2} E(W_{t+1} - W^*)^2 \right] \quad \text{s.t. } W_{t+1} = R^W_{t+1}(W_t - c_t).$$

($E_t$ is now $E$ since I assumed i.i.d.) Substituting the constraint into the objective,

$$V(W_t) = \max_{\{c_t\}} \left[ -\frac{1}{2}(c_t - c^*)^2 - \beta\frac{\gamma}{2} E\left[R^W_{t+1}(W_t - c_t) - W^*\right]^2 \right]. \tag{9.122}$$

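The first order condition with respect to $c_t$, using $\hat{c}$ to denote the optimal value, is

$$\hat{c}_t - c^* = \beta\gamma E\left\{\left[R^W_{t+1}(W_t - \hat{c}_t) - W^*\right] R^W_{t+1}\right\}.$$

Solving for $\hat{c}_t$,

$$\hat{c}_t = c^* + \beta\gamma E\left\{\left[(R^W_{t+1})^2 W_t - \hat{c}_t (R^W_{t+1})^2 - W^* R^W_{t+1}\right]\right\}$$

$$\hat{c}_t\left[1 + \beta\gamma E\left((R^W_{t+1})^2\right)\right] = c^* + \beta\gamma E\left((R^W_{t+1})^2\right) W_t - \beta\gamma W^* E(R^W_{t+1})$$

$$\hat{c}_t = \frac{c^* - \beta\gamma E(R^W_{t+1}) W^* + \beta\gamma E\left((R^W_{t+1})^2\right) W_t}{1 + \beta\gamma E\left((R^W_{t+1})^2\right)}. \tag{9.123}$$

This is a linear function of $W_t$. Writing (9.122) in terms of the optimal value of $c$, we get

$$V(W_t) = -\frac{1}{2}(\hat{c}_t - c^*)^2 - \beta\frac{\gamma}{2} E\left[R^W_{t+1}(W_t - \hat{c}_t) - W^*\right]^2. \tag{9.124}$$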

This is a quadratic function of $W_t$ and $\hat{c}$. A quadratic function of a linear function is a quadratic function, so the value function is a quadratic function of $W_t$. If you want to spend a pleasant few hours doing algebra, plug (9.123) into (9.124), check that the result really is quadratic in $W_t$, and determine the coefficients $\gamma$, $W^*$ in terms of fundamental parameters $\beta$, $c^*$, $E(R^W)$, $E[(R^W)^2]$ (or $\sigma^2(R^W)$). The expressions for $\gamma$, $W^*$ do not give much insight, so I don't do the algebra here.
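For readers who would rather let a computer do the checking, here is a minimal sketch of the guess-and-verify step under stated assumptions. It uses a made-up two-state return distribution and arbitrary parameter values; the function names are hypothetical. It plugs the consumption rule (9.123) into (9.124) and tests that $V(W_t)$ is quadratic by confirming that third finite differences vanish. The comparison value $\beta\gamma E[(R^W)^2] / (1 + \beta\gamma E[(R^W)^2])$ for the curvature is my own algebra from (9.123) and (9.124), not a formula stated in the text.

```python
def moments(returns, probs):
    """E(R) and E(R^2) for a discrete return distribution."""
    ER = sum(p * r for r, p in zip(returns, probs))
    ER2 = sum(p * r * r for r, p in zip(returns, probs))
    return ER, ER2


def optimal_c(W, gamma, W_star, beta, c_star, ER, ER2):
    """Consumption rule (9.123), linear in W."""
    k = beta * gamma
    return (c_star - k * ER * W_star + k * ER2 * W) / (1.0 + k * ER2)


def value(W, gamma, W_star, beta, c_star, returns, probs):
    """Right-hand side of (9.124) evaluated at the optimal consumption."""
    ER, ER2 = moments(returns, probs)
    c = optimal_c(W, gamma, W_star, beta, c_star, ER, ER2)
    exp_sq = sum(p * (r * (W - c) - W_star) ** 2
                 for r, p in zip(returns, probs))
    return -0.5 * (c - c_star) ** 2 - 0.5 * beta * gamma * exp_sq


# Assumed inputs: two equally likely gross returns and an arbitrary guess.
returns, probs = (0.9, 1.3), (0.5, 0.5)
beta, c_star, gamma, W_star = 0.95, 1.0, 0.2, 8.0

V = [value(W, gamma, W_star, beta, c_star, returns, probs)
     for W in (1.0, 2.0, 3.0, 4.0)]

# A quadratic has zero third differences and a constant second difference.
third_diff = V[3] - 3 * V[2] + 3 * V[1] - V[0]
second_diff = V[2] - 2 * V[1] + V[0]

ER, ER2 = moments(returns, probs)
k = beta * gamma
implied_gamma = k * ER2 / (1.0 + k * ER2)   # curvature of the updated V
print(third_diff, second_diff, implied_gamma)
```

The recovered curvature generally differs from the guessed $\gamma$, so a full solution would repeat the step until guess and result agree; the sketch only verifies the functional form.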

9.1.4 Log utility

Log utility rather than quadratic utility also implies a CAPM. Log utility implies that consumption is proportional to wealth, allowing us to substitute the wealth return for consumption data.

The point of the CAPM is to avoid the use of consumption data, and so to use wealth

or the rate of return on wealth instead. Log utility is another special case that allows this

substitution. Log utility is much more plausible than quadratic utility.

Suppose that the investor has log utility

u(c) = ln(c).

Define the wealth portfolio as a claim to all future consumption. Then, with log utility, the price of the wealth portfolio is proportional to consumption itself.

$$p^W_t = E_t \sum_{j=1}^{\infty} \beta^j \frac{u'(c_{t+j})}{u'(c_t)} c_{t+j} = E_t \sum_{j=1}^{\infty} \beta^j \frac{c_t}{c_{t+j}} c_{t+j} = \frac{\beta}{1-\beta} c_t$$

The return on the wealth portfolio is proportional to consumption growth,

$$R^W_{t+1} = \frac{p^W_{t+1} + c_{t+1}}{p^W_t} = \frac{\frac{\beta}{1-\beta} + 1}{\frac{\beta}{1-\beta}} \frac{c_{t+1}}{c_t} = \frac{1}{\beta}\frac{c_{t+1}}{c_t} = \frac{1}{\beta}\frac{u'(c_t)}{u'(c_{t+1})}.$$


Thus, the log utility discount factor equals the inverse of the wealth portfolio return,

$$m_{t+1} = \frac{1}{R^W_{t+1}}. \tag{9.125}$$

Equation (9.125) could be used by itself: it attains the goal of replacing consumption data

by some other variable. (Brown and Gibbons 1982 test a CAPM in this form.) Note that log

utility is the only assumption so far. We do not assume constant interest rates, i.i.d. returns

or the absence of labor income.
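The log utility results can be illustrated with a few lines of arithmetic. This is a sketch under assumptions: the consumption path and the value of $\beta$ are invented for illustration, and the function names are hypothetical. It builds the wealth portfolio price $p^W_t = \frac{\beta}{1-\beta} c_t$, forms the wealth return, and checks that the discount factor $\beta c_t / c_{t+1}$ times the wealth return equals one in every period.

```python
beta = 0.96
consumption = [1.00, 1.03, 0.99, 1.05, 1.10]   # arbitrary illustrative path


def wealth_price(c, beta):
    """p^W_t = beta / (1 - beta) * c_t under log utility."""
    return beta / (1.0 - beta) * c


def truncated_price(c, beta, horizon):
    """Sum of the first `horizon` terms of the pricing sum.

    With log utility each term beta**j * (c_t / c_{t+j}) * c_{t+j}
    collapses to beta**j * c_t, so no expectation is needed.
    """
    return sum(beta ** j * c for j in range(1, horizon + 1))


wealth_returns = []
discount_factors = []
for c_now, c_next in zip(consumption, consumption[1:]):
    R_w = (wealth_price(c_next, beta) + c_next) / wealth_price(c_now, beta)
    m = beta * c_now / c_next      # beta * u'(c_next) / u'(c_now), u = ln
    wealth_returns.append(R_w)
    discount_factors.append(m)

# m * R^W should equal one period by period, whatever the consumption path.
products = [m * R for m, R in zip(discount_factors, wealth_returns)]
print(products)
```

Because the check holds for any path, it makes concrete the remark that no assumption about interest rates or the return distribution has been used so far.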

Log utility has a special property that "income effects offset substitution effects," or in an asset pricing context that "discount rate effects offset cashflow effects." News of higher consumption = dividend should make the claim to consumption more valuable. However, through $u'(c)$ it also raises the discount rate, lowering the value of the claim to consumption. For log utility, these two effects exactly offset.

9.1.5 Linearizing any model: Taylor approximations and normal distributions.

Any nonlinear model m = f(z) can be turned into a linear model m = a + bz in discrete

time by assuming normal returns.

It is traditional in the CAPM literature to try to derive a linear relation between $m$ and the wealth portfolio return. We could always do this by a Taylor approximation,

$$m_{t+1} \approx a_t + b_t R^W_{t+1}.$$

We can make this approximation exact in a special case, that the factors and all asset returns

are normally distributed. (We can also take the continuous time limit, which is really the

same thing. However, this discrete-time trick is common and useful.) First, I quote without

proof the central mathematical trick as a lemma

Lemma 1 (Stein's lemma) If $f$, $R$ are bivariate normal, $g(f)$ is differentiable and $E|g'(f)| < \infty$, then

$$\text{cov}[g(f), R] = E[g'(f)]\,\text{cov}(f, R). \tag{9.126}$$
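Stein's lemma is easy to check by simulation. The sketch below is illustrative only: it draws jointly normal $f$ and $R$ with assumed means and variances, picks $g(f) = f^3$ as an arbitrary differentiable function, and compares the two sides of (9.126). The names `sample_cov` and `stein_check` are hypothetical helpers, and the agreement is only up to Monte Carlo error.

```python
import random


def sample_cov(xs, ys):
    """Population-style sample covariance (divides by n)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def stein_check(n=100_000, seed=0):
    """Return (cov[g(f), R], E[g'(f)] * cov(f, R)) for g(f) = f**3."""
    rng = random.Random(seed)
    f, R = [], []
    for _ in range(n):
        fi = rng.gauss(1.0, 0.5)                     # f ~ N(1, 0.5^2)
        Ri = 0.5 + 0.8 * fi + rng.gauss(0.0, 0.3)    # jointly normal with f
        f.append(fi)
        R.append(Ri)
    g = [x ** 3 for x in f]
    g_prime_mean = sum(3.0 * x ** 2 for x in f) / n
    lhs = sample_cov(g, R)
    rhs = g_prime_mean * sample_cov(f, R)
    return lhs, rhs


lhs, rhs = stein_check()
print(lhs, rhs)
```

With these assumed parameters the population value of both sides is $3(1 + 0.25) \times 0.8 \times 0.25 = 0.75$, so both printed numbers should be close to that.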

Now we can use the lemma to state the theorem.

Theorem 2 If $m = g(f)$, if $f$ and a set of the payoffs priced by $m$ are normally distributed returns, and if $|E[g'(f)]| < \infty$, then there is a linear model $m = a + bf$ that prices the normally distributed returns.


Proof: First, the definition of covariance means that the pricing equation can be rewritten as a restriction between mean returns and the covariance of returns with $m$:

$$1 = E(mR) \Leftrightarrow 1 = E(m)E(R) + \text{cov}(m, R). \tag{9.127}$$

Now, given $m = g(f)$, $f$ and $R$ jointly normal, apply Stein's lemma (9.126) and (9.127),

$$1 = E[g(f)]E(R) + E[g'(f)]\,\text{cov}(f, R)$$

$$1 = E[g(f)]E(R) + \text{cov}(E[g'(f)]f, R)$$

Exploiting the $\Leftarrow$ part of (9.127), we know that an $m$ with mean $E(g(f))$ and that depends on $f$ via $E(g'(f))f$ will price assets,

$$m = E[g(f)] + E[g'(f)][f - E(f)].$$

$\blacksquare$

Using this trick, and recalling that we have not assumed i.i.d. so all these moments are conditional, the log utility CAPM implies the linear model

$$m_{t+1} = E_t\left(\frac{1}{R^W_{t+1}}\right) - E_t\left[\left(\frac{1}{R^W_{t+1}}\right)^2\right]\left[R^W_{t+1} - E_t(R^W_{t+1})\right] \tag{9.128}$$

if $R^W_{t+1}$ and all asset returns to be priced are normally distributed. From here it is a short step to an expected return-beta representation using the wealth portfolio return as the factor.
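To see how (9.128) follows from the theorem, take $f = R^W_{t+1}$ and $g(f) = 1/f$. The substitution can be sketched as follows, where $a_t$ and $b_t$ are shorthand for the intercept and slope of the resulting linear model:

```latex
\begin{aligned}
g(f) &= \frac{1}{f}, \qquad g'(f) = -\frac{1}{f^{2}}, \qquad f = R^W_{t+1}, \\
m_{t+1} &= E_t[g(f)] + E_t[g'(f)]\,[f - E_t(f)] = a_t + b_t R^W_{t+1}, \\
a_t &= E_t\!\left(\frac{1}{R^W_{t+1}}\right)
      + E_t\!\left[\left(\frac{1}{R^W_{t+1}}\right)^{2}\right] E_t(R^W_{t+1}), \\
b_t &= -E_t\!\left[\left(\frac{1}{R^W_{t+1}}\right)^{2}\right].
\end{aligned}
```

Both coefficients are conditional moments, so they carry time subscripts and can move with conditioning information.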

In the same way, we can trade the quadratic utility function for normal distributions in the dynamic programming derivation of the CAPM. Starting from

$$m_{t+1} = \beta\frac{V'(W_{t+1})}{u'(c_t)} = \beta\frac{V'\left[R^W_{t+1}(W_t - c_t)\right]}{u'(c_t)}$$

we can derive an expression that links $m$ linearly to $R^W_{t+1}$ by assuming normality.

Using the same trick, the consumption-based model can be written in linear fashion, i.e. expected returns can be expressed as a linear function of betas on consumption growth rather than betas on consumption growth raised to a power. However, for large risk aversion coefficients (more than about 10 in postwar consumption data) or other transformations, the inaccuracies due to the normal or lognormal approximation can be very significant in discrete data.

The normal distribution assumption seems rather restrictive, and it is. However, the most popular class of continuous-time models specifies instantaneously normal distributions even for things like options that have very non-normal distributions for discrete time intervals. Therefore, one can think of the Stein's lemma tricks as a way to get to continuous time approximations without doing it in continuous time. I demonstrate the explicit continuous time approach with the ICAPM, in the next section.

9.1.6 Portfolio intuition

The classic derivation of the CAPM contains some useful intuition. The classic derivation

starts with a mean-variance objective for portfolio wealth, max Eu(W ). Beta drives average

returns because beta measures how much adding a bit of the asset to a diversi¬ed portfolio

increases the volatility of the portfolio.

The central insight that started it all is that investors care about portfolio returns, not about

the behavior of speci¬c assets. Once the characteristics of portfolios replaced demand curves

for individual stocks, modern ¬nance was born.

9.2 Intertemporal Capital Asset Pricing Model (ICAPM)

Any "state variable" $z_t$ can be a factor. The ICAPM is a linear factor model with wealth and state variables that forecast changes in the distribution of future returns or income.

The ICAPM generates linear discount factor models

$$m_{t+1} = a + b'f_{t+1}$$

in which the factors are "state variables" for the investor's consumption-portfolio decision.

The "state variables" are the variables that determine how well the investor can do in his maximization. Current wealth is obviously a state variable. Additional state variables describe the conditional distribution of income and asset returns the agent will face in the future or "shifts in the investment opportunity set." In multiple good or international models, relative price changes are also state variables.

Optimal consumption is a function of the state variables, $c_t = g(z_t)$. We can use this fact once again to substitute out consumption, and write

$$m_{t+1} = \beta\frac{u'[g(z_{t+1})]}{u'[g(z_t)]}.$$

From here, it is a simple linearization to deduce that the state variables $z_{t+1}$ will be factors.

Alternatively, the value function depends on the state variables

$$V(W_{t+1}, z_{t+1}),$$

so we can write

$$m_{t+1} = \beta\frac{V_W(W_{t+1}, z_{t+1})}{V_W(W_t, z_t)}.$$

(The marginal value of a dollar must be the same in any use, so I made the denominator pretty by writing $u'(c_t) = V_W(W_t, z_t)$. This fact is known as the envelope condition.)

This completes the first step, naming the proxies. To obtain a linear relation, we can take a Taylor approximation, assume normality and use Stein's lemma, or, most conveniently, move to continuous time (which is really just a more convenient way of making the normal approximation). We saw above that we can write the basic pricing equation in continuous time as

$$E\left(\frac{dp}{p}\right) - r^f dt = -E\left(\frac{d\Lambda}{\Lambda}\frac{dp}{p}\right)$$

(for simplicity of the formulas, I'm folding any dividends into the price process). The discount factor is marginal utility, which is the same as the marginal value of wealth,

$$\frac{d\Lambda_t}{\Lambda_t} = \frac{du'(c_t)}{u'(c_t)} = \frac{dV_W(W_t, z_t)}{V_W}.$$

Our objective is to express the model in terms of factors $z$ rather than marginal utility or value, and Ito's lemma makes this easy

$$\frac{dV_W}{V_W} = \frac{W V_{WW}}{V_W}\frac{dW}{W} + \frac{V_{Wz}}{V_W}dz + \frac{1}{2}(\text{second derivative terms}).$$

(We don't have to grind out the second derivative terms if we are going to take $r^f dt = -E_t(d\Lambda/\Lambda)$, though this approach removes a potentially interesting and testable implication of the model.) The elasticity of marginal value with respect to wealth is often called the coefficient of relative risk aversion,

$$rra \equiv -\frac{W V_{WW}}{V_W}.$$

Substituting, we obtain the ICAPM, which relates expected returns to the covariance of returns with wealth, and also with the other state variables,

$$E\left(\frac{dp}{p}\right) - r^f dt = rra\, E\left(\frac{dW}{W}\frac{dp}{p}\right) - \frac{V_{Wz}}{V_W} E\left(dz\frac{dp}{p}\right).$$

From here, it is fairly straightforward to express the ICAPM in terms of betas rather than covariances, or as a linear discount factor model. Most empirical work occurs in discrete time; we often simply approximate the continuous time result as

$$E(R) - R^f \approx rra\, \text{cov}(R, \Delta W) + \lambda_z \text{cov}(R, \Delta z).$$
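As a sketch of how the discrete-time approximation might be evaluated on data, the snippet below computes the two sample covariances and combines them with assumed values of $rra$ and $\lambda_z$. The series are invented purely for illustration, and the function names are hypothetical; nothing here estimates the prices of risk.

```python
def sample_cov(xs, ys):
    """Population-style sample covariance (divides by n)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def icapm_premium(returns, wealth_growth, state_change, rra, lambda_z):
    """E(R) - R^f implied by rra * cov(R, dW) + lambda_z * cov(R, dz)."""
    return (rra * sample_cov(returns, wealth_growth)
            + lambda_z * sample_cov(returns, state_change))


# Made-up series: asset returns, wealth growth, and a state variable change.
returns = [0.05, -0.02, 0.08, 0.01]
wealth_growth = [0.03, -0.01, 0.04, 0.00]
state_change = [0.1, -0.2, 0.0, 0.1]

# With lambda_z = 0 the formula collapses to the single-factor (CAPM) term.
capm_only = icapm_premium(returns, wealth_growth, state_change, 2.0, 0.0)
with_state = icapm_premium(returns, wealth_growth, state_change, 2.0, -0.5)
print(capm_only, with_state)
```

Setting $\lambda_z = 0$ makes explicit that the ICAPM nests the wealth-only model as a special case.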


One often substitutes covariance with the wealth portfolio for covariance with wealth, and one uses factor-mimicking portfolios for the other factors $dz$ as well. The factor-mimicking portfolios are interesting for portfolio advice as well, as they give the purest way of hedging against or profiting from state variable risk exposure.

This short derivation does not do justice to the beauty of Merton's portfolio theory and ICAPM. What remains is to actually state the consumer's problem and prove that the value function depends on $W$ and $z$, the state variables for future investment opportunities, and that the optimal portfolio holds the market and hedge portfolios for the investment opportunity variables.

9.3 Comments on the CAPM and ICAPM

Conditional vs. unconditional models.

Do they price options?

Why bother linearizing?

The wealth portfolio.

Ex-post returns.

The implicit consumption-based model.

What are the ICAPM state variables?

CAPM and ICAPM as general equilibrium models

Is the CAPM conditional or unconditional?

Is the CAPM a conditional or an unconditional factor model? I.e., are the parameters $a$ and $b$ in $m = a - bR^W$ constants, or do they change at each time period, as conditioning information changes? We saw above that a conditional CAPM does not imply an unconditional CAPM, so additional steps must be taken to say anything about observed average returns.

The two period quadratic utility based derivation results in a conditional CAPM, since the parameters $a_t$ and $b_t$ depend on consumption which changes over time. Also we know that $a$ and $b$ must vary over time if the conditional moments of $R^W$, $R^f$ vary over time. This two-period investor chooses a portfolio on the conditional mean variance frontier, which is not on the unconditional frontier. The multiperiod quadratic utility CAPM only holds if returns are i.i.d. so it only holds if there is no difference between conditional and unconditional models.

The log utility CAPM expressed with the inverse market return is a beautiful model, since it holds both conditionally and unconditionally. There are no free parameters that can change with conditioning information:

$$1 = E_t\left(\frac{1}{R^W_{t+1}} R_{t+1}\right) \Rightarrow 1 = E\left(\frac{1}{R^W_{t+1}} R_{t+1}\right).$$

In fact there are no free parameters at all! Furthermore, the model makes no distributional assumptions, so it can apply to any asset, including options. Finally it requires no specification of the investment opportunity set, or (macro language) no specification of technology.

Linearizing the log utility CAPM comes at an enormous price. The expectations in the linearized log utility CAPM (9.128) are conditional. Thus, the apparent simplification of linearity destroys the nice unconditional feature of the log utility CAPM.

Should the CAPM price options?

As I have derived them, the quadratic utility CAPM and the nonlinear log utility CAPM should apply to all payoffs: stocks, bonds, options, contingent claims, etc. However, if we assume normal return distributions to obtain a linear CAPM from log utility, we can no longer hope to price options, since option returns are non-normally distributed (that's the point of options!). Even the normal distribution for regular returns is a questionable assumption. You may hear the statement "the CAPM is not designed to price derivative securities"; the statement refers to the log utility plus normal-distribution derivation of the linear CAPM.

Why linearize?

Why bother linearizing a model? Why take the log utility model $m = 1/R^W$ which should price any asset, and turn it into $m_{t+1} = a_t + b_t R^W_{t+1}$ that loses the clean conditioning-down property and cannot price non-normally distributed payoffs? These tricks were developed before the $p = E(mx)$ expression of asset pricing models, when (linear) expected return-beta models were the only thing around. You need a linear model of $m$ to get an expected return-beta model. More importantly, the tricks were developed when it was hard to estimate nonlinear models. It's clear how to estimate a $\beta$ and a $\lambda$ by regressions, but estimating nonlinear models used to be a big headache. Now, GMM has made it easy to estimate and evaluate nonlinear models. Thus, in my opinion, linearization is mostly intellectual baggage.

The desire for linear representations and this normality trick is one of the central reasons why many asset pricing models are written in continuous time. In most continuous time models, everything is locally normal. Unfortunately for empiricists, this approach adds time-aggregation and another layer of unobservable conditioning information into the predictions of the model. For this reason, most empirical work is still based on discrete-time models. However, the fact that distributions are locally normal in continuous time, even for option returns, is a good reminder that normal approximations probably aren't that bad, so long as the time interval is kept reasonably short.

What about the wealth portfolio?

The log utility derivation makes clear just how expansive is the concept of the wealth portfolio. To own a (share of) the consumption stream, you have to own not only all stocks, but all bonds, real estate, privately held capital, publicly held capital (roads, parks, etc.), and human capital, a nice word for "people." Clearly, the CAPM is a poor defense of common proxies such as the value-weighted NYSE portfolio. And keep in mind that since it is easy to find ex-post mean-variance efficient portfolios of any subset of assets (like stocks) out there, taking the theory seriously is our only guard against fishing.

Implicit consumption-based models

Many users of alternative models clearly are motivated by a belief that the consumption-based model doesn't work, no matter how well measured consumption might be. This view is not totally unreasonable; as above, perhaps transactions costs de-link consumption and asset returns at high frequencies, and some diagnostic evidence suggests that the consumption behavior necessary to save the consumption model is too wild to be believed.

However, the derivations make clear that the CAPM and ICAPM are not alternatives to the consumption-based model, they are special cases of that model. In each case $m_{t+1} = \beta u'(c_{t+1})/u'(c_t)$ still operates. We just added assumptions that allowed us to substitute other variables in place of $c_t$. One cannot adopt the CAPM on the belief that the consumption-based model is wrong. If you think the consumption-based model is wrong, the economic justification for the alternative factor models evaporates.

The only plausible excuse for factor models is a belief that consumption data are unsatisfactory. However, while asset return data are well measured, it is not obvious that the S&P500 or other portfolio returns are terrific measures of the return to total wealth. "Macro factors" used by Chen, Roll and Ross (1986) and others are distant proxies for the quantities they want to measure, and macro factors based on other NIPA aggregates (investment, output, etc.) suffer from the same measurement problems as aggregate consumption.

In large part, the "better performance" of the CAPM and ICAPM relative to consumption-based models comes from throwing away content. Again $m_{t+1} = \beta u'(c_{t+1})/u'(c_t)$ is there in any CAPM or ICAPM. The CAPM and ICAPM make predictions concerning consumption data that are wildly implausible, not only of admittedly poorly measured aggregate consumption data but any imaginable perfectly measured individual consumption data as well. For example, equation (9.129) says that the standard deviation of the wealth portfolio return equals the standard deviation of consumption growth. The latter is about 1% per year. All the miserable failures of the log-utility consumption-based model apply equally to the log utility CAPM. Finally, most models take the market price of risk as a free parameter. Of course it isn't; it is related to risk aversion and consumption volatility and is very hard to justify as such.

Ex-post returns

The log utility model also allows us for the first time to look at what moves returns ex-post as well as ex-ante. Recall that, in the log utility model, we have

$$R^W_{t+1} = \frac{1}{\beta}\frac{c_{t+1}}{c_t}. \tag{9.129}$$


Thus, the wealth portfolio return is high, ex-post, when consumption is high. This holds at

every frequency: If stocks go up between 12:00 and 1:00, it must be because (on average) we

all decided to have a big lunch. This seems silly. Aggregate consumption and asset returns are

likely to be de-linked at high frequencies, but how high (quarterly?) and by what mechanism

are important questions to be answered. In any case, this is another implication of the log

utility CAPM that is just thrown out.

In sum, the poor performance of the consumption-based model is an important nut to

chew on, not just a blind alley or failed attempt that we can safely disregard and go on about

our business.

Identity of state variables

The ICAPM does not tell us the identity of the state variables $z_t$, and many authors use the ICAPM as an obligatory citation to theory on the way to using factors composed of ad-hoc portfolios, leading Fama (1991) to characterize the ICAPM as a "fishing license." The ICAPM really isn't quite such an expansive license. One could do a lot to insist that the factor-mimicking portfolios actually are the projections of some identifiable state variables on to the space of returns, and one could do a lot to make sure the candidate state variables really are plausible state variables for an explicitly stated optimization problem. For example, one could check that investment-opportunity set state variables actually do forecast something. The fishing license comes as much from habits of applying the theory as from the theory itself.

General equilibrium models

The CAPM and other models are really general equilibrium models. Looking at the derivation through general-equilibrium glasses, we have specified a set of linear technologies with returns $R^i$ that do not depend on the amount invested. Some derivations make further assumptions, such as an initial capital stock, and no labor or labor income.

The CAPM is obviously very artificial. Its central place really comes from its long string of empirical successes rather than its theoretical purity. The theory was extended and multiple factors anticipated long before they became empirically popular.

Portfolio intuition

I have derived all the models as instances of the consumption-based model. The more tra-

ditional portfolio intuition for multifactor models is also useful. The intuition (and historical

development) comes from looking past consumption to its determinants in sources of income

or news.

The CAPM simpli¬es matters by assuming that the average investor only cares the per-

formance of his investment portfolio. Most of us have jobs, so events like recessions hurt the

majority of investors. People with jobs will prefer stocks that don™t fall in recessions, even if

their market betas, mean returns, and standard deviations are the same as stocks that do fall

in recessions. Demanding such stocks, they drive down the corresponding expected returns.

Thus, we expect expected returns to depend on additional betas that capture labor market


conditions.

The traditional ICAPM intuition works the same way. Even jobless investors have long

horizons. Thus, they will prefer stocks that do well when news comes that future returns are

lower. Demanding more of such stocks, they depress expected returns. Thus, expected re-

turns come to depend on covariation with news of future returns, not just covariation with the

current market return. The ICAPM remained on the theoretical shelf for 20 years mostly be-

cause it took that long to accumulate empirical evidence that returns are, in fact, predictable.

It is vitally important that the extra factors affect the average investor. If an event makes

investor A worse off and investor B better off, then investor A buys assets that do well when

the event happens, and investor B sells them. They transfer the risk of the event, but the

price or expected return of the asset is unaffected. For a factor to affect prices or expected

returns, the average investor must be affected by it, so investors collectively bid up or down

the price and expected return of assets that covary with the event rather than just transfer the

risk without affecting equilibrium prices.

As you can see, this traditional intuition is encompassed by consumption. Bad labor

market outcomes or bad news about future returns are bad news that raise the marginal utility

of wealth, which equals the marginal utility of consumption.

9.4 Arbitrage Pricing Theory (APT)

The APT: If a set of asset returns are generated by a linear factor model

\[
R^i = E(R^i) + \sum_{j=1}^{N} \beta_{ij} \tilde{f}_j + \varepsilon^i
\]
\[
E(\varepsilon^i) = E(\varepsilon^i \tilde{f}_j) = 0.
\]

Then (with additional assumptions) there is a discount factor $m$ linear in the factors, $m = a + b'f$, that prices the returns.

The APT starts from a statistical characterization. There is a big common component

to stock returns: when the market goes up, most individual stocks also go up. Beyond the

market, groups of stocks move together such as computer stocks, utilities, small stocks, value

stocks and so forth. Finally, each stock’s return has some completely idiosyncratic movement.

This is a characterization of realized returns, outcomes or payoffs. The point of the APT is to

start with this statistical characterization of outcomes, and derive something about expected

returns or prices.

The intuition behind the APT is that the completely idiosyncratic movements in asset


returns should not carry any risk prices, since investors can diversify them away by holding

portfolios. Therefore, risk prices or expected returns on a security should be related to the

security’s covariance with the common components or “factors” only.

The job of this section is then 1) to describe a mathematical model of the tendency for

stocks to move together, and thus to define the “factors” and residual idiosyncratic components,
and 2) to think carefully about what it takes for the idiosyncratic components to have

zero (or small) risk prices, so that only the common components matter to asset pricing.

There are two lines of attack for the second item. 1) If there were no residual, then we

could price securities from the factors by arbitrage (really, by the law of one price, but the

current distinction between law of one price and arbitrage came after the APT was named).

Perhaps we can extend this logic and show that if the residuals are small, they must have

small risk prices. 2) If investors all hold well-diversified portfolios, then only variations in

the factors drive consumption and hence marginal utility.

Much of the original appeal and marketing of the APT came from the first line of attack,

the idea that we could derive pricing implications without the economic structure required

of the CAPM, ICAPM, or any other model derived as a specialization of the consumption-

based model. In this section, I will first try to see how far we can in fact get with purely law

of one price arguments. I will conclude that the answer is, “not very far,” and that the most

satisfactory argument for the APT is in fact just another specialization of the consumption-

based model.

9.4.1 Factor structure in covariance matrices

I define and examine the factor decomposition

\[
x^i = \alpha_i + \beta_i' f + \varepsilon^i; \quad E(\varepsilon^i) = 0, \quad E(f \varepsilon^i) = 0.
\]

The factor decomposition is equivalent to a restriction on the payoff covariance matrix.

The APT models the tendency of asset payoffs (returns) to move together via a statistical
factor decomposition

\[
x^i = \alpha_i + \sum_{j=1}^{M} \beta_{ij} f_j + \varepsilon^i = \alpha_i + \beta_i' f + \varepsilon^i. \tag{130}
\]

The $f_j$ are the factors, the $\beta_{ij}$ are the betas or factor loadings and the $\varepsilon^i$ are residuals.
As usual, I use the same letter without subscripts to denote a vector, for example
$f = [\, f_1 \;\; f_2 \;\; \ldots \;\; f_M \,]'$. A discount factor $m$, pricing factors $f$ in $m = b'f$ and this factor
decomposition (or factor structure) for returns are totally unrelated uses of the word “factor.”
I didn’t invent the terminology! The APT is conventionally written with $x^i$ = returns, but it
ends up being much less confusing to use prices and payoffs.

It is a convenient and conventional simplification to fold the factor means into the first,
constant, factor and write the factor decomposition with zero-mean factors $\tilde{f} \equiv f - E(f)$,

\[
x^i = E(x^i) + \sum_{j=1}^{M} \beta_{ij} \tilde{f}_j + \varepsilon^i. \tag{131}
\]

Remember that $E(x^i)$ is still just a statistical characterization, not a prediction of a model.

We can construct the factor decomposition as a regression equation. Define the $\beta_{ij}$ as
regression coefficients, and then the $\varepsilon^i$ are uncorrelated with the factors by construction,

\[
E(\varepsilon^i \tilde{f}_j) = 0.
\]

The content (the assumption that keeps (9.131) from describing any arbitrary set of returns)
is an assumption that the $\varepsilon^i$ are uncorrelated with each other,

\[
E(\varepsilon^i \varepsilon^j) = 0, \quad i \neq j.
\]

(More general versions of the model allow some limited correlation across the residuals but
the basic story is the same.)

The factor structure is thus a restriction on the covariance matrix of payoffs. For example,
if there is only one factor, then

\[
\mathrm{cov}(x^i, x^j) = E[(\beta_i \tilde{f} + \varepsilon^i)(\beta_j \tilde{f} + \varepsilon^j)] = \beta_i \beta_j \sigma^2(f) +
\begin{cases} \sigma^2_{\varepsilon^i} & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}
\]

Thus, with $N$ = number of securities, the $N(N+1)/2$ distinct elements of a variance-covariance
matrix are described by $N$ betas and $N + 1$ variances. A vector version of the same thing is

\[
\mathrm{cov}(x, x') = \beta \beta' \sigma^2(f) +
\begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \ddots \end{bmatrix}.
\]

With multiple (orthogonalized) factors, we obtain

\[
\mathrm{cov}(x, x') = \beta_1 \beta_1' \sigma^2(f_1) + \beta_2 \beta_2' \sigma^2(f_2) + \ldots + (\text{diagonal matrix}).
\]

In all these cases, we describe the covariance matrix as a singular matrix $\beta\beta'$ (or a sum of a few
such singular matrices) plus a diagonal matrix.

If we know the factors we want to use ahead of time, say the market (value-weighted

portfolio) and industry portfolios, or size and book-to-market portfolios, we can estimate
a factor structure by running regressions. Often, however, we don’t know the identities of
the factor portfolios ahead of time. In this case we have to use one of several statistical
techniques under the broad heading of factor analysis (that’s where the word “factor” came

from in this context) to estimate the factor model. One can estimate a factor structure quickly

by simply taking an eigenvalue decomposition of the covariance matrix, and then setting

small eigenvalues to zero.
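As a rough illustration of that last shortcut, the sketch below keeps the largest eigenvalues of a covariance matrix and treats the leftover diagonal as residual variance. It is a minimal sketch, not a full factor-analysis procedure: the function name `factor_structure_from_cov`, the made-up betas and variances, and the choice to read residual variances off the leftover diagonal are all assumptions for illustration.

```python
import numpy as np

def factor_structure_from_cov(cov, n_factors):
    """Approximate cov as B B' + diag(d) by keeping the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]    # indices of the largest ones
    # Loadings on unit-variance factors: eigenvector scaled by sqrt(eigenvalue).
    loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
    common = loadings @ loadings.T                   # low-rank "factor" part
    resid_var = np.diag(cov - common)                # leftover diagonal (assumed residual variances)
    return loadings, resid_var

# Hypothetical one-factor example: betas and variances are made up.
beta = np.array([0.8, 1.0, 1.2, 0.9, 1.1])
sigma2_f = 0.04
sigma2_eps = np.full(5, 0.0004)
cov = np.outer(beta, beta) * sigma2_f + np.diag(sigma2_eps)

loadings, resid_var = factor_structure_from_cov(cov, n_factors=1)
approx = loadings @ loadings.T + np.diag(resid_var)
max_err = np.max(np.abs(approx - cov))
print("largest absolute reconstruction error:", max_err)
```

In this toy case the reconstruction matches the diagonal by construction, and the off-diagonal error is small because the residual variances are small relative to the factor variance.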

9.4.2 Exact factor pricing

With no error term,

\[
x^i = E(x^i) 1 + \beta_i' \tilde{f}
\]

implies

\[
p(x^i) = E(x^i) p(1) + \beta_i' p(\tilde{f})
\]

and thus

\[
m = a + b'f; \quad p(x^i) = E(m x^i)
\]
\[
E(R^i) = R^f + \beta_i' \lambda
\]

using only the law of one price.

Suppose that there are no idiosyncratic terms $\varepsilon^i$. This is called an exact factor model.
Now look again at the factor decomposition,

\[
x^i = E(x^i) 1 + \beta_i' \tilde{f}. \tag{132}
\]

It started as a statistical decomposition. But it also says that the payoff $x^i$ can be synthesized
as a portfolio of the factors and a constant (risk-free payoff). Thus, the price of $x^i$ can only
depend on the prices of the factors $f$,

\[
p(x^i) = E(x^i) p(1) + \beta_i' p(\tilde{f}). \tag{133}
\]

The law of one price assumption lets you take prices of right and left sides.

If the factors are returns, their prices are 1. If the factors are not returns, their prices are

free parameters which can be picked to make the model fit as well as possible. Since there
are fewer factors than payoffs, this procedure is not vacuous. (Recall that the prices of the
factors are related to the $\lambda$ in expected return beta representations. $\lambda$ is determined by the
expected return of a return factor, and is a free parameter for non-return factor models.)

We are really done, but the APT is usually stated as “there is a discount factor linear

in $f$ that prices returns $R^i$,” or “there is an expected return-beta representation with $f$ as
factors.” Therefore, we should take a minute to show that the rather obvious relationship
(9.133) between prices is equivalent to discount factor and expected return statements.

Assuming only the law of one price, we know there is a discount factor $m$ linear in
factors that prices the factors. We usually call it $x^*$, but call it $f^*$ here to remind us that it
prices the factors. Denote $\hat{f} = [\, 1 \;\; \tilde{f}' \,]'$ the factors including the constant. As with $x^*$,
$f^* = p(\hat{f})' E(\hat{f}\hat{f}')^{-1} \hat{f} = a + b'f$ satisfies $p(\hat{f}) = E(f^* \hat{f})$ and $p(1) = E(f^*)$. If the
discount factor prices the factors, it must price any portfolio of the factors; hence $f^*$ prices
all payoffs $x^i$ that follow the factor structure (9.132).
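To make the construction of $f^*$ concrete, here is a small finite-state sketch. Everything numerical in it is an assumption chosen for illustration (four equally likely states, one zero-mean factor, a risk-free rate of 2%, a factor price of −0.03), and the helper `price` is a hypothetical name; the only thing taken from the text is the formula $f^* = p(\hat{f})' E(\hat{f}\hat{f}')^{-1}\hat{f}$.

```python
import numpy as np

prob = np.array([0.25, 0.25, 0.25, 0.25])    # state probabilities (assumed)
f_tilde = np.array([-0.2, -0.1, 0.1, 0.2])   # zero-mean factor payoff by state (assumed)
f_hat = np.vstack([np.ones(4), f_tilde])     # rows: constant, factor
p_hat = np.array([1 / 1.02, -0.03])          # assumed prices p(1) and p(f_tilde)

second_moment = (f_hat * prob) @ f_hat.T     # E(f_hat f_hat')
coef = np.linalg.solve(second_moment, p_hat) # [a, b] in f* = a + b * f_tilde
f_star = coef @ f_hat                        # f* state by state

def price(m, x):
    """p(x) = E(m x) in the finite-state example."""
    return float(np.sum(prob * m * x))

# A payoff with an exact factor structure: x = E(x) * 1 + beta * f_tilde.
mean_xi, beta_i = 1.0, 1.5
x_i = mean_xi + beta_i * f_tilde
p_xi = price(f_star, x_i)
print(price(f_star, np.ones(4)), price(f_star, f_tilde), p_xi)
```

By construction $f^*$ reproduces the two assumed factor prices, and the price it assigns to `x_i` equals `mean_xi * p(1) + beta_i * p(f_tilde)`, which is (9.133) in this example.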

We could now go from $m$ linear in the factors to an expected return-beta model using the
above theorems that connect the two representations. But there is a more direct and elegant
connection. Start with (9.133), specialized to returns $x^i = R^i$ and of course $p(R^i) = 1$. Use
$p(1) = 1/R^f$ and solve for expected return as

\[
E(R^i) = R^f + \beta_i' \left[ -R^f p(\tilde{f}) \right] = R^f + \beta_i' \lambda.
\]

The last equality defines $\lambda$. Expected returns are linear in the betas, and the constants ($\lambda$) are
related to the prices of the factors. In fact, this is the same definition of $\lambda$ that we arrived at
above connecting $m = b'f$ to expected return-beta models.
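For readers who want the intermediate algebra, the step from the price equation to the expected return-beta form can be written out as follows. The last line is an added check, not part of the original argument: it specializes to a factor that is an excess return (price zero), in which case $\lambda$ reduces to the factor’s mean.

```latex
\begin{aligned}
1 = p(R^i) &= E(R^i)\, p(1) + \beta_i' p(\tilde{f}) = \frac{E(R^i)}{R^f} + \beta_i' p(\tilde{f}) \\
\Rightarrow \quad E(R^i) &= R^f - R^f \beta_i' p(\tilde{f}) = R^f + \beta_i' \lambda,
  \qquad \lambda \equiv -R^f p(\tilde{f}) \\
\text{if } p(f) = 0: \quad p(\tilde{f}) &= p(f) - E(f)\, p(1) = -\frac{E(f)}{R^f}
  \quad \Rightarrow \quad \lambda = E(f)
\end{aligned}
```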

9.4.3 Approximate APT using the law of one price

Attempts to extend the exact factor model to an approximate factor pricing model when

errors are “small,” or markets are “large,” still only using law of one price.

For fixed $m$, the APT gets better and better as $R^2$ or the number of assets increases.

However, for any fixed $R^2$ or size of market, the APT can be arbitrarily bad.

These observations mean that we must go beyond the law of one price to derive factor

pricing models.

Actual returns do not display an exact factor structure. There is some idiosyncratic or

residual risk; we cannot exactly replicate the return of a given stock with a portfolio of a few

large factor portfolios. However, the idiosyncratic risks are often small. For example, factor

model regressions of the form (9.130) often have very high $R^2$, especially when portfolios
rather than individual securities are on the left hand side. And the residual risks are still
idiosyncratic: Even if they are a large part of an individual security’s variance, they should
be a small contributor to the variance of well diversified portfolios. Thus, there is reason to
hope that the APT holds approximately, especially for reasonably large portfolios. Surely, if
the residuals are “small” and/or “idiosyncratic,” the price of an asset can’t be “too different”

from the price predicted from its factor content?


To think about these issues, start again from a factor structure, but this time put in a
residual,

\[
x^i = E(x^i) 1 + \beta_i' \tilde{f} + \varepsilon^i.
\]

Again take prices of both sides,

\[
p(x^i) = E(x^i) p(1) + \beta_i' p(\tilde{f}) + E(m \varepsilon^i).
\]

Now, what can we say about the price of the residual $p(\varepsilon^i) = E(m \varepsilon^i)$?

Figure 23 illustrates the situation. Portfolios of the factors span a payoff space, the line
from the origin through $\beta_i' f$ in the Figure. The payoff we want to price, $x^i$, is not in that space,
since the residual $\varepsilon^i$ is not zero. A discount factor $f^*$ that is in the $f$ payoff space prices the
factors. The set of all discount factors that price the factors is the line $m$ perpendicular to
$f^*$. The residual $\varepsilon^i$ is orthogonal to the factor space, since it is a regression residual, and to
$f^*$ in particular, $E(f^* \varepsilon^i) = 0$. This means that $f^*$ assigns zero price to the residual. But the
other discount factors on the $m$ line are not orthogonal to $\varepsilon^i$, so generate non-zero price for
the residual $\varepsilon^i$. As we sweep along the line of discount factors $m$ that price the $f$, in fact, we
generate every price from $-\infty$ to $\infty$ for the residual. Thus, the law of one price does not nail
down the price of the residual $\varepsilon^i$ and hence the price or expected return of $x^i$.

[Figure 23. Approximate arbitrage pricing. Labels shown in the figure: all $m$; $\beta_i' f$; $m > 0$; $\varepsilon^i$; $m$: $\sigma^2(m) < A$; $x^i$; $f^*$; $m$.]


Limiting arguments

We would like to show that the price of $x^i$ has to be “close to” the price of $\beta_i' f$. One notion
of “close to” is that in some appropriate limit the price of $x^i$ converges to the price of $\beta_i' f$.
“Limit” means, of course, that you can get arbitrarily good accuracy by going far enough in
the direction of the limit (for every $\varepsilon > 0$ there is a $\delta$....). Thus, establishing a limit result is

a way to argue for an approximation.

Here is one theorem that seems to imply that the APT should be a good approximation
for portfolios that have high $R^2$ on the factors. I state the argument for the case that there is a
constant factor, so the constant is in the $f$ space and $E(\varepsilon^i) = 0$. The same ideas work in the
less usual case that there is no constant factor, using second moments in place of variance.

Theorem: Fix a discount factor $m$ that prices the factors. Then, as $\mathrm{var}(\varepsilon^i) \to 0$,
$p(x^i) \to p(\beta_i' f)$.

This is easiest to see by just looking at the graph. $E(\varepsilon^i) = 0$ so $\mathrm{var}(\varepsilon^i) = E[(\varepsilon^i)^2] =
\|\varepsilon^i\|^2$. Thus, as the size of the $\varepsilon^i$ vector in Figure 23 gets smaller, $x^i$ gets closer and closer to
$\beta_i' f$. For any fixed $m$, the induced pricing function (lines perpendicular to the chosen $m$) is
continuous. Thus, as $x^i$ gets closer and closer to $\beta_i' f$, its price gets closer and closer to the
price of $\beta_i' f$.

The factor model is defined as a regression, so

\[
\mathrm{var}(x^i) = \mathrm{var}(\beta_i' f) + \mathrm{var}(\varepsilon^i).
\]

Thus, the variance of the residual is related to the regression $R^2$,

\[
\frac{\mathrm{var}(\varepsilon^i)}{\mathrm{var}(x^i)} = 1 - R^2.
\]

The theorem says that as $R^2 \to 1$, the price of the residual goes to zero.

We were hoping for some connection between the fact that the risks are idiosyncratic and

factor pricing. Even if the idiosyncratic risks are a large part of the payoff at hand, they

are a small part of a well-diversified portfolio. The next theorem shows that portfolios with
high $R^2$ don’t have to happen by chance; well-diversified portfolios will always have this

characteristic.

Theorem: As the number of primitive assets increases, the $R^2$ of well-diversified
portfolios increases to 1.

Proof: Start with an equally weighted portfolio

\[
x^p = \frac{1}{N} \sum_{i=1}^{N} x^i.
\]

Going back to the factor decomposition (9.130) for each individual asset $x^i$, the
factor decomposition of $x^p$ is

\[
x^p = \frac{1}{N} \sum_{i=1}^{N} \left( \alpha_i + \beta_i' f + \varepsilon^i \right)
    = \frac{1}{N} \sum_{i=1}^{N} \alpha_i + \frac{1}{N} \sum_{i=1}^{N} \beta_i' f + \frac{1}{N} \sum_{i=1}^{N} \varepsilon^i
    = \alpha_p + \beta_p' f + \varepsilon^p.
\]

The last equality defines notation $\alpha_p$, $\beta_p$, $\varepsilon^p$. But

\[
\mathrm{var}(\varepsilon^p) = \mathrm{var}\left( \frac{1}{N} \sum_{i=1}^{N} \varepsilon^i \right).
\]

So long as the variances of the $\varepsilon^i$ are bounded, and given the factor assumption
$E(\varepsilon^i \varepsilon^j) = 0$,

\[
\lim_{N \to \infty} \mathrm{var}(\varepsilon^p) = 0.
\]

Obviously, the same idea goes through so long as the portfolio spreads some weight
on all the new assets, i.e. so long as it is “well-diversified.” $\blacksquare$
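The limit in the proof follows because, with uncorrelated residuals, the variance of the equally weighted average is $(1/N^2)\sum_i \mathrm{var}(\varepsilon^i)$, which is at most (bound on variances)$/N$. The short sketch below checks that arithmetic for a few portfolio sizes. The residual variances are drawn at random purely for illustration, and the function name and the bound of 0.09 are assumptions.

```python
import numpy as np

def equal_weight_residual_variance(resid_vars):
    """var(eps_p) for an equally weighted portfolio with uncorrelated residuals."""
    resid_vars = np.asarray(resid_vars, dtype=float)
    n = resid_vars.size
    # Uncorrelated residuals: var((1/N) sum eps_i) = (1/N^2) sum var(eps_i).
    return float(resid_vars.sum() / n**2)

rng = np.random.default_rng(0)
bound = 0.09                      # assumed upper bound on individual residual variances
results = {}
for n in (10, 100, 1000, 10000):
    resid_vars = rng.uniform(0.01, bound, size=n)   # made-up variances below the bound
    results[n] = equal_weight_residual_variance(resid_vars)
    print(n, results[n], "<=", bound / n)
```

Each printed variance sits below `bound / n`, so it shrinks toward zero at rate $1/N$ even though no individual residual variance is small.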

These two theorems can be interpreted to say that the APT holds approximately (in the

usual limiting sense) for portfolios that either naturally have high $R^2$, or well-diversified

portfolios in large enough markets. We have only used the law of one price.

Law of one price arguments fail

Now, let me pour some cold water on these results. I fixed $m$ and then let other things take
limits. The flip side is that for any nonzero residual $\varepsilon^i$, no matter how small, we can pick a
discount factor $m$ that prices the factors and assigns any price to $x^i$! As often in mathematics,

the order of “for all” and “there exists” matters a lot.

Theorem: For any nonzero residual $\varepsilon^i$ there is a discount factor that prices the factors
$f$ (consistent with the law of one price) and that assigns any desired price in
$(-\infty, \infty)$ to the payoff $x^i$.

So long as $\|\varepsilon^i\| > 0$, as we sweep the choice of $m$ along the dashed line, the inner
product of $m$ with $\varepsilon^i$ and hence $x^i$ varies from $-\infty$ to $\infty$. Thus, for a given size $R^2 < 1$, or
a given finite market, the law of one price says absolutely nothing about the prices of payoffs
that do not exactly follow the factor structure. The law of one price says that two ways of
constructing the same portfolio must give the same price. If the residual is not exactly zero,
there is no way of replicating the payoff $x^i$ from the factors and no way to infer anything
about the price of $x^i$ from the price of the factors.
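The construction behind this theorem can be sketched numerically: take the discount factor $f^*$ that lies in the factor space and add a multiple of the residual, $m = f^* + w\,\varepsilon^i$. Because $\varepsilon^i$ is orthogonal to the constant and the factor, every such $m$ still prices the factors, while the price of $x^i$ moves linearly in $w$. The four-state numbers, the assumed factor prices, and the helper names below are illustrative assumptions only.

```python
import numpy as np

prob = np.full(4, 0.25)                         # state probabilities (assumed)
f_tilde = np.array([-0.2, -0.1, 0.1, 0.2])      # zero-mean factor (assumed)
eps = np.array([0.01, -0.02, 0.02, -0.01])      # residual with E(eps)=0, E(eps*f_tilde)=0
x_i = 1.0 + 1.5 * f_tilde + eps                 # payoff = factor part + small residual
p_hat = np.array([1 / 1.02, -0.03])             # assumed prices of constant and factor

def price(m, x):
    """p(x) = E(m x) in the finite-state example."""
    return float(np.sum(prob * m * x))

f_hat = np.vstack([np.ones(4), f_tilde])
f_star = np.linalg.solve((f_hat * prob) @ f_hat.T, p_hat) @ f_hat

def discount_factor_for_target(target_price):
    """Pick w so that m = f* + w * eps assigns target_price to x_i."""
    w = (target_price - price(f_star, x_i)) / price(eps, eps)
    return f_star + w * eps

for target in (-5.0, 0.0, 3.0):
    m = discount_factor_for_target(target)
    print(target, price(m, x_i), price(m, np.ones(4)), price(m, f_tilde), m.min())
```

In this toy example the residual has a standard deviation of about 1.6%, yet each target price is hit exactly while the factor prices are unchanged. The last printed column shows that these particular discount factors are negative in some states.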

I think the contrast between this theorem and those of the last subsection accounts for

most of the huge theoretical controversy over the APT. If you fix $m$ and take limits of $N$ or
$\varepsilon$, the APT gets arbitrarily good. But if you fix $N$ or $\varepsilon$, as one does in any application, the

APT can get arbitrarily bad as you search over possible m.

The lesson I learn is that the effort to extend prices from an original set of securities (f in

this case) to new payoffs that are not exactly spanned by the original set of securities, using

only the law of one price, is fundamentally doomed. To extend a pricing function, you need

to add some restrictions beyond the law of one price.

9.4.4 Beyond the law of one price: arbitrage and Sharpe ratios

We can find a well-behaved approximate APT if we impose the law of one price and a

restriction on the volatility of discount factors, or, equivalently, a bound on the Sharpe ratio

achievable by portfolios of the factors and test assets.

The approximate APT based on the law of one price fell apart because we could always

choose a discount factor sufficiently “far out” to generate an arbitrarily large price for an

arbitrarily small residual. But those discount factors are surely “unreasonable.” Surely, we

can rule them out, reestablishing an approximate APT, without jumping all the way to fully

specified discount factor models such as the CAPM or consumption-based model.

A natural first idea is to impose the no-arbitrage restriction that $m$ must be positive.
Graphically, we are now restricted to the solid $m$ line in Figure 23. Since that line only
extends a finite amount, restricting us to strictly positive $m$’s gives rise to finite upper and
lower arbitrage bounds on the price of $\varepsilon^i$ and hence $x^i$. (The term “arbitrage bounds” comes

from option pricing, and we will see these ideas again in that context. If this idea worked, it

would restore the APT to “arbitrage pricing” rather than “law of one-pricing.”)

Alas, in applications of the APT (as often in option pricing), the arbitrage bounds are

too wide to be of much use. The positive discount factor restriction is equivalent to saying

“if portfolio A gives a higher payoff than portfolio B in every state of nature, then the price

of A must be higher than the price of B.” Since stock returns and factors are continuously

distributed, not two-state distributions as I have graphed in Figure 23, there typically are no

strictly dominating portfolios, so adding m > 0 does not help.

A second restriction does let us derive an approximate APT that is useful in finite markets
with $R^2 < 1$. We can restrict the variance and hence the size ($\|m\|^2 = E(m^2) = \sigma^2(m) +
E(m)^2 = \sigma^2(m) + 1/(R^f)^2$) of the discount factor. Figure 23 includes a plot of the discount
factors with limited variance, size, or length in the geometry of that Figure. The restricted
range of discount factors produces a restricted range of prices for $x^i$, and so gives us upper
and lower bounds for the price of $x^i$ in terms of the factor prices. Precisely, the upper and
lower bounds solve the problem

\[
\min_{\{m\}} \ (\text{or } \max_{\{m\}}) \quad p(x^i) = E(m x^i) \quad \text{s.t.} \quad E(mf) = p(f), \ \ m \geq 0, \ \ \sigma^2(m) \leq A.
\]

Limiting the variance of the discount factor is of course the same as limiting the maximum
Sharpe ratio (mean / standard deviation of excess return) available from portfolios of the
factors and $x^i$. Recall that

\[
\frac{E(R^e)}{\sigma(R^e)} \leq \frac{\sigma(m)}{E(m)}.
\]
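A simplified version of the bounds problem can be solved in closed form, and the sketch below does so under a loudly stated assumption: it drops the constraint $m \geq 0$ and keeps only the factor-pricing and variance constraints. Writing $m = f^* + w\,\varepsilon^i + n$, with $n$ orthogonal to the constant, the factors and $\varepsilon^i$, gives $\sigma^2(m) = \sigma^2(f^*) + w^2 \mathrm{var}(\varepsilon^i) + E(n^2)$ and $p(x^i) = $ (factor-implied price) $+\, w\,\mathrm{var}(\varepsilon^i)$, so the extreme prices set $n = 0$ and push $w$ to the variance limit. The function name and the input numbers are hypothetical.

```python
import numpy as np

def price_bounds_ignoring_positivity(base_price, var_f_star, var_eps, A):
    """Bounds on p(x^i) over discount factors that price the factors with var(m) <= A.

    base_price : price implied by the factor content alone, E(x)p(1) + beta'p(f)
    var_f_star : variance of f*, the discount factor inside the factor space
    var_eps    : variance of the residual (its mean is zero)
    A          : upper bound on var(m)
    The constraint m >= 0 is NOT imposed here (a simplifying assumption).
    """
    if A < var_f_star:
        raise ValueError("no discount factor pricing the factors has variance this low")
    half_width = np.sqrt((A - var_f_star) * var_eps)
    return base_price - half_width, base_price + half_width

# Made-up inputs: sigma(m) <= 0.5 and a residual standard deviation of about 1.6%.
lower, upper = price_bounds_ignoring_positivity(
    base_price=0.9354, var_f_star=0.036, var_eps=2.5e-4, A=0.25
)
print(lower, upper)
```

The width of the interval scales with the residual’s standard deviation, so for a fixed volatility bound the price bounds tighten as the residual shrinks.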

Though a bound on Sharpe ratios or discount factor volatility is not a totally preference-

free concept, it clearly imposes a great deal less structure than the CAPM or ICAPM which

are essentially full general equilibrium models. Ross (1976) included this suggestion in his

original APT paper, though it seems to have disappeared from the literature since then in

the failed effort to derive an APT from the law of one price alone. Ross pointed out that

deviations from factor pricing could provide very high Sharpe ratio opportunities, which

seem implausible though not violations of the law of one price. Saá-Requejo and I (2000)

dub this idea “good-deal” pricing, as an extension of “arbitrage pricing.” Limiting σ(m) rules

out “good deals” as well as pure arbitrage opportunities.

Having imposed a limit $A$ on discount factor volatility or Sharpe ratio, the APT limit
does work, and does not depend on the order of “for all” and “there exists.”

Theorem: As $\varepsilon^i \to 0$ and $R^2 \to 1$, the price $p(x^i)$ assigned by any discount factor
$m$ that satisfies $E(mf) = p(f)$, $m \geq 0$, $\sigma^2(m) \leq A$ approaches $p(\beta_i' f)$.
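One way to see why the volatility bound delivers the result is the following chain of inequalities, offered as a sketch rather than as the original proof. It assumes, as in the earlier theorem, that a constant is among the factors so that $E(\varepsilon^i) = 0$, and it writes $p(\beta_i' f)$ for the price of the whole factor part of $x^i$. The first inequality is Cauchy-Schwarz applied to the covariance.

```latex
\begin{aligned}
\left| p(x^i) - p(\beta_i' f) \right|
  &= \left| E(m \varepsilon^i) \right|
   = \left| \mathrm{cov}(m, \varepsilon^i) \right|
   && \text{since } E(\varepsilon^i) = 0 \\
  &\le \sigma(m)\, \sigma(\varepsilon^i)
   \le \sqrt{A}\; \sigma(\varepsilon^i)
   = \sqrt{A}\; \sigma(x^i) \sqrt{1 - R^2}
\end{aligned}
```

The bound holds uniformly over every admissible $m$, which is why the order of “for all” and “there exists” no longer matters.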

9.5 APT vs. ICAPM

A factor structure in the covariance of returns or high $R^2$ in regressions of returns on

factors can imply factor pricing (APT) but factors can price returns without describing their

covariance matrix (ICAPM).

Differing inspiration for factors.

The disappearance of absolute pricing.

The APT and ICAPM stories are often confused. Factor structure can imply factor pric-

ing (APT), but factor pricing does not require a factor structure. In the ICAPM there is no

presumption that factors $f$ in a pricing model $m = b'f$ describe the covariance matrix of
returns. The factors don’t have to be orthogonal or i.i.d. either. High $R^2$ in time-series
regressions of the returns on the factors may imply factor pricing (APT), but again is not
necessary (ICAPM). The regressions of returns on factors can have low $R^2$ in the ICAPM.
Factors such as industry may describe large parts of returns’ variances but not contribute to

the explanation of average returns.

The biggest difference between APT and ICAPM for empirical work is in the inspiration

for factors. The APT suggests that one start with a statistical analysis of the covariance matrix

of returns and find portfolios that characterize common movement. The ICAPM suggests that

one start by thinking about state variables that describe the conditional distribution of future

asset returns and non-asset income. More generally, the idea of proxying for marginal utility

growth suggests macroeconomic indicators, and indicators of shocks to non-asset income in

particular.

The difference between the derivations of factor pricing models, and in particular an ap-

proximate law-of-one-price basis vs. a proxy for marginal utility basis seems not to have

had much impact on practice. In practice, we just test models $m = b'f$ and rarely worry

about derivations. The best evidence for this view is the introductions of famous papers.

Chen, Roll and Ross (1986) describe one of the earliest popular multifactor models, using

industrial production and inflation as some of the main factors. They do not even present

a factor decomposition of test asset returns, or the time-series regressions. A reader might

well categorize the paper as a macroeconomic factor model or perhaps an ICAPM. Fama and

French (1993) describe the currently most popular multifactor model, and their introduction

describes it as an ICAPM in which the factors are state variables. But the factors are sorted

on size and book/market just like the test assets, the time-series $R^2$ are all above 90%, and

much of the explanation involves “common movement” in test assets captured by the factors.

A reader might well categorize the model as much closer to an APT.

In the first chapter, I made a distinction between relative pricing and absolute pricing. In

the former, we price one security given the prices of others, while in the latter, we price each

security by reference to fundamental sources of risk. The factor pricing stories are interesting

in that they start with a nice absolute pricing model, the consumption-based model, and

throw out enough information to end up with relative models. The CAPM prices $R^i$ given
the market, but throws out the consumption-based model’s description of where the market

return came from.

9.6 Problems

1. Suppose the investor only has a one-period horizon. He invests wealth W at date zero,

and only consumes with expected utility Eu(c) = Eu(W ) in period one. Derive the

quadratic utility CAPM in this case. (This is an even simpler derivation. The Lagrange

multiplier on initial wealth $W$ now becomes the denominator of $m$ in place of $u'(c_0)$.)

2. Express the log utility CAPM in continuous time to derive a discount factor linear in

wealth.

3. Figure 23 suggests that $m > 0$ is enough to establish a well-behaved approximate APT.
The text claims this is not true. Which is right?

4. Can you use any excess return for the market factor in the CAPM, or must it be the

market less the riskfree rate?


PART II

Estimating and evaluating asset

pricing models


Our first task in bringing an asset pricing model to data is to estimate the free parameters; the
$\beta$ and $\gamma$ in $m = \beta (c_{t+1}/c_t)^{-\gamma}$, or the $b$ in $m = b'f$. Then we want to evaluate the model. Is
it a good model or not? Is another model better?

Statistical analysis helps us to evaluate a model by providing a distribution theory for
numbers such as parameter estimates that we create from the data. A distribution theory
pursues the following idea: Suppose that we generate artificial data over and over again from
a statistical model. For example, we could specify that the market return is an i.i.d. normal
random variable, and a set of stock returns is generated by
$R^{ei}_t = \alpha_i + \beta_i R^{em}_t + \varepsilon^i_t$. After
picking values for the mean and variance of the market return and the $\alpha_i$, $\beta_i$, $\sigma^2(\varepsilon^i)$, we could
ask a computer to simulate many artificial data sets. We can repeat our statistical procedure
in each of these artificial data sets, and graph the distribution of any statistic which we have
estimated from the real data, i.e. the frequency that it takes on any particular value in our
artificial data sets.
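The thought experiment in this paragraph is easy to sketch in code. The example below simulates many artificial samples from $R^{ei}_t = \alpha_i + \beta_i R^{em}_t + \varepsilon^i_t$ for a single asset, runs the same OLS regression in each, and collects the estimated intercepts. All parameter values (monthly mean and volatility of the market, residual volatility, 240 observations, 2000 replications) and the function name are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_alpha_hats(alpha, beta, sigma_eps, mean_mkt, sigma_mkt, T, n_sims, seed=0):
    """Distribution of the OLS intercept across artificial data sets."""
    rng = np.random.default_rng(seed)
    alpha_hats = np.empty(n_sims)
    for s in range(n_sims):
        rem = rng.normal(mean_mkt, sigma_mkt, size=T)   # i.i.d. normal market excess return
        eps = rng.normal(0.0, sigma_eps, size=T)        # idiosyncratic residual
        rei = alpha + beta * rem + eps                  # artificial asset excess return
        X = np.column_stack([np.ones(T), rem])
        coef, *_ = np.linalg.lstsq(X, rei, rcond=None)  # OLS of rei on constant and market
        alpha_hats[s] = coef[0]
    return alpha_hats

alpha_hats = simulate_alpha_hats(
    alpha=0.0, beta=1.0, sigma_eps=0.02,
    mean_mkt=0.006, sigma_mkt=0.045, T=240, n_sims=2000,
)
print("mean of alpha-hat:", alpha_hats.mean())
print("std of alpha-hat: ", alpha_hats.std())
print("5% and 95% quantiles:", np.quantile(alpha_hats, [0.05, 0.95]))
```

The spread of `alpha_hats` is the kind of sampling distribution against which an intercept estimated in real data could be compared, under the assumed data-generating model.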

In particular, we are interested in a distribution theory for the estimated parameters, to give

us some sense of how much the data really has to say about their values; and for the pricing

errors, which helps us to judge whether pricing errors are just bad luck of one particular

historical accident or if they indicate a failure of the model. We also will want to generate

distributions for statistics that compare one model to another, or provide other interesting

evidence, to judge how much sample luck affects those calculations.

All of the statistical methods I discuss in this part achieve these ends. They give methods

for estimating free parameters; they provide a distribution theory for those parameters, and

they provide distributions for statistics that we can use to evaluate models, most often a

quadratic form of pricing errors in the form $\hat{\alpha}' V^{-1} \hat{\alpha}$.

I start by focusing on the GMM approach. The GMM approach is a natural fit for a

discount factor formulation of asset pricing theories, since we just use sample moments in

the place of population moments. As you will see, there is no singular “GMM estimate and

test.” GMM is a large canvas and a big set of paints and brushes; a flexible tool for doing
all kinds of sensible (and, unless you’re careful, not-so-sensible) things to the data. Then

I consider traditional regression tests (naturally paired with expected return-beta statements

of factor models) and their maximum likelihood formalization. I emphasize the fundamental

similarities between these three methods, as I emphasized the similarity between $p = E(mx)$,
expected return-beta models, and mean-variance frontiers. A concluding chapter highlights

some of the differences between the methods, as I contrasted p = E(mx) and beta or mean-

variance representations of the models.


Chapter 10. GMM in explicit discount

factor models

The basic idea in the GMM approach is very straightforward. The asset pricing model predicts

\[
E(p_t) = E\left[ m(\text{data}_{t+1}, \text{parameters}) \, x_{t+1} \right]. \tag{134}
\]

The most natural way to check this prediction is to examine sample averages, i.e. to calculate

\[
\frac{1}{T} \sum_{t=1}^{T} p_t \quad \text{and} \quad \frac{1}{T} \sum_{t=1}^{T} \left[ m(\text{data}_{t+1}, \text{parameters}) \, x_{t+1} \right]. \tag{135}
\]

GMM estimates the parameters by making the sample averages as close to each other as

possible. It seems natural, before evaluating a model, to pick parameters that give it its best

chance. GMM then works out a distribution theory for the estimates. This distribution theory

is a generalization of the simplest exercise in statistics: the distribution of the sample mean.

Then, it suggests that we evaluate the model by looking at how close the sample averages

of price and discounted payoff are to each other, or equivalently by looking at the pricing

errors. It gives a statistical test of the hypothesis that the underlying population pricing errors are in
fact zero.
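The simplest instance of this idea has one moment and one free parameter, in which case the estimate just sets the sample pricing error to zero. The sketch below does that for $m = \beta (c_{t+1}/c_t)^{-\gamma}$ and a single gross return with price 1, holding $\beta$ fixed and solving for $\gamma$ by bisection. It is a toy, exactly identified special case rather than a general GMM routine: the function names are hypothetical, and the data are artificial, constructed so that the sample moment is zero at $\gamma = 3$.

```python
import numpy as np

def sample_pricing_error(gamma, beta, cons_growth, gross_return):
    """g_T(gamma) = (1/T) sum_t [beta * growth_t^(-gamma) * R_t] - 1."""
    m = beta * cons_growth ** (-gamma)
    return float(np.mean(m * gross_return) - 1.0)

def estimate_gamma(beta, cons_growth, gross_return, lo=0.0, hi=100.0, iters=200):
    """Bisection for the gamma that sets the sample pricing error to zero."""
    f_lo = sample_pricing_error(lo, beta, cons_growth, gross_return)
    f_hi = sample_pricing_error(hi, beta, cons_growth, gross_return)
    if f_lo * f_hi > 0:
        raise ValueError("sample pricing error does not change sign on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = sample_pricing_error(mid, beta, cons_growth, gross_return)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Artificial data (assumed): growth always above 1, so the error is decreasing in gamma.
rng = np.random.default_rng(1)
T = 500
beta_true, gamma_true = 0.98, 3.0
cons_growth = rng.uniform(1.005, 1.04, size=T)
noise = rng.uniform(0.9, 1.1, size=T)
noise /= noise.mean()                                   # sample mean exactly one
gross_return = noise * cons_growth ** gamma_true / beta_true

gamma_hat = estimate_gamma(beta_true, cons_growth, gross_return)
print("gamma_hat:", gamma_hat)
print("pricing error at gamma_hat:", sample_pricing_error(gamma_hat, beta_true, cons_growth, gross_return))
```

With more moments than parameters the sample pricing errors cannot all be set to zero at once, which is where the weighting of moments and the distribution theory described above come in.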