$$\hat\lambda = \frac{1}{T}\sum_{t=1}^{T}\hat\lambda_t; \qquad \hat\alpha_i = \frac{1}{T}\sum_{t=1}^{T}\hat\alpha_{it}.$$

SECTION 12.3 FAMA-MACBETH PROCEDURE

Most importantly, they suggest that we use the standard deviations of the cross-sectional
regression estimates to generate the sampling errors for these estimates,

$$\sigma^2(\hat\lambda) = \frac{1}{T^2}\sum_{t=1}^{T}\left(\hat\lambda_t - \hat\lambda\right)^2; \qquad \sigma^2(\hat\alpha_i) = \frac{1}{T^2}\sum_{t=1}^{T}\left(\hat\alpha_{it} - \hat\alpha_i\right)^2.$$

It's $1/T^2$ because we're finding standard errors of sample means, $\sigma^2/T$.
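As a concrete illustration of the two steps and the $1/T^2$ formula, here is a minimal sketch in Python. The panel size, the betas, and the simulated data-generating process are all hypothetical, chosen only to exercise the procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 25, 600                        # hypothetical panel: 25 assets, 600 periods
beta = rng.uniform(0.5, 1.5, N)       # betas, treated as known for this sketch
lam_true = 0.6                        # true factor risk premium

# simulate returns satisfying the model: R_it = beta_i * lam_true + eps_it
R = beta[:, None] * lam_true + rng.normal(0.0, 2.0, (N, T))

# step 1: a cross-sectional regression at every date t
X = np.column_stack([np.ones(N), beta])     # intercept plus beta
lam_t = np.empty(T)
alpha_t = np.empty((N, T))
for t in range(T):
    coef = np.linalg.lstsq(X, R[:, t], rcond=None)[0]
    lam_t[t] = coef[1]
    alpha_t[:, t] = R[:, t] - X @ coef      # pricing errors at date t

# step 2: the estimates are the time-series averages
lam_hat = lam_t.mean()
alpha_hat = alpha_t.mean(axis=1)

# step 3: sampling error from the time-series variation (the 1/T^2 formula)
se_lam = np.sqrt(((lam_t - lam_hat) ** 2).sum() / T ** 2)
```

Dividing the sum of squared deviations by $T^2$ rather than $T$ is exactly the "variance of a sample mean" step: it is the sample variance of $\hat\lambda_t$ divided by $T$.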
This is an intuitively appealing procedure once you stop to think about it. Sampling error is, after all, about how a statistic would vary from one sample to the next if we repeated the observations. We can't do that with only one sample, but why not cut the sample in half, and deduce how a statistic would vary from one full sample to the next from how it varies from the first half of the sample to the next half? Proceeding, why not cut the sample in fourths, eighths and so on? The Fama-MacBeth procedure carries this idea to its logical conclusion, using the variation in the statistic $\hat\lambda_t$ over time to deduce its sampling variation.
We are used to deducing the sampling variance of the sample mean of a series $x_t$ by looking at the variation of $x_t$ through time in the sample, using $\sigma^2(\bar{x}) = \sigma^2(x)/T = \frac{1}{T^2}\sum_t (x_t - \bar{x})^2$. The Fama-MacBeth technique just applies this idea to the slope and pricing error estimates. The formula assumes that the time series is not autocorrelated, but one could easily extend the idea to estimates $\hat\lambda_t$ that are correlated over time by using a long-run variance matrix, i.e. estimate
$$\sigma^2(\hat\lambda) = \frac{1}{T}\sum_{j=-\infty}^{\infty}\mathrm{cov}_T\left(\hat\lambda_t, \hat\lambda_{t-j}\right).$$

One should of course use some sort of weighting matrix or a parametric description of the autocorrelations of $\hat\lambda$, as explained in section 11.7. Asset return data are usually not highly correlated, but accounting for such correlation could have a big effect on the application of the Fama-MacBeth technique to corporate finance data or other regressions in which the cross-sectional estimates are highly correlated over time.
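One simple way to implement such a long-run variance estimate is a Bartlett-weighted (Newey-West style) sum of autocovariances. The function below is a sketch; the particular weighting scheme and lag choice are illustrative assumptions, not the text's prescription:

```python
import numpy as np

def long_run_var(series, n_lags):
    """Bartlett-weighted long-run variance of the sample MEAN of `series`:
    (1/T) * sum over |j| <= n_lags of w_j * cov_T(x_t, x_{t-j})."""
    x = np.asarray(series, dtype=float)
    T = x.size
    xd = x - x.mean()
    total = xd @ xd / T                     # j = 0 autocovariance
    for j in range(1, n_lags + 1):
        w = 1.0 - j / (n_lags + 1)          # Bartlett weight, tapers to zero
        gamma_j = xd[j:] @ xd[:-j] / T      # lag-j sample autocovariance
        total += 2.0 * w * gamma_j
    return total / T                        # variance of the sample mean

# sanity check on white noise: with no lags this is just var(x)/T
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 5000)
v0 = long_run_var(x, 0)
v10 = long_run_var(x, 10)
```

Applied to the series of $\hat\lambda_t$ (or of each pricing-error estimate), `long_run_var` with `n_lags = 0` reproduces the $1/T^2$ formula above, while positive lag choices account for serial correlation.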
It is natural to use this sampling theory to test whether all the pricing errors are jointly zero, as we have before. Denote by $\alpha$ the vector of pricing errors across assets. We could estimate the covariance matrix of the sample pricing errors by
$$\hat\alpha = \frac{1}{T}\sum_{t=1}^{T}\hat\alpha_t$$

$$\mathrm{cov}(\hat\alpha) = \frac{1}{T^2}\sum_{t=1}^{T}\left(\hat\alpha_t - \hat\alpha\right)\left(\hat\alpha_t - \hat\alpha\right)'$$

(or a general version that accounts for correlation over time) and then use the test

$$\hat\alpha'\,\mathrm{cov}(\hat\alpha)^{-1}\hat\alpha \sim \chi^2_{N-1}.$$
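A sketch of this joint test in Python, on data simulated under the null so that the population alphas are zero. The pseudoinverse handles the fact that residual alphas from a cross-sectional regression have reduced rank (here rank $N-1$, since they are orthogonal to the single regressor); all sizes and parameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 10, 400
beta = rng.uniform(0.5, 1.5, N)
R = beta[:, None] * 0.5 + rng.normal(0.0, 1.0, (N, T))   # null holds: alphas zero

# cross-sectional regression (no intercept) at each date: R_t = beta*lam_t + alpha_t
lam_t = beta @ R / (beta @ beta)
alpha_t = R - np.outer(beta, lam_t)

alpha_bar = alpha_t.mean(axis=1)
dev = alpha_t - alpha_bar[:, None]
cov_alpha = dev @ dev.T / T ** 2             # the 1/T^2 covariance estimate

# alpha_t lies in an (N-1)-dimensional subspace (orthogonal to beta), so
# cov_alpha is singular; use the pseudoinverse and N-1 degrees of freedom
stat = alpha_bar @ np.linalg.pinv(cov_alpha) @ alpha_bar
dof = N - 1
```

Under the null, `stat` is (asymptotically) a draw from $\chi^2_{N-1}$.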

CHAPTER 12 REGRESSION-BASED TESTS OF LINEAR FACTOR MODELS

12.3.1 Fama MacBeth in depth

The GRS procedure and the formulas given above for a single cross-sectional regression are familiar from any course in regression. We will see them justified by maximum likelihood below. The Fama-MacBeth procedure seems unlike anything you've seen in any econometrics course, and it is obviously a useful and simple technique that can be widely used in panel data in economics and corporate finance as well as asset pricing. Is it truly different? Is there something different about asset pricing data that requires a fundamentally new technique not taught in standard regression courses? Or is it similar to standard techniques? To answer these questions it is worth looking in a little more detail at what it accomplishes and why.
It's easier to do this in a more standard setup, with left-hand variable $y$ and right-hand variable $x$. Consider a regression

$$y_{it} = \beta' x_{it} + \varepsilon_{it} \qquad i = 1, 2, \ldots, N;\; t = 1, 2, \ldots, T.$$

The data in this regression have a cross-sectional element as well as a time-series element. In corporate finance, for example, one might be interested in the relationship between investment and financial variables, and the data set has many firms ($N$) as well as time-series observations for each firm ($T$). In an expected return-beta asset pricing model, the $x_{it}$ stand for the $\beta_i$ and $\beta$ stands for $\lambda$.
An obvious thing to do in this context is simply to stack the $i$ and $t$ observations together and estimate $\beta$ by OLS. I will call this the pooled time-series cross-section estimate. However, the error terms are not likely to be uncorrelated with each other. In particular, the error terms are likely to be cross-sectionally correlated at a given time. If one stock's return is unusually high this month, another stock's return is also likely to be high; if one firm invests an unusually great amount this year, another firm is also likely to do so. When errors are correlated, OLS is still consistent, but the OLS distribution theory is wrong, and typically suggests standard errors that are much too small. In the extreme case that the $N$ errors are perfectly correlated at each time period, there is really only one observation for each time period, so one really has $T$ rather than $NT$ observations. Therefore, a real pooled time-series cross-section estimate must include corrected standard errors. People often ignore this fact and report OLS standard errors.
Another thing we could do is first take time-series averages and then run a pure cross-sectional regression of

$$E_T(y_{it}) = \beta' E_T(x_{it}) + u_i \qquad i = 1, 2, \ldots, N.$$

This procedure would lose any information due to variation of the $x_{it}$ over time, but at least it might be easier to figure out a variance-covariance matrix for $u_i$ and correct the standard errors for residual correlation. (You could also average cross-sectionally and then run a single time-series regression. We'll get to that option later.)
In either case, the standard error corrections are just applications of the standard formula for OLS regressions with correlated error terms.
Finally, we could run the Fama-MacBeth procedure: run a cross-sectional regression at each point in time; average the cross-sectional $\hat\beta_t$ estimates to get an estimate $\hat\beta$; and use the time-series standard deviation of $\hat\beta_t$ to estimate the standard error of $\hat\beta$.
It turns out that the Fama-MacBeth procedure is just another way of calculating the standard errors, corrected for cross-sectional correlation:
Proposition: If the $x_{it}$ variables do not vary over time, and if the errors are cross-sectionally correlated but not correlated over time, then the Fama-MacBeth estimate, the pure cross-sectional OLS estimate, and the pooled time-series cross-sectional OLS estimate are identical. Also, the Fama-MacBeth standard errors are identical to the cross-sectional regression or stacked OLS standard errors, corrected for residual correlation. None of these relations hold if the $x$ vary through time.
Since they are identical procedures, whether one calculates estimates and standard errors
in one way or the other is a matter of taste.
I emphasize one procedure that is incorrect: pooled time-series and cross-section OLS with no correction of the standard errors. The errors are so highly cross-sectionally correlated in most finance applications that the standard errors so computed are often off by a factor of 10.
The assumption that the errors are not correlated over time is probably not so bad for asset pricing applications, since returns are close to independent. However, when pooled time-series cross-section regressions are used in corporate finance applications, errors are likely to be as severely correlated over time as across firms, if not more so. The "other factors" ($\varepsilon$) that cause, say, company $i$ to invest more at time $t$ than predicted by a set of right-hand variables are surely correlated with the other factors that cause company $j$ to invest more. But such factors are especially likely to cause company $i$ to invest more tomorrow as well. In this case, any standard errors must also correct for serial correlation in the errors; the GMM-based formulas in section 11.4 can do this easily.
The Fama-MacBeth standard errors also do not correct for the fact that the $\hat\beta$ are generated regressors. If one is going to use them, it is a good idea to at least calculate the Shanken correction factors outlined above, and check that the corrections are not large.
Proof: We just have to write out the three approaches and compare them. Having assumed that the $x$ variables do not vary over time, the regression is

$$y_{it} = x_i'\beta + \varepsilon_{it}.$$

We can stack up the cross-sections $i = 1 \ldots N$ and write the regression as

$$y_t = x\beta + \varepsilon_t.$$

$x$ is now a matrix with the $x_i'$ as rows. The error assumptions mean $E(\varepsilon_t\varepsilon_t') = \Sigma$.


Pooled OLS: To run pooled OLS, we stack the time series and cross sections by writing

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}; \quad X = \begin{bmatrix} x \\ x \\ \vdots \\ x \end{bmatrix}; \quad \epsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}$$

and then

$$Y = X\beta + \epsilon$$

with

$$E(\epsilon\epsilon') = \Omega = \begin{bmatrix} \Sigma & & \\ & \ddots & \\ & & \Sigma \end{bmatrix}.$$

The estimate and its standard error are then

$$\hat\beta_{OLS} = (X'X)^{-1}X'Y$$
$$\mathrm{cov}(\hat\beta_{OLS}) = (X'X)^{-1}X'\Omega X(X'X)^{-1}.$$

Writing this out from the definitions of the stacked matrices, with $X'X = T\,x'x$,

$$\hat\beta_{OLS} = (x'x)^{-1}x'E_T(y_t)$$
$$\mathrm{cov}(\hat\beta_{OLS}) = \frac{1}{T}(x'x)^{-1}(x'\Sigma x)(x'x)^{-1}.$$

We can estimate this sampling variance with

$$\hat\Sigma = E_T\left(\hat\varepsilon_t\hat\varepsilon_t'\right); \qquad \hat\varepsilon_t \equiv y_t - x\hat\beta_{OLS}.$$

Pure cross-section: The pure cross-sectional estimator runs one cross-sectional regression of the time-series averages. So, take those averages,

$$E_T(y_t) = x\beta + E_T(\varepsilon_t)$$

where $x = E_T(x)$ since $x$ is constant. Having assumed i.i.d. errors over time, the error covariance matrix is

$$E\left(E_T(\varepsilon_t)\,E_T(\varepsilon_t')\right) = \frac{1}{T}\Sigma.$$

The cross-sectional estimate and corrected standard errors are then

$$\hat\beta_{XS} = (x'x)^{-1}x'E_T(y_t)$$
$$\sigma^2(\hat\beta_{XS}) = \frac{1}{T}(x'x)^{-1}x'\Sigma x(x'x)^{-1}.$$
T

Thus, the cross-sectional and pooled OLS estimates and standard errors are exactly the same,
in each sample.
Fama-MacBeth: The Fama-MacBeth estimator is formed by first running the cross-sectional regression at each moment in time,

$$\hat\beta_t = (x'x)^{-1}x'y_t.$$

Then the estimate is the average of the cross-sectional regression estimates,

$$\hat\beta_{FM} = E_T\left(\hat\beta_t\right) = (x'x)^{-1}x'E_T(y_t).$$

Thus, the Fama-MacBeth estimator is also the same as the OLS estimator, in each sample. The Fama-MacBeth standard error is based on the time-series standard deviation of the $\hat\beta_t$. Using $\mathrm{cov}_T$ to denote sample covariance,

$$\mathrm{cov}\left(\hat\beta_{FM}\right) = \frac{1}{T}\mathrm{cov}_T\left(\hat\beta_t\right) = \frac{1}{T}(x'x)^{-1}x'\,\mathrm{cov}_T(y_t)\,x(x'x)^{-1}.$$

With

$$y_t = x\hat\beta_{FM} + \hat\varepsilon_t$$

we have

$$\mathrm{cov}_T(y_t) = E_T\left(\hat\varepsilon_t\hat\varepsilon_t'\right) = \hat\Sigma,$$

and finally

$$\mathrm{cov}\left(\hat\beta_{FM}\right) = \frac{1}{T}(x'x)^{-1}x'\hat\Sigma x(x'x)^{-1}.$$
Thus, the FM estimator of the standard error is also numerically equivalent to the OLS corrected standard error.
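The proposition is easy to verify numerically. The sketch below simulates a panel with time-constant regressors and cross-sectionally correlated, time-i.i.d. errors (all sizes and parameter values are hypothetical), and checks that the three estimates coincide and that the Fama-MacBeth covariance equals the correlation-corrected OLS sandwich:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 8, 50, 2
x = rng.normal(size=(N, K))                      # regressors constant over time
beta_true = np.array([1.0, -0.5])
mix = np.eye(N) + 0.3 * rng.normal(size=(N, N))  # induces cross-sectional correlation
eps = mix @ rng.normal(size=(N, T))              # cov(eps_t) = mix @ mix'
y = (x @ beta_true)[:, None] + eps               # y[i, t]

# 1. pooled time-series/cross-section OLS: stack all N*T observations
X_pool = np.tile(x, (T, 1))
y_pool = y.T.reshape(-1)                         # period-by-period, matching the tile
b_pooled = np.linalg.lstsq(X_pool, y_pool, rcond=None)[0]

# 2. pure cross-section: regress the time averages on x
b_xs = np.linalg.lstsq(x, y.mean(axis=1), rcond=None)[0]

# 3. Fama-MacBeth: one cross-sectional regression per date, then average
b_t = np.linalg.lstsq(x, y, rcond=None)[0]       # K x T, one column per date
b_fm = b_t.mean(axis=1)

# Fama-MacBeth covariance vs. the correlation-corrected OLS sandwich
cov_fm = np.cov(b_t, bias=True) / T              # (1/T) cov_T(b_t)
resid = y - (x @ b_fm)[:, None]
Sigma_hat = resid @ resid.T / T                  # E_T(eps_t eps_t')
xx_inv = np.linalg.inv(x.T @ x)
cov_ols = xx_inv @ x.T @ Sigma_hat @ x @ xx_inv / T
```

The equalities here hold sample by sample, not just on average, which is the content of the proposition.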
Varying $x$: If the $x_{it}$ vary through time, none of the three procedures are equal anymore, since the cross-sectional regressions ignore time-series variation in the $x_{it}$. As an extreme example, suppose a scalar $x_{it}$ varies over time but not cross-sectionally,

$$y_{it} = \alpha + x_t\beta + \varepsilon_{it}; \qquad i = 1, 2, \ldots, N;\; t = 1, 2, \ldots, T.$$

The grand OLS regression is

$$\hat\beta_{OLS} = \frac{\sum_{it}\tilde{x}_t y_{it}}{\sum_{it}\tilde{x}_t^2} = \frac{\sum_t \tilde{x}_t \frac{1}{N}\sum_i y_{it}}{\sum_t \tilde{x}_t^2}$$

where $\tilde{x} = x - E_T(x)$ denotes the demeaned variables. The estimate is driven by the covariance over time of $x_t$ with the cross-sectional average of the $y_{it}$, which is sensible because all of the information in the sample lies in time variation. It is identical to a regression over time of cross-sectional averages. However, you can't even run a cross-sectional estimate, since the right-hand variable is constant across $i$. As a practical example, you might be interested in a CAPM specification in which the betas vary over time ($\beta_t$) but not across test assets. This sample still contains information about the CAPM: the time-variation in betas should be matched by time variation in expected returns. But any method based on cross-sectional regressions will completely miss it.
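A quick numerical check of the equivalence between the grand OLS regression and the time-series regression of cross-sectional averages (sizes and parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 5, 300
x_t = rng.normal(size=T)                              # regressor varies over time only
y = 0.3 + 1.2 * x_t + rng.normal(0.0, 1.0, (N, T))    # broadcasting gives y[i, t]

x_dm = x_t - x_t.mean()
# grand pooled OLS slope: sum_{i,t} x~_t y_it / sum_{i,t} x~_t^2
b_grand = (x_dm * y).sum() / (N * (x_dm ** 2).sum())
# time-series regression of the cross-sectional average on x_t
y_bar = y.mean(axis=0)
b_ts = (x_dm * y_bar).sum() / (x_dm ** 2).sum()
```

The two slopes are algebraically identical, since the denominator of the grand regression is just $N$ times the time-series sum of squares.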
In historical context, the Fama-MacBeth procedure was also important because it allowed changing betas, which a single cross-sectional regression or a time-series regression test cannot easily handle.

12.4 Problems

1. When we express the CAPM in excess return form, can the test assets be differences between risky assets, $R^i - R^j$? Can the market excess return also use a risky asset, or must it be relative to a risk free rate? (Hint: start with $E(R^i) - R^f = \beta_{i,m}\left(E(R^m) - R^f\right)$ and see if you can get to the other forms. Betas must be regression coefficients.)
2. Can you run the GRS test on a model that uses industrial production growth as a factor, $E(R^i) - R^f = \beta_{i,\Delta ip}\lambda_{ip}$?
3. Fama and French (1997b) report that pricing errors are correlated with betas in a test of a
factor pricing model on industry portfolios. How is this possible?
4. We saw that a GLS cross-sectional regression of the CAPM passes through the market
and riskfree rate by construction. Show that if the market return is an equally weighted
portfolio of the test assets, then an OLS cross-sectional regression with an estimated
intercept passes through the market return by construction. Does it also pass through the
riskfree rate or origin?

Chapter 13. GMM for linear factor
models in discount factor form

13.1 GMM on the pricing errors gives a cross-sectional regression

The first-stage estimate is an OLS cross-sectional regression, and the second stage is a GLS regression,

$$\text{First stage:}\quad \hat{b}_1 = (d'd)^{-1}d'E_T(p)$$
$$\text{Second stage:}\quad \hat{b}_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(p).$$

Standard errors are the corresponding regression formulas, and the variance of the pricing errors is the standard regression formula for the variance of a residual.

Treating the constant ($a \times 1$) as a constant factor, the model is

$$m = b'f$$
$$E(p) = E(mx),$$

or simply

$$E(p) = E(xf')b. \tag{13.189}$$

Keep in mind that $p$ and $x$ are $N \times 1$ vectors of asset prices and payoffs respectively; $f$ is a $K \times 1$ vector of factors, and $b$ is a $K \times 1$ vector of parameters. I suppress the time indices $m_{t+1}$, $f_{t+1}$, $x_{t+1}$, $p_t$. The payoffs are typically returns or excess returns, including returns scaled by instruments. The prices are typically one (returns), zero (excess returns), or instruments.
To implement GMM, we need to choose a set of moments. The obvious set of moments to use are the pricing errors,

$$g_T(b) = E_T(xf'b - p).$$

This choice is natural but not necessary. You don't have to use $p = E(mx)$ with GMM, and you don't have to use GMM with $p = E(mx)$. You can (we will) use GMM on expected return-beta models, and you can use maximum likelihood on $p = E(mx)$. It is a choice, and the results will depend on this choice of moments as well as on the specification of the model.


The GMM estimate is formed from

$$\min_b\; g_T(b)'\,W\,g_T(b)$$

with first-order condition

$$d'W g_T(b) = d'W E_T(xf'b - p) = 0,$$

where

$$d' = \frac{\partial g_T(b)'}{\partial b} = E_T(fx').$$
This is the second moment matrix of payoffs and factors. The first stage has $W = I$; the second stage has $W = S^{-1}$. Since this is a linear model, we can solve analytically for the GMM estimate, and it is

$$\text{First stage:}\quad \hat{b}_1 = (d'd)^{-1}d'E_T(p)$$
$$\text{Second stage:}\quad \hat{b}_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(p).$$

The first-stage estimate is an OLS cross-sectional regression of average prices on the second moments of payoffs with factors, and the second-stage estimate is a GLS cross-sectional regression. What could be more sensible? The model (13.189) says that average prices should be a linear function of the second moments of payoffs with factors, so the estimate runs a linear regression. These are cross-sectional regressions since they operate across assets on sample averages. The "data points" in the regression are sample average prices ($y$) and second moments of payoffs with factors ($x$) across test assets. We are picking the parameter $b$ to make the model fit the cross-section of average asset prices as well as possible.
We find the distribution theory from the standard GMM standard error formulas (11.144) and (11.150). In the first stage, $a = d'$.

$$\text{First stage:}\quad \mathrm{cov}(\hat{b}_1) = \frac{1}{T}(d'd)^{-1}d'Sd(d'd)^{-1} \tag{13.190}$$
$$\text{Second stage:}\quad \mathrm{cov}(\hat{b}_2) = \frac{1}{T}(d'S^{-1}d)^{-1}.$$
Unsurprisingly, these are exactly the formulas for OLS and GLS regression errors with error covariance $S$. The pricing errors are correlated across assets, since the payoffs are correlated. Therefore the OLS cross-sectional regression standard errors need to be corrected for correlation, as they are in (13.190), and one can pursue an efficient estimate as in GLS. The analogy to GLS is close, since $S$ is the covariance matrix of $E(p) - E(xf')b$; $S$ is the covariance matrix of the "errors" in the cross-sectional regression.
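The two stages and the sandwich formulas can be sketched compactly in code. The simulation below builds payoffs whose prices satisfy $E(p) = E(xf')b$ by construction; the factor structure, parameter values, and sample size are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 5000, 6
b_true = np.array([1.0, -0.5])                   # coefficients on [constant, f2]
f2 = rng.normal(0.0, 1.0, T)
f = np.column_stack([np.ones(T), f2])            # constant treated as a factor

# payoffs x with prices p = 1 satisfying E(x f') b = 1 in population:
# x_it = a_i + c_i f2_t + e_it, with a_i chosen so that a_i b1 + c_i b2 = 1
c = np.linspace(0.2, 1.2, N)
a = (1.0 - c * b_true[1]) / b_true[0]
x = a + np.outer(f2, c) + rng.normal(0.0, 0.5, (T, N))
p = np.ones((T, N))

d = x.T @ f / T                                  # d = E_T(x f'), N x K
Ep = p.mean(axis=0)
b1 = np.linalg.solve(d.T @ d, d.T @ Ep)          # first stage, W = I

u = x * (f @ b1)[:, None] - p                    # moment terms x_t f_t' b - p_t
S = u.T @ u / T                                  # spectral density, no lags
Sinv_d = np.linalg.solve(S, d)
b2 = np.linalg.solve(d.T @ Sinv_d, Sinv_d.T @ Ep)  # second stage, W = S^{-1}

dd_inv = np.linalg.inv(d.T @ d)
cov_b1 = dd_inv @ d.T @ S @ d @ dd_inv / T       # first-stage sandwich, as in (13.190)
cov_b2 = np.linalg.inv(d.T @ Sinv_d) / T         # second-stage covariance
```

Both stages are literally cross-sectional regressions of average prices `Ep` on the columns of `d`, the first with an identity weighting matrix and the second with $S^{-1}$.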


The covariance matrix of the pricing errors is, from (11.147), (11.151) and (11.152),

$$\text{First stage:}\quad T\,\mathrm{cov}\left[g_T(\hat{b})\right] = \left(I - d(d'd)^{-1}d'\right)S\left(I - d(d'd)^{-1}d'\right)' \tag{13.191}$$
$$\text{Second stage:}\quad T\,\mathrm{cov}\left[g_T(\hat{b})\right] = S - d(d'S^{-1}d)^{-1}d'.$$

These are obvious analogues to the standard regression formulas for the covariance matrix of
regression residuals.
The model test is

$$g_T(b)'\,\mathrm{cov}(g_T)^{-1}\,g_T(b) \sim \chi^2(\#\text{moments} - \#\text{parameters}),$$

which specializes for the second-stage estimate as

$$T\,g_T(\hat{b})'\,S^{-1}\,g_T(\hat{b}) \sim \chi^2(\#\text{moments} - \#\text{parameters}).$$

There is not much point in writing these out, other than to point out that the test is a quadratic form in the vector of pricing errors. It turns out that the $\chi^2$ test has the same value for the first and second stage for this model, even though the parameter estimates, pricing errors and covariance matrices are not the same.

13.2 The case of excess returns

When $m_{t+1} = a - b'f_{t+1}$ and the test assets are excess returns, the GMM estimate is a GLS cross-sectional regression of average returns on the second moments of returns with factors,

$$\text{First stage:}\quad \hat{b}_1 = (d'd)^{-1}d'E_T(R^e)$$
$$\text{Second stage:}\quad \hat{b}_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(R^e),$$

where $d$ is the covariance matrix between returns and factors. The other formulas are the same.

The analysis of the last section requires that at least one asset has a nonzero price. If all assets are excess returns then $\hat{b}_1 = (d'd)^{-1}d'E_T(p) = 0$. Linear factor models are most often applied to excess returns, so this case is important. The trouble is that in this case the mean discount factor is not identified. If $E(mR^e) = 0$ then $E((2 \times m)R^e) = 0$. Analogously, in expected return-beta models, if all test assets are excess returns, then we have no information on the level of the zero-beta rate.
Writing out the model as $m = a - b'f$, we cannot separately identify $a$ and $b$, so we have to choose some normalization. The choice is entirely one of convenience; lack of identification means precisely that the pricing errors do not depend on the choice of normalization.
The easiest choice is $a = 1$. Then

$$g_T(b) = E_T(mR^e) = E_T(R^e) - E_T(R^e f')b.$$

We have

$$d \equiv E_T(R^e f'),$$

the second moment matrix of returns and factors, so that $g_T(b) = E_T(R^e) - d\,b$. The first-order condition to $\min_b g_T(b)'Wg_T(b)$ is

$$d'W\left[E_T(R^e) - d\,b\right] = 0.$$

Then, the GMM estimates of $b$ are

$$\text{First stage:}\quad \hat{b}_1 = (d'd)^{-1}d'E_T(R^e)$$
$$\text{Second stage:}\quad \hat{b}_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(R^e).$$

The GMM estimate is a cross-sectional regression of mean excess returns on the second
moments of returns with factors. From here on in, the distribution theory is unchanged from
the last section.
Mean returns on covariances
We can obtain a cross-sectional regression of mean excess returns on covariances, which are just a heartbeat away from betas, by choosing the normalization $a = 1 + b'E(f)$ rather than $a = 1$. Then the model is $m = 1 - b'(f - E(f))$ with mean $E(m) = 1$. The pricing errors are

$$g_T(b) = E_T(mR^e) = E_T(R^e) - E_T(R^e\tilde{f}')b$$

where I denote $\tilde{f} \equiv f - E(f)$. We have

$$d \equiv E_T(R^e\tilde{f}'),$$

which is now the covariance matrix of returns and factors. The first-order condition to $\min_b g_T(b)'Wg_T(b)$ is again

$$d'W\left[E_T(R^e) - d\,b\right] = 0.$$

Then, the GMM estimates of $b$ are

$$\text{First stage:}\quad \hat{b}_1 = (d'd)^{-1}d'E_T(R^e)$$
$$\text{Second stage:}\quad \hat{b}_2 = (d'S^{-1}d)^{-1}d'S^{-1}E_T(R^e).$$


The GMM estimate is a cross-sectional regression of expected excess returns on the covariance between returns and factors. Naturally, the model says that expected excess returns should be proportional to the covariance between returns and factors, and the procedure estimates that relation by a linear regression. The standard errors and variance of the pricing errors are the same as in (13.190) and (13.191), with $d$ now representing the covariance matrix. The formulas are almost exactly identical to those of the cross-sectional regressions in section 12.2. The $p = E(mx)$ formulation of the model for excess returns is equivalent to $E(R^e) = \mathrm{Cov}(R^e, f')b$; thus covariances enter in place of betas $\beta$.
There is one fly in the ointment: the mean of the factor $E(f)$ is estimated, and the distribution theory should recognize the sampling variation induced by this fact, as we did for the fact that betas are generated regressors in the cross-sectional regressions of section 12.2. The distribution theory is straightforward, and a problem at the end of the chapter guides you through it. However, I think it is better to avoid the complication and just use the second moment approach, or some other non-sample-dependent normalization for $a$. The pricing errors are identical; the whole point is that the normalization of $a$ does not matter to the pricing errors. Therefore, the $\chi^2$ statistics are also identical. As you change the normalization for $a$, you change the estimate of $b$. Therefore, the only effect is to add a term in the sampling variance of the estimated parameter $b$.

13.3 Horse Races

How to test whether one set of factors drives out another: test $b_2 = 0$ in $m = b_1'f_1 + b_2'f_2$ using the standard error of $\hat{b}_2$, or the $\chi^2$ difference test.

It's often interesting to test whether one set of factors drives out another. For example, Chen, Roll and Ross (1986) test whether their five macroeconomic factors price assets so well that one can ignore even the market return. Given the large number of factors that have been proposed, a statistical procedure for testing which factors survive in the presence of the others is desirable.
In this framework, such a test is very easy. Start by estimating a general model

$$m = b_1'f_1 + b_2'f_2. \tag{13.192}$$

We want to know: given factors $f_1$, do we need the $f_2$ to price assets, i.e. is $b_2 = 0$? There are two ways to do this.
First and most obviously, we have an asymptotic covariance matrix for $[b_1\;\; b_2]$, so we can form a $t$ test (if $b_2$ is scalar) or $\chi^2$ test for $b_2 = 0$ by forming the statistic

$$\hat{b}_2'\,\mathrm{var}(\hat{b}_2)^{-1}\,\hat{b}_2 \sim \chi^2_{\#b_2}$$


where $\#b_2$ is the number of elements in the $b_2$ vector. This is a Wald test.
Second, we can estimate a restricted system $m = b_1'f_1$. Since there are fewer free parameters and the same number of moments as in (13.192), we expect the criterion $J_T$ to rise. If we use the same weighting matrix (usually the one estimated from the unrestricted model (13.192)), then the $J_T$ cannot in fact decline. But if $b_2$ really is zero, it shouldn't rise "much." How much? The $\chi^2$ difference test answers that question:

$$T J_T(\text{restricted}) - T J_T(\text{unrestricted}) \sim \chi^2(\#\text{of restrictions}).$$

This is very much like a likelihood ratio test.
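In code, the test is one line once the two criteria are computed with the same weighting matrix. A sketch, fed here with hypothetical moment vectors rather than estimates from a real model:

```python
import numpy as np

def chi2_difference_stat(g_restricted, g_unrestricted, S, T):
    """T*J_T(restricted) - T*J_T(unrestricted), with both criteria evaluated
    using the SAME weighting matrix W = S^{-1}; compare the result to a
    chi^2 distribution with (# of restrictions) degrees of freedom."""
    W = np.linalg.inv(S)
    J_r = g_restricted @ W @ g_restricted
    J_u = g_unrestricted @ W @ g_unrestricted
    return T * (J_r - J_u)

# toy numbers: restricting one parameter raises the pricing errors a little
g_r = np.array([0.20, 0.10])
g_u = np.array([0.10, 0.05])
stat = chi2_difference_stat(g_r, g_u, np.eye(2), T=100)   # compare to chi^2(1)
```

Using the unrestricted model's weighting matrix for both criteria guarantees the statistic is nonnegative, as the text notes.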

13.4 Testing for characteristics

How to check whether an asset pricing model drives out a characteristic such as size, book/market, or volatility: run cross-sectional regressions of pricing errors on characteristics; use the formulas for the covariance matrix of the pricing errors to create standard errors.

It's often interesting to characterize a model by checking whether the model drives out a characteristic. For example, portfolios organized by size or market capitalization show a wide dispersion in average returns (at least up to 1979). Small stocks gave higher average returns than large stocks. The size of the portfolio is a characteristic. A good asset pricing model should account for average returns by betas. It's OK if a characteristic is associated with average returns, but in the end betas should drive out the characteristic; the alphas or pricing errors should not be associated with the characteristic. The original tests of the CAPM similarly checked whether the variance of the individual portfolio had anything to do with average returns once betas were included.
Denote the characteristic of portfolio $i$ by $y_i$. An obvious idea is to include both betas and the characteristic in a multiple cross-sectional regression,

$$E(R^{ei}) = (\alpha_0) + \beta_i'\lambda + \gamma y_i + \varepsilon_i; \qquad i = 1, 2, \ldots, N.$$

Alternatively, subtract $\beta\lambda$ from both sides and consider a cross-sectional regression of alphas on the characteristic,

$$\alpha_i = (\alpha_0) + \gamma y_i + \varepsilon_i; \qquad i = 1, 2, \ldots, N.$$
(The difference is whether you allow the presence of the size characteristic to affect the $\lambda$ estimate or not.)
We can always run such a regression, but we don't want to use the OLS formulas for the sampling error of the estimates, since the errors $\varepsilon_i$ are correlated across assets. Under the null that $\gamma = 0$, $\varepsilon = \alpha$, so we can simply use the covariance matrix of the alphas to generate standard errors of the $\hat\gamma$. Let $X$ denote the vector of characteristics; then the estimate is

$$\hat\gamma = (X'X)^{-1}X'\hat\alpha$$

with standard error from

$$\sigma^2(\hat\gamma) = (X'X)^{-1}X'\,\mathrm{cov}(\hat\alpha)\,X(X'X)^{-1}.$$

At this point, simply use the formula for $\mathrm{cov}(\hat\alpha)$ or $\mathrm{cov}(g_T)$ as appropriate for the model that you tested.
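The estimate and its corrected standard error take only a few lines. A sketch; the function and variable names are mine, and the check below feeds in made-up alphas and an identity-scaled covariance just to exercise the formula:

```python
import numpy as np

def characteristic_test(alpha_hat, cov_alpha, y):
    """Regress pricing errors alpha_hat on a characteristic y, computing the
    standard error from cov(alpha_hat) rather than the usual OLS formula,
    since the alphas are correlated across assets."""
    X = np.column_stack([np.ones_like(y), y])       # intercept + characteristic
    XtX_inv = np.linalg.inv(X.T @ X)
    coefs = XtX_inv @ X.T @ alpha_hat
    cov_coefs = XtX_inv @ X.T @ cov_alpha @ X @ XtX_inv
    gamma, se_gamma = coefs[1], np.sqrt(cov_coefs[1, 1])
    return gamma, se_gamma
```

With zero alphas, the point estimate is exactly zero and the standard error depends only on `cov_alpha` and the spread of the characteristic.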
Sometimes, the characteristic is also estimated rather than being a fixed number such as the size rank of a size portfolio, and you'd like to include the sampling uncertainty of its estimation in the standard errors of $\hat\gamma$. Let $y_t^i$ denote the time series whose mean $E(y_t^i)$ determines the characteristic. Now, write the moment condition for the $i$th asset as

$$g_T^i = E_T\left(m_{t+1}(b)x_{t+1} - p_t - \gamma y_t^i\right).$$

The estimate of $\gamma$ tells you how the characteristic $E(y^i)$ is associated with the model pricing errors $E\left(m_{t+1}(b)x_{t+1} - p_t\right)$. The GMM estimate of $\gamma$ solves the first-order condition

$$E_T(y)'W\left[E_T(mx) - E_T(p) - \gamma E_T(y)\right] = 0,$$

giving

$$\hat\gamma = \left(E_T(y)'W E_T(y)\right)^{-1}E_T(y)'W g_T,$$

an OLS or GLS regression of the pricing errors on the estimated characteristics. The standard GMM formulas for the standard deviation of $\gamma$ or the $\chi^2$ difference test for $\gamma = 0$ tell you whether the $\gamma$ estimate is statistically significant, including the fact that $E(y)$ must be estimated.

13.5 Testing for priced factors: lambdas or b's?

$b_j$ asks whether factor $j$ helps to price assets given the other factors. $b_j$ gives the multiple regression coefficient of $m$ on $f_j$ given the other factors.
$\lambda_j$ asks whether factor $j$ is priced, or whether its factor-mimicking portfolio carries a positive risk premium. $\lambda_j$ gives the single regression coefficient of $m$ on $f_j$.
Therefore, when factors are correlated, one should test $b_j = 0$ to see whether to include factor $j$ given the other factors, rather than test $\lambda_j = 0$.
Expected return-beta models defined with single regression betas give rise to $\lambda$ with a multiple regression interpretation that one can use to test factor pricing.


In the context of expected return-beta models, it has been more traditional to evaluate the relative strengths of models by testing the factor risk premia $\lambda$ of additional factors, rather than by testing whether their $b$ is zero. (The $b$'s are not the same as the $\beta$'s. The $b$ are the regression coefficients of $m$ on $f$; the $\beta$ are the regression coefficients of $R^i$ on $f$.)
To keep the equations simple, I'll use mean-zero factors, excess returns, and normalize to $E(m) = 1$, since the mean of $m$ is not identified with excess returns.
The parameters $b$ and $\lambda$ are related by

$$\lambda = E(ff')b.$$

See section 6.3. Briefly,

$$0 = E(mR^e) = E\left[R^e(1 - f'b)\right]$$
$$E(R^e) = \mathrm{cov}(R^e, f')b = \mathrm{cov}(R^e, f')E(ff')^{-1}E(ff')b = \beta'\lambda.$$
Thus, when the factors are orthogonal, $E(ff')$ is diagonal, and each $\lambda_j = 0$ if and only if the corresponding $b_j = 0$. The distinction between $b$ and $\lambda$ only matters when the factors are correlated. Factors are often correlated, however.
$\lambda_j$ captures whether factor $f_j$ is priced. We can write $\lambda = E\left[f(f'b)\right] = -E(mf)$ to see that $\lambda$ is (the negative of) the price that the discount factor $m$ assigns to $f$. $b$ captures whether factor $f_j$ is marginally useful in pricing assets, given the presence of the other factors. If $b_j = 0$, we can price assets just as well without factor $f_j$ as with it.
$\lambda_j$ is proportional to the single regression coefficient of $m$ on $f_j$: $\lambda_j = -\mathrm{cov}(m, f_j)$. $\lambda_j = 0$ asks the corresponding single regression coefficient question: "is factor $j$ correlated with the true discount factor?"
$b_j$ is the multiple regression coefficient of $m$ on $f_j$ given all the other factors. This just follows from $m = b'f$. (Regressions don't have to have error terms!) A multiple regression coefficient $\beta_j$ in $y = x\beta + \varepsilon$ is the way to answer "does $x_j$ help to explain variation in $y$ given the presence of the other $x$'s?" When you want to ask the question "should I include factor $j$ given the other factors?" you want to ask the multiple regression question.
For example, suppose the CAPM is true, which is the single factor model

$$m = a - bR^{em},$$

where $R^{em}$ is the market excess return. Consider any other excess return $R^{ex}$, positively correlated with $R^{em}$ ($x$ for extra). If we try a factor model with the spurious factor $R^{ex}$, the result is

$$m = a - bR^{em} + 0 \times R^{ex}.$$

$b_x$ is obviously zero, indicating that adding this factor does not help to price assets.
However, since the correlation of $R^{ex}$ with $R^{em}$ is positive, the beta of $R^{ex}$ on $R^{em}$ is positive, $R^{ex}$ earns a positive expected excess return, and $\lambda_x = E(R^{ex}) > 0$. In the expected return-beta model

$$E(R^{ei}) = \beta_{im}\lambda_m + \beta_{ix}\lambda_x,$$

$\lambda_m = E(R^{em})$ is unchanged by the addition of the spurious factor. However, since the factors $R^{em}$, $R^{ex}$ are correlated, the multiple regression betas of $R^{ei}$ on the factors change when we add the extra factor $R^{ex}$. If $\beta_{ix}$ is positive, $\beta_{im}$ will decline from its single-regression value, so the new model explains the same expected return $E(R^{ei})$. The expected return-beta model will indicate a risk premium for $\beta_x$ exposure, and many assets will have $\beta_x$ exposure ($R^{ex}$ for example!) even though factor $R^{ex}$ is spurious. In particular, $R^{ex}$ will of course have multiple regression coefficients $\beta_{x,m} = 0$ and $\beta_{x,x} = 1$, and its expected return will be entirely explained by the new factor $x$.
So, as usual, the answer depends on the question. If you want to know whether factor $i$ is priced, look at $\lambda_i$ (or $E(mf_i)$). If you want to know whether factor $i$ helps to price other assets, look at $b_i$. This is not an issue about sampling error or testing. All the moments above are population values.
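The distinction is easy to see in a simulation. Below, the true discount factor loads only on $f_m$, and a second factor $f_x$ is correlated with $f_m$ but useless: its $\lambda_x$ is positive while its $b_x$ is zero. All numbers are hypothetical, and the large sample size simply makes the population moments visible:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200_000
f_m = rng.normal(0.0, 1.0, T)                      # the "real" factor, mean zero
f_x = 0.6 * f_m + 0.8 * rng.normal(0.0, 1.0, T)    # correlated but spurious factor
m = 1.0 - 0.5 * f_m                                # true one-factor discount factor

F = np.column_stack([f_m, f_x])
lam = -(m @ F) / T              # lambda_j = -E(m f_j): the single-regression question
Eff = F.T @ F / T               # E(f f')
b = np.linalg.solve(Eff, lam)   # b = E(f f')^{-1} lambda: the multiple-regression question
```

In population $\lambda = (0.5,\ 0.3)$ and $b = (0.5,\ 0)$: the spurious factor carries a premium (it is correlated with $m$) yet contributes nothing to pricing given $f_m$.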
Of course, testing $b = 0$ is particularly easy in the GMM, $p = E(mx)$ setup. But you can always test the same ideas in any expression of the model. In an expected return-beta model, estimate $b$ by $E(ff')^{-1}\lambda$ and test the elements of that vector rather than $\lambda$ itself.
You can write an asset pricing model as $E(R^e) = \beta'\lambda$ and use the $\lambda$ to test whether each factor can be dropped in the presence of the others, if you use single regression betas rather than multiple regression betas. In this case each $\lambda$ is proportional to the corresponding $b$. Problem 2 at the end of this chapter helps you to work out this case.

13.5.1 Mean-variance frontier and performance evaluation

A GMM, p = E(mx) approach to testing whether a return expands the mean-variance
frontier. Just test whether m = a + bR prices all returns. If there is no risk-free rate, use two
values of a.

We often summarize asset return data by mean-variance frontiers. For example, a large
literature has examined the desirability of international diversification in a mean-variance
context. Stock returns from many countries are not perfectly correlated, so it looks like one
can reduce portfolio variance a great deal for the same mean return by holding an internationally
diversified portfolio. But is this real or just sampling error? Even if the value-weighted
portfolio were ex-ante mean-variance efficient, an ex-post mean-variance frontier constructed
from historical returns on NYSE stocks would leave the value-weighted portfolio well inside
the ex-post frontier. So is "I should have bought Japanese stocks in 1960" (and sold them in
1990!) a signal that broad-based international diversification is a good idea now, or is it
simply 20/20 hindsight regret like "I should have bought Microsoft in 1982"? Similarly,
when evaluating fund managers, we want to know whether the manager is truly able to form
a portfolio that beats mean-variance efficient passive portfolios, or whether better performance
in sample is just due to luck.

[Figure: two frontiers in E(R)-σ(R) space, labeled "Frontiers intersect," crossing at the intercept 1/E(m).]
Figure 27. Mean-variance frontiers might intersect rather than coincide.
Since a factor model is true if and only if a linear combination of the factors (or factor-mimicking
portfolios if the factors are not returns) is mean-variance efficient, one can interpret
a test of any factor pricing model as a test of whether a given return is on the mean-variance
frontier. Section 12.1 showed how the Gibbons, Ross, and Shanken pricing error statistic can
be interpreted as a test of whether a given portfolio is on the mean-variance frontier, when
returns and factors are i.i.d., and the GMM distribution theory of that test statistic allows us to
extend the test to non-i.i.d. errors. A GMM, p = E(mx), m = a − bRp test analogously
tests whether Rp is on the mean-variance frontier of the test assets.
We may want to go one step further, and not just test whether a combination of a set of
assets R^d (say, domestic assets) is on the mean-variance frontier, but whether the R^d assets
span the mean-variance frontier of R^d and R^i (say, foreign or international) assets. The
trouble is that if there is no risk-free rate, the frontier generated by R^d might just intersect the
frontier generated by R^d and R^i together, rather than span or coincide with the latter frontier,
as shown in Figure 27. Testing that m = a − b′R^d prices both R^d and R^i only checks for
intersection.


DeSantis (1992) and Chen and Knez (1992, 1993) show how to test for spanning as opposed
to intersection. For intersection, m = a − b′_d R^d will price both R^d and R^i only for
one value of a, or equivalently one value of E(m) or choice of the intercept, as shown. If the
frontiers coincide or span, then m = a + b′_d R^d prices both R^d and R^i for any value of a.
Thus, we can test for coincident frontiers by testing whether m = a + b′_d R^d prices both
R^d and R^i for two prespecified values of a simultaneously.
To see how this works, start by noting that there must be at least two assets in R^d. If not,
there is no mean-variance frontier of R^d assets; it is simply a point. If there are two assets in
R^d, R^{d1} and R^{d2}, then the mean-variance frontier of domestic assets connects them; they are
each on the frontier. If they are both on the frontier, then there must be discount factors

m1 = a1 − b̃1 R^{d1}

and

m2 = a2 − b̃2 R^{d2},

and, of course, any linear combination,

m = [λa1 + (1 − λ)a2] − [λb̃1 R^{d1} + (1 − λ)b̃2 R^{d2}].

Equivalently, for any value of a, there is a discount factor of the form

m = a − (b1 R^{d1} + b2 R^{d2}).

Thus, you can test for spanning with a JT test on the moments

E[(a1 − b1′R^d) R^d] = 0
E[(a1 − b1′R^d) R^i] = 0
E[(a2 − b2′R^d) R^d] = 0
E[(a2 − b2′R^d) R^i] = 0

for any two fixed values of a1, a2.
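Here is one hedged sketch of this test on simulated data. The data-generating process, the prespecified values of a, and the function name jt_spanning are all invented for illustration; I also write the pricing errors as E_T[(a − b′R^d)x] − 1 for gross (price-one) returns, an assumption I make so that the two a-blocks are not redundant in the simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
Rd = 1.02 + 0.15 * rng.standard_normal((T, 2))   # two "domestic" gross returns (invented DGP)
Ri = Rd @ np.array([0.3, 0.7]) + 0.05 * rng.standard_normal(T)  # spanned: weights sum to one, plus noise
X = np.column_stack([Rd, Ri])                    # all test assets, T x 3

def jt_spanning(a1, a2):
    # Pricing errors for price-one returns: g = a * E_T[x] - E_T[x Rd'] b - 1,
    # stacked for the two prespecified values of a.  Linear in b.
    c = np.concatenate([a1 * X.mean(0) - 1, a2 * X.mean(0) - 1])
    D0 = X.T @ Rd / T                            # E_T[x Rd'], 3 x 2
    D = np.block([[D0, np.zeros((3, 2))],
                  [np.zeros((3, 2)), D0]])       # 6 moments, 4 parameters
    b = np.linalg.lstsq(D, c, rcond=None)[0]     # first stage, identity weighting
    m1 = a1 - Rd @ b[:2]
    m2 = a2 - Rd @ b[2:]
    u = np.hstack([X * m1[:, None] - 1, X * m2[:, None] - 1])  # per-period moments
    W = np.linalg.inv(np.cov(u.T))               # S^{-1}; i.i.d. data, so no lag terms
    b = np.linalg.solve(D.T @ W @ D, D.T @ W @ c)  # efficient second stage
    g = c - D @ b
    return T * g @ W @ g                         # J_T, ~ chi^2(6 - 4) under spanning

J = jt_spanning(0.90, 1.05)
```

Because the moments are linear in b, both GMM stages have closed forms and no numerical optimizer is needed.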

13.6 Problems

1. Work out the GMM distribution theory for the model m = 1 − b′(f − E(f)) when the
test assets are excess returns. The distribution should recognize the fact that E(f) is
estimated in sample. To do this, set up

gT = [ ET[R^e − R^e(f′ − Ef′)b] ]
     [ ET(f − Ef)               ]

aT = [ ET(f̃ R^e′)   0  ]
     [ 0             IK ]

where f̃ ≡ f − Ef. The estimated parameters are b and E(f). You should end up with a
formula for the standard error of b that resembles the Shanken correction (12.184), and an
unchanged JT test.
2. Show that if you use single-regression betas, then the corresponding λ can be used to test
for the marginal importance of factors. However, the λ are no longer the expected returns
of factor-mimicking portfolios.

Chapter 14. Maximum likelihood
Maximum likelihood is, like GMM, a general organizing principle that is a useful place to
start when thinking about how to choose parameters and evaluate a model. It comes with
an asymptotic distribution theory, which, like GMM's, is a good place to start when you are
unsure about how to treat various problems, such as the fact that betas must be estimated in a
cross-sectional regression.
As we will see, maximum likelihood is a special case of GMM. Given a statistical description
of the data, it prescribes which moments are statistically most informative. Given those
moments, ML and GMM are the same. Thus, ML can be used to defend why one picks a certain
set of moments, or for advice on which moments to pick if one is unsure. In this sense,
maximum likelihood (paired with carefully chosen statistical models) justifies the regression
tests above, as it justifies standard regressions. On the other hand, ML does not easily allow
you to use other, non-"efficient" moments, if you suspect that ML's choices are not robust to
misspecifications of the economic or statistical model. For example, ML will tell you how
to do GLS, but it will not tell you how to adjust OLS standard errors for non-standard error
terms.
Hamilton (1994), pp. 142-148, and the appendix in Campbell, Lo, and MacKinlay (1997) give
nice summaries of maximum likelihood theory. Campbell, Lo, and MacKinlay's Chapters 5
and 6 treat many more variations of regression-based tests and maximum likelihood.

14.1 Maximum likelihood

The maximum likelihood principle says to pick the parameters that make the observed
data most likely. Maximum likelihood estimates are asymptotically efficient. The information
matrix gives the asymptotic standard errors of ML estimates.

The maximum likelihood principle says to pick that set of parameters that makes the
observed data most likely. This is not "the set of parameters that are most likely given the
data": in classical (as opposed to Bayesian) statistics, parameters are numbers, not random
variables.
To implement this idea, you first have to figure out what the probability of seeing a data
set {xt} is, given the free parameters θ of a model. This probability distribution is called the
likelihood function f({xt}; θ). Then, the maximum likelihood principle says to pick

θ̂ = arg max_θ f({xt}; θ).

For reasons that will soon be obvious, it's much easier to work with the log of this probability
distribution,

L({xt}; θ) = ln f({xt}; θ).

Maximizing the log likelihood is the same thing as maximizing the likelihood.
Finding the likelihood function isn't always easy. In a time-series context, the best way
to do it is often to first find the conditional likelihood function f(xt | xt−1, xt−2, ..., x0; θ),
the chance of seeing xt given xt−1, xt−2, ..., and given values for the parameters θ. Since joint
probability is the product of conditional probabilities, the log likelihood function is just the
sum of the conditional log likelihood functions,

L({xt}; θ) = ∑_{t=1}^T ln f(xt | xt−1, xt−2, ..., x0; θ).    (193)

More concretely, we usually assume normal errors, so the likelihood function is

L = −(T/2) ln(2π|Σ|) − (1/2) ∑_{t=1}^T εt′ Σ^{−1} εt    (194)

where εt denotes a vector of shocks, εt = xt − E(xt | xt−1, xt−2, ..., x0; θ).
This expression gives a simple recipe for constructing a likelihood function. You usually
start with a model that generates xt from errors, e.g. xt = ρxt−1 + εt. Invert that model to
express the errors εt in terms of the data {xt} and plug in to (14.194).
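The recipe can be sketched for the AR(1) with known σ (all numbers invented): invert the model for the errors, plug them into (14.194), and maximize. The grid maximizer coincides with the OLS estimate of ρ, a point the chapter returns to below.

```python
import numpy as np

# Sketch of the recipe: simulate x_t = rho * x_{t-1} + eps_t, invert the model
# for the errors, and plug them into the normal log-likelihood (14.194).
# Scalar case, so |Sigma| reduces to sigma^2.
rng = np.random.default_rng(2)
rho_true, sigma = 0.6, 1.0
T = 10_000
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho_true * x[t - 1] + sigma * rng.standard_normal()

def log_lik(rho):
    eps = x[1:] - rho * x[:-1]          # invert the model: errors from data
    n = len(eps)
    return -n / 2 * np.log(2 * np.pi * sigma**2) - (eps @ eps) / (2 * sigma**2)

grid = np.linspace(0.4, 0.8, 401)       # crude grid search stands in for a real optimizer
rho_ml = grid[np.argmax([log_lik(r) for r in grid])]
rho_ols = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
```

This conditions on the first observation, the same choice discussed in the next paragraph of the text.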
There is a small issue about how to start off a model such as (14.193). Ideally, the first
observation should enter through the unconditional density, i.e.

L({xt}; θ) = ln f(x1; θ) + ln f(x2 | x1; θ) + ln f(x3 | x2, x1; θ) + ...

However, it is usually hard to evaluate the unconditional density or the first terms with only
a few lagged x's. Therefore, if as usual the conditional density can be expressed in terms
of a finite number k of lags of xt, one often maximizes the conditional likelihood function
(conditional on the first k observations), treating the first k observations as fixed rather than
as random variables,

L({xt}; θ) = ln f(xk+1 | xk, xk−1, ..., x1; θ) + ln f(xk+2 | xk+1, xk, ..., x2; θ) + ...

Alternatively, one can treat k pre-sample values {x0, x−1, ..., x−k+1} as additional parameters
over which to maximize the likelihood function.
Maximum likelihood estimators come with a useful asymptotic (i.e. approximate) distribution
theory. First, the distribution of the estimates is

θ̂ ∼ N(θ, [−∂²L/∂θ∂θ′]^{−1}).    (195)

If the likelihood L has a sharp peak at θ̂, then we know a lot about the parameters, while if
the peak is flat, other parameters are just as plausible. The maximum likelihood estimator
is asymptotically efficient, meaning that no other estimator can produce a smaller covariance
matrix.
The second derivative in (14.195) is known as the information matrix,

I = −(1/T) ∂²L/∂θ∂θ′ = −(1/T) ∑_{t=1}^T ∂² ln f(xt | xt−1, xt−2, ..., x0; θ)/∂θ∂θ′.    (196)

(More precisely, the information matrix is defined as the expected value of the second partial,
which is estimated with the sample value.) The information matrix can also be estimated as
a product of first derivatives. The expression

I = (1/T) ∑_{t=1}^T [∂ ln f(xt | xt−1, xt−2, ..., x0; θ)/∂θ] [∂ ln f(xt | xt−1, xt−2, ..., x0; θ)/∂θ]′

converges to the same value as (14.196). (Hamilton 1994, p. 429 gives a proof.)
If we estimate a model restricting the parameters, the maximum value of the likelihood
function will necessarily be lower. However, if the restriction is true, it shouldn't be that
much lower. This intuition is captured in the likelihood ratio test

2(L_unrestricted − L_restricted) ∼ χ²_{number of restrictions}.    (197)

The form and idea of this test is much like the χ² difference test for GMM objectives that we
met in section 11.1.
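A minimal sketch of (14.197), with invented data: test μ = 0 for i.i.d. N(μ, 1) observations with σ known. The MLEs are available in closed form, and the statistic reduces to T·x̄².

```python
import numpy as np

# Hedged illustration of the likelihood ratio test (14.197): restricted model
# imposes mu = 0; unrestricted model estimates mu by the sample mean.
# With sigma = 1 known, 2(L_u - L_r) = T * xbar^2, chi^2(1) under the null.
rng = np.random.default_rng(3)
T = 500
x = rng.standard_normal(T)            # null is true: mu = 0

def log_lik(mu):
    return -T / 2 * np.log(2 * np.pi) - ((x - mu) ** 2).sum() / 2

L_unrestricted = log_lik(x.mean())    # MLE of mu is the sample mean
L_restricted = log_lik(0.0)           # impose the restriction mu = 0
lr = 2 * (L_unrestricted - L_restricted)   # equals T * xbar^2 exactly here
```

Imposing a true restriction lowers the likelihood only by a χ²(1)-sized amount, which is the content of the test.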

14.2 ML is GMM on the scores

ML is a special case of GMM. ML uses the information in the auxiliary statistical model to
derive the statistically most informative moment conditions. To see this fact, start with the first
order conditions for maximizing a likelihood function,

∂L({xt}; θ)/∂θ = ∑_{t=1}^T ∂ ln f(xt | xt−1, xt−2, ...; θ)/∂θ = 0.    (198)

This is a GMM estimate. It is the sample counterpart to a population moment condition,

g(θ) = E[∂ ln f(xt | xt−1, xt−2, ...; θ)/∂θ] = 0.    (199)

The term ∂ ln f(xt | xt−1, xt−2, ...; θ)/∂θ is known as the "score." It is a random variable,
formed as a combination of current and past data (xt, xt−1, ...). Thus, maximum likelihood is
a special case of GMM, a special choice of which moments to examine.
For example, suppose that x follows an AR(1) with known variance,

xt = ρxt−1 + εt,

and suppose the error terms are i.i.d. normal random variables. Then,

ln f(xt | xt−1, xt−2, ...; ρ) = const. − εt²/(2σ²) = const. − (xt − ρxt−1)²/(2σ²)

and the score is

∂ ln f(xt | xt−1, xt−2, ...; ρ)/∂ρ = (xt − ρxt−1) xt−1/σ².

The first order condition for maximizing the likelihood is

(1/T) ∑_{t=1}^T (xt − ρxt−1) xt−1 = 0.

This expression is a moment condition, and you'll recognize it as the OLS estimator of ρ,
which we have already regarded as a case of GMM.
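A quick numerical check of this claim (simulated data, invented parameters): the sample score moment is zero, to machine precision, exactly at the OLS estimate of ρ.

```python
import numpy as np

# Check that the ML first-order condition (1/T) sum (x_t - rho x_{t-1}) x_{t-1} = 0
# holds exactly at the OLS estimate, so ML and OLS coincide for this model.
rng = np.random.default_rng(4)
T = 2000
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()

rho_ols = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
score_moment = np.mean((x[1:] - rho_ols * x[:-1]) * x[:-1])   # ~ 0 by construction
```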
The example shows another property of scores: scores should be unforecastable. In the
example,

E_{t−1}[(xt − ρxt−1) xt−1/σ²] = E_{t−1}[εt xt−1/σ²] = 0.    (200)

Intuitively, if we used a combination of the x variables E[h(xt, xt−1, ...)] = 0 that was
predictable, we could form another moment, an instrument, that described the predictability
of the h variable and use that moment to get more information about the parameters. To prove
this property more generally, start with the fact that f(xt | xt−1, xt−2, ...; θ) is a conditional
density and therefore must integrate to one,

1 = ∫ f(xt | xt−1, xt−2, ...; θ) dxt

0 = ∫ [∂f(xt | xt−1, xt−2, ...; θ)/∂θ] dxt

0 = ∫ [∂ ln f(xt | xt−1, xt−2, ...; θ)/∂θ] f(xt | xt−1, xt−2, ...; θ) dxt

0 = E_{t−1}[∂ ln f(xt | xt−1, xt−2, ...; θ)/∂θ].

Furthermore, as you might expect, the GMM distribution theory formulas give the same
result as the ML distribution theory, i.e., the inverse information matrix is the asymptotic
variance-covariance matrix. To show this fact, apply the GMM distribution theory (11.144)
to (14.198). The derivative matrix is

d = ∂gT(θ)/∂θ′ = (1/T) ∑_{t=1}^T ∂² ln f(xt | xt−1, xt−2, ...; θ)/∂θ∂θ′ = −I.

This is (minus) the second derivative expression of the information matrix. The S matrix is

S = E{[∂ ln f(xt | xt−1, xt−2, ...; θ)/∂θ] [∂ ln f(xt | xt−1, xt−2, ...; θ)/∂θ]′} = I.

The lead and lag terms in S are all zero, since we showed above that scores are unforecastable.
This is the outer product definition of the information matrix. There is no a
matrix, since the moments themselves are set to zero. The GMM asymptotic distribution of
θ̂ is therefore

√T (θ̂ − θ) → N(0, d^{−1} S d^{−1}′) = N(0, I^{−1}).

We recover the inverse information matrix, as specified by the ML asymptotic distribution
theory.
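The equivalence of the two information-matrix estimates can be checked numerically for the AR(1) example (simulated data, invented parameters; both estimates converge to E(x²)/σ² = 1/(1 − ρ²)):

```python
import numpy as np

# Compare the two estimates of the information matrix for the AR(1):
# minus the average second derivative of the log-likelihood, and the
# average outer product of the scores.  They converge to the same value.
rng = np.random.default_rng(5)
T = 200_000
rho, sigma = 0.5, 1.0
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + sigma * rng.standard_normal()

eps = x[1:] - rho * x[:-1]
scores = eps * x[:-1] / sigma**2                   # d ln f / d rho
I_outer = np.mean(scores**2)                       # outer-product estimate
I_second = np.mean(x[:-1] ** 2) / sigma**2         # minus the average second derivative
```

Both numbers settle near 1/(1 − 0.25) = 4/3 as T grows.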

14.3 When factors are returns, ML prescribes a time-series regression

I add to the economic model E(R^e) = βE(f) a statistical assumption that the regression
errors are independent over time and independent of the factors. ML then prescribes a time-series
regression with no constant. To prescribe a time-series regression with a constant, we
drop the model prediction α = 0. I show how the information matrix gives the same result
as the OLS standard errors.


Given a linear factor model whose factors are also returns, as with the CAPM, ML prescribes
a time-series regression test. To keep notation simple, I again treat a single factor f.
The economic model is

E(R^e) = βE(f).    (201)

R^e is an N × 1 vector of test assets, and β is an N × 1 vector of regression coefficients of
these assets on the factor (the market excess return R^{em} in the case of the CAPM).
To apply maximum likelihood, we need to add an explicit statistical model that fully
describes the joint distribution of the data. I assume that the market return and regression
errors are i.i.d. normal, i.e.

R^e_t = α + βft + εt    (14.202)
ft = E(f) + ut

[εt; ut] ∼ N( [0; 0], [Σ 0; 0 σ²_u] ).

(We can get by with non-normal factors, but it is easier not to present the general case.)
Equation (14.202) has no content other than normality. The zero correlation between ut and
εt identifies β as a regression coefficient. You can just write R^e, R^{em} as a general bivariate
normal, and you will get the same results.
The economic model (14.201) implies restrictions on this statistical model. Taking expectations
of (14.202), the CAPM implies that the intercepts α should all be zero. Again, this
is also the only restriction that the CAPM places on the statistical model (14.202).
The most principled way to apply maximum likelihood is to impose the null hypothesis
throughout. Thus, we write the likelihood function imposing α = 0. To construct the likelihood
function, we reduce the statistical model to independent error terms, and then add their
log probability densities to get the likelihood function,

L = (const.) − (1/2) ∑_{t=1}^T (R^e_t − βft)′ Σ^{−1} (R^e_t − βft) − (1/(2σ²_u)) ∑_{t=1}^T (ft − E(f))².

The estimates follow from the first order conditions,

∂L/∂β = Σ^{−1} ∑_{t=1}^T (R^e_t − βft) ft = 0  ⇒  β̂ = (∑_{t=1}^T ft²)^{−1} ∑_{t=1}^T R^e_t ft

∂L/∂E(f) = (1/σ²_u) ∑_{t=1}^T (ft − E(f)) = 0  ⇒  Ê(f) = λ̂ = (1/T) ∑_{t=1}^T ft.

(∂L/∂Σ and ∂L/∂σ²_u also produce ML estimates of the covariance matrices, which turn out
to be the standard averages of squared residuals.)
The ML estimate of β is the OLS regression without a constant. The null hypothesis says
to leave out the constant, and the ML estimator uses that fact to avoid estimating one. Since
the factor risk premium is equal to the mean market excess return, it's not too surprising that
the λ estimate is simply the average market return.
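A one-line check of this first order condition on simulated data (all parameter values invented): the ML estimate under α = 0 is exactly OLS without a constant.

```python
import numpy as np

# The ML estimate of beta under the null alpha = 0 is OLS through the origin,
# beta = (sum f^2)^{-1} sum Re f, asset by asset.
rng = np.random.default_rng(6)
T = 600
f = 0.08 + 0.16 * rng.standard_normal(T)           # factor (an excess return)
beta_true = np.array([0.8, 1.2])
Re = np.outer(f, beta_true) + 0.1 * rng.standard_normal((T, 2))

beta_ml = (Re.T @ f) / (f @ f)                     # first-order condition above
beta_lstsq = np.linalg.lstsq(f[:, None], Re, rcond=None)[0].ravel()  # no-constant OLS
```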
We know that the ML distribution theory must give the same result as the GMM distribution
theory, which we already derived in section 12.1, but it's worth seeing it explicitly. The
asymptotic standard errors follow from either estimate of the information matrix, for example

∂²L/∂β∂β′ = −Σ^{−1} ∑_{t=1}^T ft².

Thus,

cov(β̂) = (1/T) (1/E(f²)) Σ = (1/T) [1/(E(f)² + σ²(f))] Σ.    (203)

This is the standard OLS formula for a regression with no constant.
We also want pricing error measurements, standard errors, and tests. We can apply maximum
likelihood to estimate an unconstrained model, containing intercepts, and then use Wald
tests (estimate/standard error) to test the restriction that the intercepts are zero. We can also
use the unconstrained model to run the likelihood ratio test. The unconstrained likelihood
function is

L = (const.) − (1/2) ∑_{t=1}^T (R^e_t − α − βft)′ Σ^{−1} (R^e_t − α − βft) + ...

(I ignore the term in the factor, since it will again just tell us to use the sample mean to
estimate E(f).) The estimates are now

∂L/∂α = Σ^{−1} ∑_{t=1}^T (R^e_t − α − βft) = 0  ⇒  α̂ = ET(R^e_t) − β̂ ET(ft)

∂L/∂β = Σ^{−1} ∑_{t=1}^T (R^e_t − α − βft) ft = 0  ⇒  β̂ = covT(R^e_t, ft)/σ²T(ft).

Unsurprisingly, the maximum likelihood estimates of α and β are the OLS estimates, with a
constant.
The inverse of the information matrix gives the asymptotic distribution of these estimates.
Since they are just OLS estimates, we're going to get the OLS standard errors, but it's worth
seeing it come out of ML.

−[∂²L/∂(α, β)∂(α, β)′]^{−1} = (1/T) [ Σ^{−1}        Σ^{−1}E(f)  ]^{−1} = (1/(T σ²(f))) [ E(f²)   −E(f) ] ⊗ Σ
                                     [ Σ^{−1}E(f)   Σ^{−1}E(f²) ]                      [ −E(f)   1     ]

The covariance matrices of α̂ and β̂ are thus

cov(α̂) = (1/T) [1 + (E(f)/σ(f))²] Σ

cov(β̂) = (1/T) (1/σ²(f)) Σ.    (14.204)

These are just the usual OLS standard errors, which we derived in section 12.1 as a special
case of GMM standard errors for the OLS time-series regressions when errors are uncorrelated
over time and independent of the factors, or by specializing σ²(X′X)^{−1}.
You cannot just invert ∂²L/∂α∂α′ to find the covariance matrix of α̂. That attempt would
give just (1/T)Σ as the covariance matrix of α̂, which would be wrong. You have to invert the
entire information matrix to get the standard error of any parameter. Otherwise, you are
ignoring the effect that estimating β has on the distribution of α̂. In fact, what I presented is
still incomplete, since we also must estimate Σ. However, it turns out that Σ̂ is independent
of α̂ and β̂ (the information matrix is block-diagonal), so the top left two elements of the true
inverse information matrix are the same as I have written here.
The variance of β̂ in (14.204) is larger than it was in (14.203), where we imposed the null
of no constant. ML uses all the information it can to produce efficient estimates, i.e., estimates
with the smallest possible covariance matrix. The ratio of the two formulas is equal to the
familiar term 1 + E(f)²/σ²(f). In annual data for the CAPM, σ(R^{em}) = 16% and E(R^{em}) =
8% mean that the unrestricted estimate (14.204) has a variance 25% larger than the restricted
estimate (14.203), so the gain in efficiency can be important. In monthly data, however, the
gain is smaller, since variance and mean both scale with the horizon.
We can also view this fact as a warning: ML will ruthlessly exploit the null hypothesis and
do things like running regressions without a constant in order to get any small improvement
in efficiency.
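The 25% figure is just arithmetic on the formulas above:

```python
# The efficiency gain quoted in the text, as arithmetic: with annual CAPM
# numbers sigma(Rem) = 16% and E(Rem) = 8%, the variance ratio between the
# unrestricted (14.204) and restricted (14.203) estimates of beta is
# 1 + E(f)^2 / sigma^2(f).
Ef, sf = 0.08, 0.16
ratio = 1 + (Ef / sf) ** 2   # = 1.25, i.e. 25% larger
# At a monthly horizon E(f) scales with h and sigma(f) with sqrt(h),
# so the ratio shrinks toward 1.
```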
We can use these covariance matrices to construct a Wald (estimate/standard error) test of
the restriction of the model that the alphas are all zero,

T [1 + (E(f)/σ(f))²]^{−1} α̂′ Σ^{−1} α̂ ∼ χ²_N.    (205)

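A hedged sketch of the statistic (205) on simulated data satisfying the null (names and parameter values invented; the unconstrained estimates are OLS with a constant, as derived above):

```python
import numpy as np

# Wald statistic (205): under the null that all alphas are zero,
# T [1 + (E(f)/sigma(f))^2]^{-1} alpha' Sigma^{-1} alpha ~ chi^2(N).
rng = np.random.default_rng(7)
T, N = 600, 5
f = 0.08 + 0.16 * rng.standard_normal(T)
beta = rng.uniform(0.5, 1.5, N)
Re = np.outer(f, beta) + 0.1 * rng.standard_normal((T, N))   # alpha = 0 holds

# OLS with a constant, asset by asset: the unconstrained ML estimates
Xmat = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(Xmat, Re, rcond=None)
alpha_hat = coef[0]                                          # intercepts, N-vector
resid = Re - Xmat @ coef
Sigma = resid.T @ resid / T                                  # ML estimate of Sigma

stat = T / (1 + (f.mean() / f.std()) ** 2) * \
       alpha_hat @ np.linalg.solve(Sigma, alpha_hat)         # ~ chi^2(N) under the null
```

Note f.std() uses the ML (1/T) convention, matching the σ estimates the chapter derives.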