10.1 The Recipe

Definitions

u_{t+1}(b) ≡ m_{t+1}(b) x_{t+1} − p_t

g_T(b) ≡ E_T[u_t(b)]

S ≡ Σ_{j=−∞}^{∞} E[u_t(b) u_{t−j}(b)′]

GMM estimate

b̂_2 = argmin_{b} g_T(b)′ S⁻¹ g_T(b).

Standard errors

var(b̂_2) = (1/T)(d′S⁻¹d)⁻¹ ;  d ≡ ∂g_T(b)/∂b

CHAPTER 10 GMM IN EXPLICIT DISCOUNT FACTOR MODELS

Test of the model ("overidentifying restrictions")

T J_T = T min[g_T(b)′ S⁻¹ g_T(b)] ∼ χ²(#moments − #parameters).

It's easiest to start our discussion of GMM in the context of an explicit discount factor model, such as the consumption-based model. I treat the special structure of linear factor models later. I start with the basic classic recipe as given by Hansen and Singleton (1982).
Discount factor models involve some unknown parameters as well as data, so I write m_{t+1}(b) when it's important to remind ourselves of this dependence. For example, if m_{t+1} = β(c_{t+1}/c_t)^{−γ}, then b ≡ [β γ]′. I write b̂ to denote an estimate when it is important to distinguish estimated from other values.
Any asset pricing model implies

E(p_t) = E[m_{t+1}(b) x_{t+1}]. (136)

It's easiest to write this equation in the form E(·) = 0:

E[m_{t+1}(b) x_{t+1} − p_t] = 0. (137)

x and p are typically vectors; we typically check whether a model for m can price a number of assets simultaneously. Equations (10.137) are often called the moment conditions.
It's convenient to define the errors u_t(b) as the object whose mean should be zero,

u_{t+1}(b) = m_{t+1}(b) x_{t+1} − p_t.

Given values for the parameters b, we could construct a time series on u_t and look at its mean.
Define g_T(b) as the sample mean of the u_t errors, when the parameter vector is b, in a sample of size T:

g_T(b) ≡ (1/T) Σ_{t=1}^{T} u_t(b) = E_T[u_t(b)] = E_T[m_{t+1}(b) x_{t+1} − p_t].

The second equality introduces the handy notation E_T for sample means,

E_T(·) = (1/T) Σ_{t=1}^{T} (·).

(It might make more sense to denote these estimates Ê and ĝ. However, Hansen's T subscript notation is so widespread that doing so would cause more confusion than it solves.)


The first stage estimate of b minimizes a quadratic form of the sample mean of the errors,

b̂_1 = argmin_{b} g_T(b)′ W g_T(b)

for some arbitrary matrix W (often, W = I). This estimate is consistent and asymptotically normal. You can and often should stop here, as I explain below.
Using b̂_1, form an estimate Ŝ of

S ≡ Σ_{j=−∞}^{∞} E[u_t(b) u_{t−j}(b)′]. (138)

(Below I discuss various interpretations of and ways to construct this estimate.) Form a second stage estimate b̂_2 using the matrix Ŝ in the quadratic form,

b̂_2 = argmin_{b} g_T(b)′ Ŝ⁻¹ g_T(b).

b̂_2 is a consistent, asymptotically normal, and asymptotically efficient estimate of the parameter vector b. "Efficient" means that it has the smallest variance-covariance matrix among all estimators that set different linear combinations of g_T(b) to zero.
The variance-covariance matrix of b̂_2 is

var(b̂_2) = (1/T)(d′S⁻¹d)⁻¹
where

d ≡ ∂g_T(b)/∂b

or, more explicitly,

d = E_T[∂(m_{t+1}(b) x_{t+1} − p_t)/∂b], evaluated at b = b̂.

(More precisely, d should be written as the object to which ∂g_T/∂b converges, and then ∂g_T/∂b is an estimate of that object used to form a consistent estimate of the asymptotic variance-covariance matrix.)
This variance-covariance matrix can be used to test whether a parameter or group of parameters are equal to zero, via

b̂_i / √(var(b̂)_{ii}) ∼ N(0, 1)


and

b̂_j′ [var(b̂)_{jj}]⁻¹ b̂_j ∼ χ²(#included b's)

where b_j = subvector, var(b̂)_{jj} = submatrix.
Finally, the test of overidentifying restrictions is a test of the overall fit of the model. It states that T times the minimized value of the second-stage objective is distributed χ² with degrees of freedom equal to the number of moments less the number of estimated parameters,

T J_T = T min_{b}[g_T(b)′ Ŝ⁻¹ g_T(b)] ∼ χ²(#moments − #parameters).
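The recipe is mechanical enough to sketch in a few lines of code. Below is a minimal Python illustration on simulated data; the data-generating process, the starting values, and the Newey-West lag length are arbitrary choices made for the sketch, not part of the recipe itself.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T = 2000
# Simulated data (purely illustrative): consumption growth and three gross
# returns whose loadings on consumption growth differ across assets.
dc = np.exp(0.02 + 0.02 * rng.standard_normal(T))            # c_{t+1}/c_t
loads = np.array([1.0, 2.0, 4.0])
R = 1.03 + loads * (dc[:, None] - 1.02) + 0.05 * rng.standard_normal((T, 3))

def u(b):
    """Errors u_{t+1}(b) = m_{t+1}(b) x_{t+1} - p_t with m = beta (c'/c)^-gamma,
    x = gross returns, p = 1."""
    beta, gamma = b
    m = beta * dc ** (-gamma)
    return m[:, None] * R - 1.0

def gT(b):
    return u(b).mean(axis=0)                                 # sample pricing errors

def S_hat(b, L=3):
    """Newey-West estimate of S = sum_j E[u_t u_{t-j}'] (lag length is a choice)."""
    e = u(b) - u(b).mean(axis=0)
    S = e.T @ e / T
    for j in range(1, L + 1):
        G = e[j:].T @ e[:-j] / T
        S += (1 - j / (L + 1)) * (G + G.T)
    return S

b0 = np.array([0.95, 2.0])
b1 = minimize(lambda b: gT(b) @ gT(b), b0, method="Nelder-Mead").x   # stage 1, W = I
W = np.linalg.inv(S_hat(b1))                                         # W = S^-1
res2 = minimize(lambda b: gT(b) @ W @ gT(b), b1, method="Nelder-Mead")
b2 = res2.x                                                          # stage 2 estimate

# Standard errors: var(b2) = (1/T)(d' S^-1 d)^-1, d = dgT/db by finite differences.
eye = np.eye(2)
d = np.column_stack([(gT(b2 + 1e-5 * eye[k]) - gT(b2 - 1e-5 * eye[k])) / 2e-5
                     for k in range(2)])
Sinv = np.linalg.inv(S_hat(b2))
var_b2 = np.linalg.inv(d.T @ Sinv @ d) / T

# J_T test: T times the minimized objective, chi^2(3 moments - 2 parameters).
TJ = T * res2.fun
```

With real data you would replace the simulated dc and R with consumption growth and returns; everything else carries over unchanged.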

10.2 Interpreting the GMM procedure

g_T(b) is a pricing error. It is proportional to α.
GMM picks parameters to minimize a weighted sum of squared pricing errors.
The second stage picks the linear combination of pricing errors that are best measured, by having smallest sampling variation. First and second stage are like OLS and GLS regressions.
The standard error formula is a simple application of the delta method.
The J_T test evaluates the model by looking at the sum of squared pricing errors.

Pricing errors

The moment conditions are

g_T(b) = E_T[m_{t+1}(b) x_{t+1}] − E_T[p_t].

Thus, each moment is the difference between actual (E_T(p)) and predicted (E_T(mx)) price, or pricing error. What could be more natural than to pick parameters so that the model's predicted prices are as close as possible to the actual prices, and then to evaluate the model by how large these pricing errors are?
In the language of expected returns, the moments g_T(b) are proportional to the difference between actual and predicted returns: Jensen's alphas, or the vertical distance between the points and the line in Figure 5. To see this fact, recall that 0 = E(mR^e) can be translated to a predicted expected return,

E(R^e) = −cov(m, R^e)/E(m).


Therefore, we can write the pricing error as

g(b) = E(mR^e) = E(m)[E(R^e) − (−cov(m, R^e)/E(m))]

g(b) = (1/R^f)(actual mean return − predicted mean return).

If we express the model in expected return-beta language,

E(R^{ei}) = α_i + β_i′λ,

then the GMM objective is proportional to the Jensen's alpha measure of mispricing,

g(b) = (1/R^f) α_i.

First-stage estimates

If we could, we'd pick b to make every element of g_T(b) = 0: to have the model price assets perfectly in sample. However, there are usually more moment conditions (returns times instruments) than there are parameters. There should be, because theories with as many free parameters as facts (moments) are vacuous. Thus, we choose b to make g_T(b) as small as possible, by minimizing a quadratic form,

min_{b} g_T(b)′ W g_T(b). (139)

W is a weighting matrix that tells us how much attention to pay to each moment, or how
to trade off doing well in pricing one asset or linear combination of assets vs. doing well in
pricing another. In the common case W = I, GMM treats all assets symmetrically, and the
objective is to minimize the sum of squared pricing errors.
The sample pricing error g_T(b) may be a nonlinear function of b. Thus, you may have to use a numerical search to find the value of b that minimizes the objective in (10.139). However, since the objective is locally quadratic, the search is usually straightforward.

Second-stage estimates: Why S⁻¹?

What weighting matrix should you use? The weighting matrix directs GMM to emphasize some moments or linear combinations of moments at the expense of others. You might start with W = I, i.e., try to price all assets equally well. A W that is not the identity matrix can be used to offset differences in units between the moments. You might also start with different elements on the diagonal of W if you think some assets are more interesting, more informative, or better measured than others.
The second-stage estimate picks a weighting matrix based on statistical considerations.


Some asset returns may have much more variance than other assets. For those assets, the sample mean g_T = E_T(m_t R_t − 1) will be a much less accurate measurement of the population mean E(mR − 1), since the sample mean will vary more from sample to sample. Hence, it seems like a good idea to pay less attention to pricing errors from assets with high variance of m_t R_t − 1. One could implement this idea by using a W matrix composed of inverse variances of E_T(m_t R_t − 1) on the diagonal. More generally, since asset returns are correlated, one might think of using the inverse of the covariance matrix of E_T(m_t R_t − 1). This weighting matrix pays most attention to linear combinations of moments about which the data set at hand has the most information. This idea is exactly the same as the heteroskedasticity and cross-correlation corrections that lead you from OLS to GLS in linear regressions.
The covariance matrix of g_T = E_T(u_{t+1}) is the variance of a sample mean. Exploiting the assumption that E(u_t) = 0, and that u_t is stationary so E(u_1 u_2′) = E(u_t u_{t+1}′) depends only on the time interval between the two u's, we have

var(g_T) = var((1/T) Σ_{t=1}^{T} u_{t+1})
         = (1/T²)[T E(u_t u_t′) + (T − 1)(E(u_t u_{t−1}′) + E(u_t u_{t+1}′)) + ...].

As T → ∞, (T − j)/T → 1, so

var(g_T) → (1/T) Σ_{j=−∞}^{∞} E(u_t u_{t−j}′) = (1/T) S.

The last equality denotes S, known for other reasons as the spectral density matrix at frequency zero of u_t. (Precisely, S so defined is the variance-covariance matrix of the g_T for fixed b. The actual variance-covariance matrix of g_T must take into account the fact that we chose b to set a linear combination of the g_T to zero in each sample. I give that formula below. The point here is heuristic.)
This fact suggests that a good weighting matrix might be the inverse of S. In fact, Hansen (1982) shows formally that the choice

W = S⁻¹,  S ≡ Σ_{j=−∞}^{∞} E(u_t u_{t−j}′)

is the statistically optimal weighting matrix, meaning that it produces estimates with lowest asymptotic variance.
You may be more used to the formula σ(u)/√T for the standard deviation of a sample mean. This formula is a special case that holds when the u_t's are uncorrelated over time. If E(u_t u_{t−j}′) = 0 for j ≠ 0, then the previous equation reduces to

var((1/T) Σ_{t=1}^{T} u_{t+1}) = (1/T) E(uu′) = var(u)/T.

This is probably the first statistical formula you ever saw: the variance of the sample mean. In GMM, it is the last statistical formula you'll ever see as well. GMM amounts to just generalizing the simple ideas behind the distribution of the sample mean to parameter estimation and general statistical contexts.
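These formulas are easy to check by simulation. Here is a small sketch (the lag lengths, sample size, and the AR(1) persistence are all arbitrary choices): for i.i.d. data the long-run variance estimate collapses to var(u), while for persistent data it is many times larger, which is exactly why S, and not var(u), belongs in the variance of the sample mean.

```python
import numpy as np

def long_run_var(u, L):
    """Newey-West (Bartlett) estimate of S = sum_j E[u_t u_{t-j}] for a scalar series."""
    u = u - u.mean()
    T = len(u)
    S = u @ u / T
    for j in range(1, L + 1):
        S += 2 * (1 - j / (L + 1)) * (u[j:] @ u[:-j]) / T
    return S

rng = np.random.default_rng(1)
T = 100_000

# i.i.d. case: S collapses to var(u), so var(sample mean) = var(u)/T
u_iid = rng.standard_normal(T)
S_iid = long_run_var(u_iid, L=5)

# persistent case: an AR(1) with rho = 0.9; autocovariances inflate S well above var(u)
rho, x = 0.9, np.zeros(T)
eps = rng.standard_normal(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + eps[t]
S_ar = long_run_var(x, L=100)
```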
The first and second stage estimates should remind you of standard linear regression models. You start with an OLS regression. If the errors are not i.i.d., the OLS estimates are consistent, but not efficient. If you want efficient estimates, you can use the OLS estimates to obtain a series of residuals, estimate a variance-covariance matrix of residuals, and then do GLS. GLS is also consistent and more efficient, meaning that the sampling variation in the estimated parameters is lower.

Standard errors

The formula for the standard error of the estimate,

var(b̂_2) = (1/T)(d′S⁻¹d)⁻¹ (140)

can be understood most simply as an instance of the "delta method": the asymptotic variance of f(x) is f′(x)² var(x). Suppose there is only one parameter and one moment. S/T is the variance of the moment g_T, and d⁻¹ = [∂g_T/∂b]⁻¹ = ∂b/∂g_T. Then the delta method formula gives

var(b̂_2) = (∂b/∂g_T) var(g_T) (∂b/∂g_T) = (1/T)(∂b/∂g_T) S (∂b/∂g_T).

The actual formula (10.140) just generalizes this idea to vectors.

10.2.1 J_T Test

Once you've estimated the parameters that make a model "fit best," the natural question is, how well does it fit? It's natural to look at the pricing errors and see if they are "big." The J_T test asks whether they are "big" by statistical standards: if the model is true, how often should we see a (weighted) sum of squared pricing errors this big? If not often, the model is "rejected." The test is

T J_T = T g_T(b̂)′ S⁻¹ g_T(b̂) ∼ χ²(#moments − #parameters).

Since S is the variance-covariance matrix of gT , this statistic is the minimized pricing errors


divided by their variance-covariance matrix. Sample means converge to a normal distribution, so sample means squared divided by variance converges to the square of a normal, or χ².
The reduction in degrees of freedom corrects for the fact that S is really the covariance matrix of g_T for fixed b. We set a linear combination of the g_T to zero in each sample, so the actual covariance matrix of g_T is singular, with rank #moments − #parameters. More details below.

10.3 Applying GMM

Notation.
Forecast errors and instruments.
Stationarity and choice of units.

Notation; instruments and returns

Most of the effort involved with GMM is simply mapping a given problem into the very general notation. The equation

E[m_{t+1}(b) x_{t+1} − p_t] = 0

can capture a lot. We often test asset pricing models using returns, in which case the moment conditions are

E[m_{t+1}(b) R_{t+1} − 1] = 0.

It is common to add instruments as well. Mechanically, you can multiply both sides of

1 = E_t[m_{t+1}(b) R_{t+1}]

by any variable z_t observed at time t before taking unconditional expectations, resulting in

E(z_t) = E[m_{t+1}(b) R_{t+1} z_t].

Expressing the result in E(·) = 0 form,

0 = E{[m_{t+1}(b) R_{t+1} − 1] z_t}. (141)

We can do this for a whole vector of returns and instruments, multiplying each return by each instrument. For example, if we start with two returns R = [R^a R^b]′ and one instrument z,


equation (10.141) stacks up to four moment conditions,

E[m_{t+1}(b) R^a_{t+1} − 1] = 0
E[m_{t+1}(b) R^b_{t+1} − 1] = 0
E[m_{t+1}(b) R^a_{t+1} z_t − z_t] = 0
E[m_{t+1}(b) R^b_{t+1} z_t − z_t] = 0.

Using the Kronecker product ⊗, meaning "multiply every element by every other element," we can denote the same relation compactly by

E{[m_{t+1}(b) R_{t+1} − 1] ⊗ z_t} = 0, (142)

or, emphasizing the managed-portfolio interpretation and the p = E(mx) notation,

E[m_{t+1}(b)(R_{t+1} ⊗ z_t) − (1 ⊗ z_t)] = 0.
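In code, the Kronecker-product moments are one line. A sketch (the data here are pure noise and the discount factor is a constant placeholder, just to show the bookkeeping and the timing convention: instruments dated t multiply errors dated t+1):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
R = 1.02 + 0.05 * rng.standard_normal((T, 2))     # gross returns R^a, R^b
z = 1.0 + 0.1 * rng.standard_normal(T)            # a single instrument series
m = np.full(T, 0.97)                              # placeholder for m_{t+1}(b)

u = m[1:, None] * R[1:] - 1.0                     # errors dated t+1
Z = np.column_stack([np.ones(T - 1), z[:-1]])     # instruments [1, z_t], dated t

# (m R - 1) ⊗ z observation by observation; row t equals np.kron(u[t], Z[t])
moments = np.einsum('ti,tj->tij', u, Z).reshape(T - 1, -1)
gT = moments.mean(axis=0)                         # the four moment conditions
```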

Forecast errors and instruments

The asset pricing model says that, although expected returns can vary across time and assets, expected discounted returns should always be the same, 1. The error u_{t+1} = m_{t+1} R_{t+1} − 1, the ex-post discounted return less one, represents a forecast error. Like any forecast error, u_{t+1} should be conditionally and unconditionally mean zero.
In an econometric context, z is an instrument because it is uncorrelated with the error u_{t+1}. E(z_t u_{t+1}) is the numerator of a regression coefficient of u_{t+1} on z_t; thus adding instruments basically checks that the ex-post discounted return is unforecastable by linear regressions.
If an asset's return is higher than predicted when z_t is unusually high, but not on average, scaling by z_t will pick up this feature of the data. Then, the moment condition checks that the discount rate is unusually low at such times, or that the conditional covariance of the discount rate and asset return moves sufficiently to justify the high conditionally expected return. As I explained in Section 8.1, the addition of instruments is equivalent to adding the returns of managed portfolios to the analysis, and is in principle able to capture all of the model's predictions.

Stationarity and distributions

The GMM distribution theory does require some statistical assumptions. Hansen (1982) and Ogaki (1993) cover them in depth. The most important assumption is that m, p, and x must be stationary random variables. ("Stationary" is often misused to mean constant, or i.i.d. The statistical definition of stationarity is that the joint distribution of x_t, x_{t−j} depends only on j and not on t.) Sample averages must converge to population means as the sample size grows, and stationarity implies this result.
Assuring stationarity usually amounts to a choice of sensible units. For example, though


we could express the pricing of a stock as

p_t = E_t[m_{t+1}(d_{t+1} + p_{t+1})]

it would not be wise to do so. For stocks, p and d rise over time and so are typically not stationary; their unconditional means are not defined. It is better to divide by p_t and express the model as

1 = E_t[m_{t+1}(d_{t+1} + p_{t+1})/p_t] = E_t(m_{t+1} R_{t+1}).
The stock return is plausibly stationary.
Dividing by dividends is an alternative and I think underutilized way to achieve stationarity (at least for portfolios, since many individual stocks do not pay regular dividends):

p_t/d_t = E_t[m_{t+1}(1 + p_{t+1}/d_{t+1})(d_{t+1}/d_t)].

Now we map (1 + p_{t+1}/d_{t+1})(d_{t+1}/d_t) into x_{t+1} and p_t/d_t into p_t. This formulation allows us to focus on prices rather than one-period returns.
Bonds are a claim to a dollar, so bond prices and yields do not grow over time. Hence, it might be all right to examine

p^b_t = E(m_{t+1} · 1)

with no transformations.
Stationarity is not always a clear-cut question in practice. As variables become "less stationary," as they experience longer swings in a sample, the asymptotic distribution can become a less reliable guide to a finite-sample distribution. For example, the level of nominal interest rates is surely a stationary variable in a fundamental sense: it was 6% in ancient Babylon, about 6% in 14th century Italy, and about 6% again today. Yet it takes very long swings away from this unconditional mean, moving slowly up or down for even 20 years at a time. Therefore, in an estimate and test that uses the level of interest rates, the asymptotic distribution theory might be a bad approximation to the correct finite sample distribution theory. This is true even if the number of data points is large. 10,000 data points measured every minute are a "smaller" data set than 100 data points measured every year. In such a case, it is particularly important to develop a finite-sample distribution by simulation or bootstrap, which is easy to do given today's computing power.
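A tiny Monte Carlo makes the point concrete. The sketch below (the persistence, sample size, and replication count are arbitrary choices) simulates a highly persistent AR(1) and compares the true sampling variation of its mean to what the i.i.d. σ/√T formula reports; the naive formula understates the truth by a large factor.

```python
import numpy as np

rng = np.random.default_rng(3)
T, nsim, rho = 200, 2000, 0.98

means, naive_ses = [], []
for _ in range(nsim):
    eps = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + eps[t]     # very persistent series
    means.append(x.mean())
    naive_ses.append(x.std(ddof=1) / np.sqrt(T))   # sigma/sqrt(T), i.i.d. formula

true_sd = np.std(means)            # actual sampling variation of the sample mean
avg_naive = np.mean(naive_ses)     # what the i.i.d. formula claims
```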
It is also important to choose test assets in a way that is stationary. For example, individual
stocks change character over time, increasing or decreasing size, exposure to risk factors,
leverage, and even nature of the business. For this reason, it is common to sort stocks into
portfolios based on characteristics such as betas, size, book/market ratios, industry and so
forth. The statistical characteristics of the portfolio returns may be much more constant than


the characteristics of individual securities, which float in and out of the various portfolios. (One can alternatively include the characteristics as instruments.)
Many econometric techniques require assumptions about distributions. As you can see, the variance formulas used in GMM do not include the usual assumptions that variables are i.i.d., normally distributed, homoskedastic, etc. You can put such assumptions in if you want to (we'll see how below, and adding such assumptions simplifies the formulas and can improve the small-sample performance when the assumptions are justified) but you don't have to.
Chapter 11. GMM: general formulas and applications
Lots of calculations beyond formal parameter estimation and overall model testing are useful in the process of evaluating a model and comparing it to other models. But you want to understand sampling variation in such calculations, and mapping the questions into the GMM framework allows you to do this easily. In addition, alternative estimation and evaluation procedures may be more intuitive or robust to model misspecification than the two (or multi) stage procedure described above.
In this chapter I lay out the general GMM framework, and I discuss five applications and variations on the basic GMM method. 1) I show how to derive standard errors of nonlinear functions of sample moments, such as correlation coefficients. 2) I apply GMM to OLS regressions, easily deriving standard error formulas that correct for autocorrelation and conditional heteroskedasticity. 3) I show how to use prespecified weighting matrices W in asset pricing tests in order to overcome the tendency of efficient GMM to focus on spuriously low-variance portfolios. 4) As a good parable for prespecified linear combinations of moments a, I show how to mimic "calibration" and "evaluation" phases of real business cycle models. 5) I show how to use the distribution theory for the g_T beyond just forming the J_T test in order to evaluate the importance of individual pricing errors. The next chapter continues, and collects GMM variations useful for evaluating linear factor models and related mean-variance frontier questions.
Many of these calculations amount to creative choices of the a_T matrix that selects which linear combinations of moments are set to zero, and reading off the resulting formulas for the variance-covariance matrix of the estimated coefficients, equation (11.146), and the variance-covariance matrix of the moments g_T, equation (11.147).

11.1 General GMM formulas

The general GMM estimate

a_T g_T(b̂) = 0

Distribution of b̂:

√T (b̂ − b) → N[0, (ad)⁻¹ a S a′ (ad)⁻¹′]

Distribution of g_T(b̂):

T cov[g_T(b̂)] = (I − d(ad)⁻¹a) S (I − d(ad)⁻¹a)′


The "optimal" estimate uses a = d′S⁻¹. In this case,

T cov(b̂) = (d′S⁻¹d)⁻¹

T cov[g_T(b̂)] = S − d(d′S⁻¹d)⁻¹d′

and

T J_T = T g_T(b̂)′ S⁻¹ g_T(b̂) → χ²(#moments − #parameters).

An analogue to the likelihood ratio test,

T J_T(restricted) − T J_T(unrestricted) ∼ χ²(#restrictions)

GMM procedures can be used to implement a host of estimation and testing exercises. Just about anything you might want to estimate can be written as a special case of GMM. To do so, you just have to remember (or look up) a few very general formulas, and then map your problem into them.
Express a model as

E[f(x_t, b)] = 0.

Everything is a vector: f can represent a vector of L sample moments, x_t can be M data series, b can be N parameters. f(x_t, b) is a slightly more explicit statement of the errors u_t(b) in the last chapter.
Definition of the GMM estimate.
We estimate parameters b̂ to set some linear combination of sample means of f to zero,

b̂: set a_T g_T(b̂) = 0 (143)

where

g_T(b) ≡ (1/T) Σ_{t=1}^{T} f(x_t, b)

and a_T is a matrix that defines which linear combination of g_T(b) will be set to zero. This defines the GMM estimate.
If there are as many moments as parameters, you will set each moment to zero; when
there are fewer parameters than moments, (11.143) just captures the natural idea that you
will set some moments, or some linear combination of moments to zero in order to estimate
the parameters. The minimization of the last chapter is a special case. If you estimate b by


min_{b} g_T(b)′ W g_T(b), the first order conditions are

(∂g_T/∂b)′ W g_T(b) = 0,

which is of the form (11.143) with a_T = (∂g_T/∂b)′ W. The general GMM procedure allows you to pick arbitrary linear combinations of the moments to set to zero in parameter estimation.
Standard errors of the estimate.
Hansen (1982), Theorem 3.1 tells us that the asymptotic distribution of the GMM estimate is

√T (b̂ − b) → N[0, (ad)⁻¹ a S a′ (ad)⁻¹′] (144)

where

d ≡ E[∂f(x_t, b)/∂b′] = ∂g_T(b)/∂b′

(i.e., d is defined as the population moment in the first equality, which we estimate in sample by the second equality), where

a ≡ plim a_T,

and where

S ≡ Σ_{j=−∞}^{∞} E[f(x_t, b) f(x_{t−j}, b)′]. (145)

Don't forget the √T in (11.144)! In practical terms, this means to use

var(b̂) = (1/T)(ad)⁻¹ a S a′ (ad)⁻¹′ (146)

as the covariance matrix for standard errors and tests. As in the last chapter, you can understand this formula as an application of the delta method.
Distribution of the moments.
Hansen's Lemma 4.1 gives the sampling distribution of the moments g_T(b̂):

√T g_T(b̂) → N[0, (I − d(ad)⁻¹a) S (I − d(ad)⁻¹a)′]. (147)

As we have seen, S would be the asymptotic variance-covariance matrix of sample means if we did not estimate any parameters; estimation sets some linear combinations of the g_T to zero. The I − d(ad)⁻¹a terms account for the fact that in each sample some linear combinations of g_T are set to zero. Thus, this variance-covariance matrix is singular.


χ² tests.
A sum of squared standard normals is distributed χ². Therefore, it is natural to use the distribution theory for g_T to see if the g_T are jointly "too big." Equation (11.147) suggests that we form the statistic

T g_T(b̂)′ [(I − d(ad)⁻¹a) S (I − d(ad)⁻¹a)′]⁻¹ g_T(b̂) (148)

and that it should have a χ² distribution. It does, but with a hitch: the variance-covariance matrix is singular, so you have to pseudo-invert it. For example, you can perform an eigenvalue decomposition Σ = QΛQ′ and then invert only the non-zero eigenvalues. Also, the χ² distribution has degrees of freedom given by the number of non-zero linear combinations of g_T, the number of moments less the number of estimated parameters. You can similarly use (11.147) to construct tests of individual moments ("are the small stocks mispriced?") or groups of moments.
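A numerical sketch of the pseudo-inversion (the d, S, and a below are made-up matrices with the right shapes, not estimates from data):

```python
import numpy as np

rng = np.random.default_rng(4)
n_mom, n_par = 5, 2
d = rng.standard_normal((n_mom, n_par))           # stand-in for dgT/db
A = rng.standard_normal((n_mom, n_mom))
S = A @ A.T + n_mom * np.eye(n_mom)               # symmetric positive definite
a = d.T @ np.linalg.inv(S)                        # e.g. the optimal a = d'S^-1

# T cov(gT) = (I - d(ad)^-1 a) S (I - d(ad)^-1 a)'; singular by construction
M = np.eye(n_mom) - d @ np.linalg.inv(a @ d) @ a
cov_gT = M @ S @ M.T

# pseudo-invert: eigendecompose and invert only the nonzero eigenvalues
lam, Q = np.linalg.eigh(cov_gT)
tol = 1e-10 * lam.max()
lam_inv = np.array([1.0 / l if l > tol else 0.0 for l in lam])
cov_pinv = Q @ np.diag(lam_inv) @ Q.T             # use this in the chi^2 statistic
```

The rank deficiency is exactly #parameters: the combinations a g_T are set to zero in each sample, which is visible as M d = 0.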
Efficient estimates
The theory so far allows us to estimate parameters by setting any linear combination of moments to zero. Hansen shows that one particular choice is statistically optimal,

a = d′S⁻¹. (149)

This choice is the first order condition to min_{b} g_T(b)′ S⁻¹ g_T(b) that we studied in the last chapter. With this weighting matrix, the standard error formula (11.146) reduces to

√T (b̂ − b) → N[0, (d′S⁻¹d)⁻¹]. (150)

This is Hansen's Theorem 3.2. The sense in which (11.149) is "efficient" is that the sampling variation of the parameters for an arbitrary a matrix, (11.146), equals the sampling variation of the "efficient" estimate in (11.150) plus a positive semidefinite matrix.
With the optimal weights (11.149), the variance of the moments (11.147) simplifies to

cov(g_T) = (1/T)(S − d(d′S⁻¹d)⁻¹d′). (151)

We can use this matrix in a test of the form (11.148). However, Hansen's Lemma 4.2 tells us that there is an equivalent and simpler way to construct this test,

T g_T(b̂)′ S⁻¹ g_T(b̂) → χ²(#moments − #parameters). (152)

This result is nice since we get to use the already-calculated and non-singular S⁻¹.
To derive (11.152) from (11.147), factor S = CC′ and then find the asymptotic covariance matrix of C⁻¹g_T(b̂) using (11.147). The result is

var[√T C⁻¹ g_T(b̂)] = I − C⁻¹d(d′S⁻¹d)⁻¹d′C⁻¹′.

This is an idempotent matrix of rank #moments − #parameters, so (11.152) follows.
Alternatively, note that S⁻¹ is a pseudo-inverse of the second stage cov(g_T). (A pseudo-inverse times cov(g_T) should result in an idempotent matrix of the same rank as cov(g_T).)

S⁻¹ cov(g_T) = S⁻¹(S − d(d′S⁻¹d)⁻¹d′) = I − S⁻¹d(d′S⁻¹d)⁻¹d′

Then, check that the result is idempotent.

(I − S⁻¹d(d′S⁻¹d)⁻¹d′)(I − S⁻¹d(d′S⁻¹d)⁻¹d′) = I − S⁻¹d(d′S⁻¹d)⁻¹d′.

This derivation not only verifies that J_T has the same distribution as g_T′ cov(g_T)⁻¹ g_T, but that they are numerically the same in every sample.
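This equivalence is easy to confirm numerically. In the sketch below, S and d are again made-up matrices of the right shape; the identities hold to machine precision, including the claim that S⁻¹ and the pseudo-inverse of cov(g_T) give the same statistic for any g_T in the admissible (singular) space.

```python
import numpy as np

rng = np.random.default_rng(5)
n_mom, n_par = 6, 2
d = rng.standard_normal((n_mom, n_par))
A = rng.standard_normal((n_mom, n_mom))
S = A @ A.T + n_mom * np.eye(n_mom)               # symmetric positive definite
Sinv = np.linalg.inv(S)

# T cov(gT) under optimal weighting: S - d (d'S^-1 d)^-1 d'
cov_gT = S - d @ np.linalg.inv(d.T @ Sinv @ d) @ d.T

P = Sinv @ cov_gT                                 # S^-1 cov(gT): idempotent

# any gT consistent with the estimation lies in the column space of cov(gT)
g = cov_gT @ rng.standard_normal(n_mom)
J_S = g @ Sinv @ g                                # statistic built with S^-1
J_pinv = g @ np.linalg.pinv(cov_gT) @ g           # statistic built with the pseudo-inverse
```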
I emphasize that (11.150) and (11.152) only apply to the "optimal" choice of weights, (11.149). If you use another set of weights, as in a first-stage estimate, you must use the general formulas (11.146) and (11.147).
Model comparisons
You often want to compare one model to another. If one model can be expressed as a special or "restricted" case of the other or "unrestricted" model, we can perform a statistical comparison that looks very much like a likelihood ratio test. If we use the same S matrix (usually that of the unrestricted model) the restricted J_T must rise. But if the restricted model is really true, it shouldn't rise "much." How much?

T J_T(restricted) − T J_T(unrestricted) ∼ χ²(#restrictions)

This is a "χ² difference" test, due to Newey and West (1987a), who call it the "D-test."

11.2 Testing moments

How to test one or a group of pricing errors. 1) Use the formula for var(g_T). 2) Use a χ² difference test.

You may want to see how well a model does on particular moments or particular pricing errors. For example, the celebrated "small firm effect" states that an unconditional CAPM (m = a + bR^W, no scaled factors) does badly in pricing the returns on a portfolio that always holds the smallest 1/10th or 1/20th of firms in the NYSE. You might want to see whether a new model prices the small-firm returns well. The standard error of pricing errors also allows you to add error bars to a plot of predicted vs. actual mean returns such as Figure 5 or other diagnostics based on pricing errors.
We have already seen that individual elements of g_T measure the pricing errors or expected return errors. Thus, the sampling variation of g_T given by (11.147) provides exactly


the standard error we are looking for. You can use the sampling distribution of g_T to evaluate the significance of individual pricing errors, to construct a t-test (for a single g_T, such as small firms) or χ² test (for groups of g_T, such as small firms ⊗ instruments). As usual, this is the Wald test.
Alternatively, you can use the χ² difference approach. Start with a general model that includes all the moments, and form an estimate of the spectral density matrix S. Now set to zero the moments you want to test, and denote g_sT(b) the vector of moments, including the zeros (s for "smaller"). Choose b_s to minimize g_sT(b_s)′ S⁻¹ g_sT(b_s) using the same weighting matrix S⁻¹. The criterion will be lower than the original criterion g_T(b)′ S⁻¹ g_T(b), since there are the same number of parameters and fewer moments. But, if the moments we want to test truly are zero, the criterion shouldn't be that much lower. The χ² difference test applies,

T g_T(b̂)′ S⁻¹ g_T(b̂) − T g_sT(b̂_s)′ S⁻¹ g_sT(b̂_s) ∼ χ²(#eliminated moments).

Of course, don't fall into the obvious trap of picking the largest of 10 pricing errors and noting it's more than two standard deviations from zero. The distribution of the largest of 10 pricing errors is much wider than the distribution of a single one. To use this distribution, you have to pick which pricing error you're going to test before you look at the data.

11.3 Standard errors of anything by delta method

One quick application illustrates the usefulness of the GMM formulas. Often, we want to estimate a quantity that is a nonlinear function of sample means,

b = φ[E(x_t)] = φ(μ).

In this case, the formula (11.144) reduces to

var(b_T) = (1/T) [dφ/dμ]′ [Σ_{j=−∞}^{∞} cov(x_t, x_{t−j}′)] [dφ/dμ]. (153)

The formula is very intuitive. The variance of the sample mean is the covariance term inside. The derivatives just linearize the function φ near the true b.
For example, a correlation coefficient can be written as a function of sample means as

corr(x_t, y_t) = [E(x_t y_t) − E(x_t)E(y_t)] / [√(E(x_t²) − E(x_t)²) √(E(y_t²) − E(y_t)²)].

Thus, take

μ = [E(x_t) E(x_t²) E(y_t) E(y_t²) E(x_t y_t)]′.

A problem at the end of the chapter asks you to take derivatives and derive the standard error of the correlation coefficient. One can derive standard errors for impulse-response functions,


variance decompositions, and many other statistics in this way.
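As a sketch of how the computation goes in the i.i.d. case (so only the j = 0 covariance term survives the sum): the data below are simulated, and the derivatives are taken numerically rather than by hand, which the end-of-chapter problem asks for.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 5000
x = rng.standard_normal(T)
y = 0.5 * x + rng.standard_normal(T)              # true corr = 0.5/sqrt(1.25)

h = np.column_stack([x, x**2, y, y**2, x * y])    # series behind the five means
mu = h.mean(axis=0)

def phi(mu):
    """Correlation as a function of mu = [Ex, Ex^2, Ey, Ey^2, Exy]."""
    cov = mu[4] - mu[0] * mu[2]
    vx, vy = mu[1] - mu[0] ** 2, mu[3] - mu[2] ** 2
    return cov / np.sqrt(vx * vy)

# dphi/dmu by central finite differences
eps = 1e-6
dphi = np.array([(phi(mu + eps * e) - phi(mu - eps * e)) / (2 * eps)
                 for e in np.eye(5)])

Sigma = np.cov(h.T, ddof=0)                       # only the j = 0 term of the sum
se = np.sqrt(dphi @ Sigma @ dphi / T)             # delta-method standard error
rhat = phi(mu)
```

For bivariate normal data this should come out close to the textbook approximation (1 − ρ²)/√T.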

11.4 Using GMM for regressions

By mapping OLS regressions into the GMM framework, we derive formulas for OLS standard errors that correct for autocorrelation and conditional heteroskedasticity of the errors. The general formula is

var(β̂) = (1/T) E(x_t x_t′)⁻¹ [Σ_{j=−∞}^{∞} E(u_t x_t x_{t−j}′ u_{t−j})] E(x_t x_t′)⁻¹,

and it simplifies in special cases.

Mapping any statistical procedure into GMM makes it easy to develop an asymptotic
distribution that corrects for statistical problems such as non-normality, serial correlation and
conditional heteroskedasticity. To illustrate, as well as to develop the very useful formulas, I
map OLS regressions into GMM.
Correcting OLS standard errors for econometric problems is not the same thing as GLS.
When errors do not obey the OLS assumptions, OLS is consistent, and often more robust
than GLS, but its standard errors need to be corrected.
OLS picks parameters $\beta$ to minimize the variance of the residual:
$$ \min_{\{\beta\}} E_T\left[(y_t - \beta' x_t)^2\right]. $$

We find $\hat\beta$ from the first order condition, which states that the residual is orthogonal to the
right hand variable:
$$ g_T(\hat\beta) = E_T\left[x_t (y_t - x_t'\hat\beta)\right] = 0. \tag{154} $$
This condition is exactly identified: the number of moments equals the number of parameters.
Thus, we set the sample moments exactly to zero and there is no weighting matrix ($a = I$).
We can solve for the estimate analytically,
$$ \hat\beta = \left[E_T(x_t x_t')\right]^{-1} E_T(x_t y_t). $$

This is the familiar OLS formula. The rest of the ingredients to equation (11.144) are
$$ d = E(x_t x_t') $$
$$ f(x_t, \beta) = x_t (y_t - x_t'\beta) = x_t e_t $$

where $e_t$ is the regression residual. Equation (11.144) gives a formula for OLS standard
errors,
$$ \mathrm{var}(\hat\beta) = \frac{1}{T}\, E(x_t x_t')^{-1} \left[\sum_{j=-\infty}^{\infty} E(e_t\, x_t x_{t-j}'\, e_{t-j})\right] E(x_t x_t')^{-1}. \tag{155} $$

This formula reduces to some interesting special cases.
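Before specializing, formula (155) can be sketched in code. This is an illustration with simulated data; the function and variable names are my own, not the text's. With `lags=0` the sum keeps only the $j = 0$ term, the heteroskedasticity-consistent case:

```python
import numpy as np

def ols_gmm_se(y, X, lags=0):
    """OLS with GMM sandwich standard errors in the spirit of (155).
    lags=0 keeps only the j = 0 term (White-style errors)."""
    T, K = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # [E_T(x x')]^{-1} E_T(x y)
    e = y - X @ beta                            # residuals e_t
    xe = X * e[:, None]                         # rows are x_t e_t
    Exx_inv = np.linalg.inv(X.T @ X / T)
    S = xe.T @ xe / T                           # j = 0 term of the middle sum
    for j in range(1, lags + 1):                # add the j and -j terms
        G = xe[j:].T @ xe[:T - j] / T
        S += G + G.T
    V = Exx_inv @ S @ Exx_inv / T               # the sandwich of (155)
    return beta, np.sqrt(np.diag(V))

rng = np.random.default_rng(1)
T = 1000
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(T)
beta, se = ols_gmm_se(y, X)
```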

Serially uncorrelated, homoskedastic errors

These are the usual OLS assumptions, and it's good the usual formulas emerge. Formally,
the OLS assumptions are
$$ E(e_t \mid x_t, x_{t-1}, \ldots, e_{t-1}, e_{t-2}, \ldots) = 0 \tag{156} $$
$$ E(e_t^2 \mid x_t, x_{t-1}, \ldots, e_{t-1}, e_{t-2}, \ldots) = \text{constant} = \sigma_e^2. \tag{157} $$

To use these assumptions, I use the fact that

E(ab) = E(E(a|b)b).

The first assumption means that only the $j = 0$ term enters the sum
$$ \sum_{j=-\infty}^{\infty} E(e_t\, x_t x_{t-j}'\, e_{t-j}) = E(e_t^2\, x_t x_t'). $$
The second assumption means that
$$ E(e_t^2\, x_t x_t') = E(e_t^2)\, E(x_t x_t') = \sigma_e^2\, E(x_t x_t'). $$

Hence equation (11.155) reduces to our old friend,
$$ \mathrm{var}(\hat\beta) = \frac{1}{T}\,\sigma_e^2\, E(x_t x_t')^{-1} = \sigma_e^2\, (X'X)^{-1}. $$
The last notation is typical of econometrics texts, in which $X = [\,x_1 \;\; x_2 \;\; \ldots \;\; x_T\,]'$ rep-
resents the data matrix.

Heteroskedastic errors

If we delete the conditional homoskedasticity assumption (11.157), we can't pull the $e_t$ out
of the expectation, so the standard errors are
$$ \mathrm{var}(\hat\beta) = \frac{1}{T}\, E(x_t x_t')^{-1}\, E(e_t^2\, x_t x_t')\, E(x_t x_t')^{-1}. $$
These are known as "heteroskedasticity-consistent standard errors" or "White standard er-
rors," after White (1980).

Hansen-Hodrick errors

Hansen and Hodrick (1982) run forecasting regressions of (say) six-month returns, using
monthly data. We can write this situation in regression notation as
$$ y_{t+k} = \beta' x_t + \varepsilon_{t+k}, \qquad t = 1, 2, \ldots, T. $$
Fama and French (1988) also use regressions of overlapping long-horizon returns on variables
such as the dividend/price ratio and the term premium. Such regressions are an important part of the
evidence for predictability in asset returns.
Under the null that one-period returns are unforecastable, we will still see correlation in
the $\varepsilon_t$ due to the overlapping data. Unforecastable returns imply
$$ E(\varepsilon_t \varepsilon_{t-j}) = 0 \quad \text{for } |j| \geq k, $$

but not for $|j| < k$. Therefore, we can only rule out the terms in $S$ with $|j| \geq k$. Since we might
as well correct for potential heteroskedasticity while we're at it, the standard errors are
$$ \mathrm{var}(b_T) = \frac{1}{T}\, E(x_t x_t')^{-1} \left[\sum_{j=-k}^{k} E(\varepsilon_t\, x_t x_{t-j}'\, \varepsilon_{t-j})\right] E(x_t x_t')^{-1}. $$
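A small simulation (my own stylized sketch, not from the text) shows the effect. Regressing overlapping six-period sums of serially uncorrelated returns on an irrelevant i.i.d. forecaster, the Hansen-Hodrick correction with $|j| < k$ substantially widens the standard errors relative to the naive OLS formula; with an i.i.d. regressor, the effect shows up mainly in the intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 2000, 6                                   # monthly data, six-month horizon
r = rng.standard_normal(T + k)                   # unforecastable one-period returns
# overlapping k-period returns, and a useless i.i.d. forecaster x_t
yk = np.array([r[t + 1:t + 1 + k].sum() for t in range(T)])
X = np.column_stack([np.ones(T), rng.standard_normal(T)])

b = np.linalg.solve(X.T @ X, X.T @ yk)
eps = yk - X @ b
xe = X * eps[:, None]
Exx_inv = np.linalg.inv(X.T @ X / T)

S = xe.T @ xe / T                                # j = 0 term
for j in range(1, k):                            # Hansen-Hodrick: keep |j| < k
    G = xe[j:].T @ xe[:T - j] / T
    S += G + G.T
se_hh = np.sqrt(np.diag(Exx_inv @ S @ Exx_inv / T))
se_ols = np.sqrt(np.diag(Exx_inv * eps.var() / T))   # naive OLS formula
```

The overlap makes $\varepsilon_t$ an MA($k-1$), so the naive formula badly understates the intercept's standard error here.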

11.5 Prespecified weighting matrices and moment conditions

Prespecified rather than "optimal" weighting matrices can emphasize economically inter-
esting results, avoid the trap of blowing up standard errors rather than improving
pricing errors, and lead to estimates that are more robust to small model misspecifi-
cations. This is analogous to the fact that OLS is often preferable to GLS in a regression
context. The GMM formulas for a fixed weighting matrix $W$ are
$$ \mathrm{var}(\hat b) = \frac{1}{T}(d'Wd)^{-1}\, d'WSWd\, (d'Wd)^{-1} $$
$$ \mathrm{var}(g_T) = \frac{1}{T}\left(I - d(d'Wd)^{-1}d'W\right) S \left(I - Wd(d'Wd)^{-1}d'\right). $$


In the basic approach outlined in Chapter 10, our final estimates were based on the "effi-
cient" $S^{-1}$ weighting matrix. This objective maximizes the asymptotic statistical information
in the sample about a model, given the choice of moments $g_T$. However, you may want to
use a prespecified weighting matrix $W \neq S^{-1}$ instead, or at least as a diagnostic accompa-
nying more formal statistical tests. A prespecified weighting matrix lets you, rather than the
$S$ matrix, specify which moments or linear combinations of moments GMM will value in the
minimization $\min_{\{b\}} g_T(b)' W g_T(b)$. A higher value of $W_{ii}$ forces GMM to pay more atten-
tion to getting the $i$th moment right in the parameter estimation. For example, you might feel
that some assets suffer from measurement error, or are small and illiquid and hence should be
deemphasized, or you may want to keep GMM from looking at portfolios with strong long
and short positions. I give some additional motivations below.
You can also go one step further and impose which linear combinations $a_T$ of moment
conditions will be set to zero in estimation, rather than use the choice resulting from a min-
imization, $a_T = d'S^{-1}$ or $a_T = d'W$. The fixed-$W$ estimate still trades off the accuracy
of individual moments according to the sensitivity of each moment with respect to the pa-
rameter. For example, if $g_T = [\,g_T^1 \;\; g_T^2\,]'$ and $W = I$, but $\partial g_T/\partial b = [\,1 \;\; 10\,]$, so that the second
moment is 10 times more sensitive to the parameter value than the first moment, then GMM
with a fixed weighting matrix sets
$$ 1 \times g_T^1 + 10 \times g_T^2 = 0. $$
The second moment condition will be 10 times closer to zero than the first. If you really want
GMM to pay equal attention to the two moments, then you can fix the $a_T$ matrix directly, for
example $a_T = [\,1 \;\; 1\,]$ or $a_T = [\,1 \;\; {-1}\,]$.
Using a prespecified weighting matrix or a prespecified set of moments is not the
same thing as ignoring correlation of the errors $u_t$ in the distribution theory. The $S$ matrix
will still show up in all the standard errors and test statistics.

11.5.1 How to use prespecified weighting matrices

Once you have decided to use a prespecified weighting matrix $W$ or a prespecified set of
moments $a_T g_T(b) = 0$, the general distribution theory outlined in section 11.1 quickly gives
standard errors of the estimates and moments, and therefore a $\chi^2$ statistic that can be used
to test whether all the moments are jointly zero. Section 11.1 gives the formulas for the
case that $a_T$ is prespecified. If we use weighting matrix $W$, the first order conditions to
$\min_{\{b\}} g_T(b)' W g_T(b)$ are
$$ \frac{\partial g_T(b)'}{\partial b} W g_T(b) = d' W g_T(b) = 0, $$
so we map into the general case with $a_T = d'W$. Plugging this value into (11.146), the
variance-covariance matrix of the estimated coefficients is
$$ \mathrm{var}(\hat b) = \frac{1}{T}(d'Wd)^{-1}\, d'WSWd\, (d'Wd)^{-1}. \tag{158} $$
(You can check that this formula reduces to $\frac{1}{T}(d'S^{-1}d)^{-1}$ with $W = S^{-1}$.)
Plugging $a = d'W$ into equation (11.147), we find the variance-covariance matrix of the
moments $g_T$,
$$ \mathrm{var}(g_T) = \frac{1}{T}\left(I - d(d'Wd)^{-1}d'W\right) S \left(I - Wd(d'Wd)^{-1}d'\right). \tag{159} $$
As in the general formula, the terms to the left and right of $S$ account for the fact that some
linear combinations of moments are set to zero in each sample.
Equation (11.159) can be the basis of $\chi^2$ tests for the overidentifying restrictions. If we
interpret $(\cdot)^{-1}$ to be a generalized inverse, then
$$ g_T'\, \mathrm{var}(g_T)^{-1}\, g_T \sim \chi^2(\#\text{moments} - \#\text{parameters}). $$
As in the general case, you have to pseudo-invert the singular $\mathrm{var}(g_T)$, for example by in-
verting only the non-zero eigenvalues.
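As a concrete sketch of this recipe (all inputs below are made up purely for illustration), the following computes (158), (159), and the $\chi^2$ statistic with an eigenvalue pseudo-inverse, for three moments and one parameter:

```python
import numpy as np

# hypothetical ingredients: three moments, one parameter (values made up)
d = np.array([[1.0], [2.0], [0.5]])              # d = dg_T / db
S = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])                   # long-run covariance of u_t
W = np.eye(3)                                     # prespecified weighting matrix
T = 500
gT = np.array([0.02, -0.01, 0.015])               # hypothetical pricing errors

A = np.linalg.inv(d.T @ W @ d)                    # (d'Wd)^{-1}
var_b = A @ d.T @ W @ S @ W @ d @ A / T           # equation (158)
M = np.eye(3) - d @ A @ d.T @ W                   # I - d(d'Wd)^{-1} d'W
var_g = M @ S @ M.T / T                           # equation (159)

# pseudo-invert var(g_T) by dropping its (numerically) zero eigenvalues
w_eig, Q = np.linalg.eigh(var_g)
tol = 1e-10 * w_eig.max()
inv_eig = np.array([1.0 / w if w > tol else 0.0 for w in w_eig])
stat = gT @ Q @ np.diag(inv_eig) @ Q.T @ gT       # ~ chi^2(3 - 1)
```

Since $M$ annihilates $d$, `var_g` has rank 2 (#moments minus #parameters), which is why the plain inverse fails and the eigenvalue cutoff is needed.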
The major danger in using prespecified weighting matrices or moments $a_T$ is that the
choice of moments, units, and (of course) the prespecified $a_T$ or $W$ must be made carefully.
For example, if you multiply the second moment by 10, the $S$ matrix
will undo this transformation and weight the moments in their original proportions. The identity
weighting matrix will not undo such transformations, so the units should be picked right
initially.

11.5.2 Motivations for prespecified weighting matrices

Robustness, as with OLS vs. GLS.
When errors are autocorrelated or heteroskedastic, every econometrics textbook shows
you how to "improve" on OLS by making appropriate GLS corrections. If you correctly
model the error covariance matrix and if the regression is perfectly specified, the GLS pro-
cedure can improve efficiency, i.e., give estimates with lower asymptotic standard errors.
However, GLS is less robust. If you model the error covariance matrix incorrectly, the GLS
estimates can be much worse than OLS. Also, the GLS transformations can zero in on slightly
misspecified areas of the model, producing garbage. GLS is "best," but OLS is "pretty darn
good." One often has enough data that wringing every last ounce of statistical precision (low
standard errors) from the data is less important than producing estimates that do not depend
on questionable statistical assumptions, and that transparently focus on the interesting fea-
tures of the data. In these cases, it is often a good idea to use OLS estimates. The OLS
standard error formulas are wrong, though, so you must correct the standard errors of the
OLS estimates for these features of the error covariance matrices, using the formulas we
developed in section 11.4.
GMM works the same way. First-stage or otherwise fixed weighting matrix estimates may
give up something in asymptotic efficiency, but they are still consistent, and they can be more
robust to statistical and economic problems. You still want to use the $S$ matrix in computing
standard errors, though, just as you want to correct OLS standard errors, and the GMM formulas
show you how to do this.
Even if in the end you want to produce "efficient" estimates and tests, it is a good idea to
calculate standard errors and model fit tests for the first-stage estimates. Ideally, the parameter
estimates should not change by much, and the second-stage standard errors should be tighter.
If the "efficient" parameter estimates do change a great deal, it is a good idea to diagnose
why. It must be that the second stage strongly weights moments or linear combinations of
moments that were not important in the first stage, and that those combinations disagree
strongly with the first-stage ones about which parameters fit well. Then, you can decide whether
the difference in results is truly due to an efficiency gain, or whether it signals model misspecification.
Chapter 16 argues more at length for judicious use of "inefficient" methods such as OLS
to guard against inevitable model misspecifications.
Near-singular S.
The spectral density matrix is often nearly singular, since asset returns are highly corre-
lated with each other, and since we often include many assets relative to the number of data
points. As a result, second-stage GMM (and, as we will see below, maximum likelihood
or any other efficient technique) tries to minimize differences and differences of differences
of asset returns in order to extract statistically orthogonal components with the lowest variance.
One may feel that this feature leads GMM to place a lot of weight on poorly estimated, eco-
nomically uninteresting, or otherwise non-robust aspects of the data. In particular, portfolios
of the form $100R^1 - 99R^2$ assume that investors can in fact purchase such heavily leveraged
portfolios. Short-sale costs often rule out such portfolios or significantly alter their returns,
so one may not want to emphasize pricing them correctly in the estimation and evaluation.
For example, suppose that $S$ is given by
$$ S = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, $$
so
$$ S^{-1} = \frac{1}{1-\rho^2} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix}. $$

We can factor $S^{-1}$ into a "square root" by the Choleski decomposition. This produces a
triangular matrix $C$ such that $C'C = S^{-1}$. You can check that the matrix
$$ C = \begin{bmatrix} \frac{1}{\sqrt{1-\rho^2}} & \frac{-\rho}{\sqrt{1-\rho^2}} \\ 0 & 1 \end{bmatrix} \tag{160} $$
works. Then, the GMM criterion
$$ \min\; g_T' S^{-1} g_T $$
is equivalent to
$$ \min\; (g_T' C')(C g_T). $$
$C g_T$ gives the linear combination of moments that efficient GMM is trying to minimize.
Looking at (11.160), as $\rho \to 1$, the (2,2) element stays at 1, but the (1,1) and (1,2) elements
get very large and of opposite signs. For example, if $\rho = 0.95$, then
$$ C = \begin{bmatrix} 3.20 & -3.04 \\ 0 & 1 \end{bmatrix}. $$

In this example, GMM pays a little attention to the second moment, but places three times
as much weight on the difference between the first and second moments. Larger matrices
produce even more extreme weights. At a minimum, it is a good idea to look at $S^{-1}$ and its
Choleski decomposition to see what moments GMM is prizing.
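Checking the numbers takes a few lines (an illustration; note that numpy's Cholesky routine returns the lower-triangular factor, so its transpose plays the role of $C$ in (160)):

```python
import numpy as np

rho = 0.95
S = np.array([[1.0, rho], [rho, 1.0]])
S_inv = np.linalg.inv(S)

# np.linalg.cholesky returns lower-triangular L with L L' = S^{-1};
# its transpose is the upper-triangular C of equation (160), with C'C = S^{-1}
C = np.linalg.cholesky(S_inv).T
```

The resulting `C` is approximately [[3.20, -3.04], [0, 1]], reproducing the weights discussed above.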
The same point has a classic interpretation, and is a well-known danger with classic
regression-based tests. Efficient GMM wants to focus on well-measured moments. In as-
set pricing applications, the errors are typically close to uncorrelated over time, so GMM is
looking for portfolios with small values of $\mathrm{var}(m_{t+1} R^e_{t+1})$. Roughly speaking, those will
be assets with small return variance. Thus, GMM will pay most attention to correctly pricing
the sample minimum-variance portfolio, and GMM's evaluation of the model by the $J_T$ test will
focus on its ability to price this portfolio.
Now, consider what happens in a sample, as illustrated in Figure 24. The sample mean-
variance frontier is typically a good deal wider than the true, or ex-ante, mean-variance fron-
tier. In particular, the sample minimum-variance portfolio may have little to do with the
true minimum-variance portfolio. Like any portfolio on the sample frontier, its composition
largely reflects luck; that's why we have asset pricing models in the first place, rather than
just pricing assets with portfolios on the sample frontier. The sample minimum-variance return
is also likely to be composed of strong long-short positions.
In sum, you may want to force GMM not to pay quite so much attention to correctly
pricing the sample minimum-variance portfolio, and you may want to give less importance to
a statistical measure of model evaluation that almost entirely prizes GMM's ability to price
that portfolio.
Economically interesting moments.


Figure 24. True or ex-ante and sample or ex-post mean-variance frontier, plotted as E(R)
against σ(R). The sample often shows a spurious minimum-variance portfolio.

The optimal weighting matrix makes GMM pay close attention to linear combinations of
moments with small sampling error, in both estimation and evaluation. One may want to force
the estimation and evaluation to pay attention to economically interesting moments instead.
The initial portfolios are usually formed on an economically interesting characteristic such as
size, beta, book/market, or industry. One typically wants in the end to see how well the model
prices these initial portfolios, not how well the model prices potentially strange portfolios
of those portfolios. If a model fails, one may want to characterize that failure as "the model
doesn't price small stocks," not "the model doesn't price a portfolio of 900 × small firm returns
− 600 × large firm returns − 299 × medium firm returns."
Level playing field.
The $S$ matrix changes as the model and its parameters change. (See the definition,
(10.138) or (11.145).) As the $S$ matrix changes, which assets the GMM estimate tries hard
to price well changes as well. For example, the $S$ matrix from one model may value pricing
the T-bill well, while that of another model may value pricing a stock excess return
well. Comparing the results of such estimations is like comparing apples and oranges. By
fixing the weighting matrix, you can force GMM to pay attention to the various assets in the
same proportions while you vary the model.
The fact that $S$ matrices change with the model leads to another subtle trap. One model
may "improve" a $J_T = g_T' S^{-1} g_T$ statistic because it blows up the estimates of $S$, rather
than making any progress on lowering the pricing errors $g_T$. No one would formally use a
comparison of $J_T$ tests across models to compare them, of course. But it has proved nearly
irresistible for authors to claim success for a new model over previous ones by noting im-
proved $J_T$ statistics, despite different weighting matrices, different moments, and sometimes
much larger pricing errors. For example, if you take a model $m_t$ and create a new model by
simply adding noise, unrelated to asset returns (in sample), $m_t' = m_t + \varepsilon_t$, then the moment
condition $g_T = E_T(m_t' R^e_t) = E_T\left[(m_t + \varepsilon_t) R^e_t\right]$ is unchanged. However, the spectral den-
sity matrix $S = E\left[(m_t + \varepsilon_t)^2 R^e_t R^{e\prime}_t\right]$ can rise dramatically. This can reduce the $J_T$, leading
to a false sense of "improvement."
Conversely, if the sample contains a nearly riskfree portfolio of the test assets, or a port-
folio with apparently small variance of $m_{t+1} R^e_{t+1}$, then the $J_T$ test essentially evaluates the
model by how well it can price this one portfolio. This can lead to a false rejection; even a
very small $g_T$ will produce a large $g_T' S^{-1} g_T$ if there is an eigenvalue of $S$ that is (spuriously)
too small.
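The noise trap is easy to reproduce by simulation (an illustrative sketch with a made-up discount factor and return process; the noise is scaled by 5 to make the effect obvious). Noise constructed to be orthogonal to returns in sample leaves $g_T$ exactly unchanged, yet inflates the estimate of $S$ and so shrinks the $J_T$ statistic:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 1000, 3
Re = 0.05 * rng.standard_normal((T, N)) + 0.005    # made-up excess returns
m = 1.0 - 0.5 * Re.mean(axis=1)                    # made-up discount factor

def J_stat(m, Re):
    u = Re * m[:, None]                            # moment errors u_t = m_t Re_t
    g = u.mean(axis=0)                             # pricing errors g_T
    S = np.cov(u.T, bias=True)                     # j = 0 estimate of S
    return T * g @ np.linalg.solve(S, g), g

# noise orthogonal to returns in sample: g_T is exactly unchanged
eps = rng.standard_normal(T)
eps -= Re @ np.linalg.solve(Re.T @ Re, Re.T @ eps)
J1, g1 = J_stat(m, Re)
J2, g2 = J_stat(m + 5.0 * eps, Re)
```

The pricing errors `g1` and `g2` coincide, while `J2` falls well below `J1` simply because the noise blew up $S$.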
If you use a common weighting matrix $W$ for all models, and evaluate the models by
$g_T' W g_T$, then you can avoid this trap. Beware that the individual $\chi^2$ statistics are based on
$g_T'\, \mathrm{var}(g_T)^{-1}\, g_T$, and $\mathrm{var}(g_T)$ contains $S$, even with a prespecified weighting matrix $W$.
You should look at the pricing errors, or at some statistic such as the sum of absolute or
squared pricing errors, to see if they are bigger or smaller, leaving the distribution aside. The
question "are the pricing errors small?" is as interesting as the question "if we drew artificial
data over and over again from a null statistical model, how often would we estimate a ratio
of pricing errors to their estimated variance $g_T' S^{-1} g_T$ this big or larger?"

11.5.3 Some prespecified weighting matrices

Two examples of economically interesting weighting matrices are the second moment matrix
of returns, advocated by Hansen and Jagannathan (1997), and the simple identity matrix,
which is used implicitly in much empirical asset pricing.
Second moment matrix.
Hansen and Jagannathan (1997) advocate the use of the second moment matrix of payoffs,
$W = E(xx')^{-1}$, in place of $S$. They motivate this weighting matrix as an interesting distance
measure between a model for $m$, say $y$, and the space of true $m$'s. Precisely, the minimum
distance (second moment) between a candidate discount factor $y$ and the space of true dis-
count factors is the same as the minimum value of the GMM criterion with $W = E(xx')^{-1}$
as weighting matrix.

Figure 25. Distance between $y$ and nearest $m$ = distance between $\mathrm{proj}(y|X)$ and $x^*$.

To see why this is true, refer to Figure 25. The distance between $y$ and the nearest valid
$m$ is the same as the distance between $\mathrm{proj}(y \mid X)$ and $x^*$. As usual, consider the case that
$X$ is generated from a vector of payoffs $x$ with price $p$. From the OLS formula,
$$ \mathrm{proj}(y \mid X) = E(yx')E(xx')^{-1}x. $$
$x^*$ is the portfolio of $x$ that prices $x$ by construction,
$$ x^* = p' E(xx')^{-1} x. $$

Then, the distance between $y$ and the nearest valid $m$ is
$$ \| y - \text{nearest } m \| = \left\| \mathrm{proj}(y|X) - x^* \right\| $$
$$ = \left\| E(yx')E(xx')^{-1}x - p'E(xx')^{-1}x \right\| $$
$$ = \left\| \left(E(yx') - p'\right) E(xx')^{-1}x \right\| $$