two measures of covariation, not just one.

relationship is (e.g., whether knowing A gives you much confidence to predict G or H). Your first task is to draw lines into each of the eight graphs in Figure 13.1 that best fit the points (do it!), and then to stare at the points and your lines. How steep are your lines (the relationships), and how reliably do the points cluster around your line? Before reading on, your second task is to write into the eight graphs in Figure 13.1 what your intuition tells you the relation and the reliability are.

Figure 13.1 seems to show a range of different patterns.
Here is what I see.

• C and D are strongly related to A. Actually, C is just 2·(A + 3) and D is just −C/4, so these two relationships are also perfectly reliable.

• E and F have no relationship to A. Though random, there is no pattern connecting A and E. F is 5 regardless of what A is.

• G and H both tend to be higher when A is higher, but in different ways. There is more reliability in the relation between A and G, but even a large increase in A predicts a G that is only a little higher. In contrast, there is less reliability in how A and H covary, but even a small increase in A predicts a much higher H.

• I and J repeat the G/H pattern, but for negative relations.

file=statistics-g.tex
Section 13·4. Bivariate Statistics: Covariation Measures.

Figure 13.1. C Through J as Functions of A

[Eight scatterplots, one per panel: C, D, E, F, G, H, I, and J on the Y axis (ranging from −10 to 15), each plotted against A on the X axis (ranging from −10 to 30).]

The five observations are marked by circles. Can you draw a well-fitting line? Which series have relations with A? What sign? Which series have reliable relations with A?

Chapter 13. Statistics.

Table 13.4. Illustrative Rates of Return Time Series on Nine Assets

             Portfolio (or Asset or Security)
              A     C     D      E    F    G     H    I    J
Year 1       −5    −4   +1.0   −10   +5    2   −10   +5  +12
Year 2       +6   +18   −4.5    −9   +5    4    −9   +3  +14
Year 3       +3   +12   −3.0    +2   +5    3   +10   +4  −10
Year 4       −1    +4   −1.0   +12   +5    2    −8   +5   +9
Year 5       +7   +20   −5.0     0   +5    4   +12   +3  −10
Mean         +2   +10   −2.5    −1   +5    3    −1   +4   +3
Var          25   100   6.25    81    0    1   121    1  144
Sdv           5    10    2.5     9    0    1    11    1   12

Which rate of return series (among portfolios C through J) had low and which had high covariation with the rate of return series of Portfolio A?

Side Note: I cheated in not using my eyeballs to draw lines, but in using the technique of "ordinary least squares" line fitting in Figure 13.3 instead. The lines make it even clearer that when A is high, C, G, and H tend to be high, too; but when A is high, D, I, and J tend to be low. And neither E nor F seems to covary with A. (You will get to compute the slope of this line, the "beta," later.)

How covariance really works: quadrants above/below means.
Of course, visual relationships are in the eye of the beholder. You need something more objective that both of us can agree on. Here is how to compute a precise measure. The first step is to determine the means of each X and each Y variable and mark these into the figure, which is done for you in Figure 13.2. The two means divide your data into four quadrants. Now, intuitively, points in the northeast or southwest quadrants (in white) suggest a positive covariation; points in the northwest or southeast quadrants (in red) suggest a negative covariation. In other words, the idea of all of the covariation measures is that two series, call them X and Y, are positively correlated

• when X tends to be above its mean, Y also tends to be above its mean (upper right quadrant); and

• when X tends to be below its mean, Y also tends to be below its mean (lower left quadrant).

And X and Y are negatively correlated

• when X tends to be above its mean, Y tends to be below its mean (lower right quadrant); and

• when X tends to be below its mean, Y tends to be above its mean (upper left quadrant).


Figure 13.2. H as a Function of A

[Scatterplot of H against A, with the A mean and the H mean drawn in to divide the data into four quadrants. A above its mean means the X deviation is positive; A below its mean means the X deviation is negative. H above its mean means the Y deviation is positive; H below its mean means the Y deviation is negative. Northeast quadrant: both deviations positive, so the product is positive. Northwest quadrant: Y deviation positive, X deviation negative, so the product is negative. Southwest quadrant: both deviations negative, so the product is positive. Southeast quadrant: the product is negative; the one point there pulls towards a negative slope.]

Points in the red quadrants pull the overall covariance statistic in a negative direction. Points in the white quadrants pull the overall covariance statistic in a positive direction.

Covariance

The main idea is to think about where points lie within the quadrants in the figure. Multiply the deviations from the mean to get the right sign for each data point relative to each quadrant.
How can you make a positive number for every point that is either above both the X and Y means or below both the X and Y means, and a negative number for every point that is above one mean and below the other? Easy! First, you measure each data point in terms of its distance from its mean, so you subtract the mean from each data point, as in Table 13.5. Points in the northeast quadrant are above both means, so both demeaned values are positive. Points in the southwest quadrant are below both means, so both demeaned values are negative. Points in the other two quadrants have one positive and one negative demeaned number. Now, you know that the product of either two positive or two negative numbers is positive, and the product of one positive and one negative number is negative. So if you multiply your deviations from the mean, the product has a positive sign if it is in the upper-right or lower-left quadrant (the deviations from the mean are either both positive or both negative), and a negative sign if it is in the upper-left or lower-right quadrant (only one deviation from the mean is positive, the other is negative). A point that has a positive product pulls towards positive covariation, whereas a negative product pulls towards negative covariation.

You want to know the average pull. This is the covariance: the average of these products of two variables' deviations from their own means. Try it for A and C (again, following the standard method of dividing by N − 1 rather than N):

Cov(A, C) = [(−7)·(−14) + (+4)·(+8) + (+1)·(+2) + (−3)·(−6) + (+5)·(+10)] / 4 = 50 ;

Cov(A, C) = [(a1 − ā)·(c1 − c̄) + ... + (a5 − ā)·(c5 − c̄)] / (N − 1).    (13.8)
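For readers who prefer to check such arithmetic in code, here is a minimal, self-contained sketch of the same computation. The series values are those of Table 13.4; the function name `sample_cov` is my own.

```python
# Sample covariance with the N-1 divisor, as in Equation 13.8.
def sample_cov(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)

A = [-5, 6, 3, -1, 7]     # Portfolio A from Table 13.4 (mean 2)
C = [-4, 18, 12, 4, 20]   # Portfolio C = 2*(A+3) (mean 10)

print(sample_cov(A, C))   # 50.0
```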


Figure 13.3. C Through J as Functions of A, Lines Added

[The eight scatterplots of Figure 13.1, with best-fitting lines added. Panel labels: C and D: Perfect Covariation. E and F: Zero Covariation. G: Positive Covariation; Reliable Association, Shallow Slope. H: Positive Covariation; Unreliable Association, Steep Slope (one point "pulls the other way"). I and J: Negative Covariation (one point in each "pulls the other way").]

The five observations are marked by circles. The areas north, south, east, and west of the X and Y means are now marked. A cross with arm lengths equal to one standard deviation is also placed on each figure. Which points push the relationship to be positive, which points push the relationship to be negative?


Table 13.5. Illustrative Rates of Return Time Series on Nine Assets, De-Meaned

             Rates of Return on Portfolios (or Assets or Securities)
              A     C     D      E    F    G     H    I    J
Year 1       −7   −14   +3.5    −9    0   −1    −9   +1   +9
Year 2       +4    +8   −2.0    −8    0   +1    −8   −1  +11
Year 3       +1    +2   −0.5    +3    0    0   +11    0  −13
Year 4       −3    −6   +1.5   +13    0   −1    −7   +1   +6
Year 5       +5   +10   −2.5    +1    0   +1   +13   −1  −13
E(r̃)          0     0    0.0     0    0    0     0    0    0
Var(r̃)       25   100   6.25    81    0    1   121    1  144
Sdv(r̃)        5    10    2.5     9    0    1    11    1   12

It will be easier to work with the series from Table 13.4 if you first subtract the mean from each series.

Note how in this A vs. C relationship, each term in the sum is positive and therefore pulls the average (the covariance) towards being a positive number. You can see this in the figure, because each and every point lies in the two "positivist" quadrants. Repeat this for the remaining series:

Cov(A, D) = [(−7)·(+3.5) + (+4)·(−2) + (+1)·(−0.5) + (−3)·(+1.5) + (+5)·(−2.5)] / 4 = −12.5 ;

Cov(A, E) = [(−7)·(−9) + (+4)·(−8) + (+1)·(+3) + (−3)·(+13) + (+5)·(+1)] / 4 = 0 ;

Cov(A, F) = [(−7)·(0) + (+4)·(0) + (+1)·(0) + (−3)·(0) + (+5)·(0)] / 4 = 0 ;

Cov(A, G) = [(−7)·(−1) + (+4)·(+1) + (+1)·(0) + (−3)·(−1) + (+5)·(+1)] / 4 = 4.75 ;    (13.9)

Cov(A, H) = [(−7)·(−9) + (+4)·(−8) + (+1)·(+11) + (−3)·(−7) + (+5)·(+13)] / 4 = 32 ;

Cov(A, I) = [(−7)·(+1) + (+4)·(−1) + (+1)·(0) + (−3)·(+1) + (+5)·(−1)] / 4 = −4.75 ;

Cov(A, J) = [(−7)·(+9) + (+4)·(+11) + (+1)·(−13) + (−3)·(+6) + (+5)·(−13)] / 4 = −28.75 ;

Cov(X, Y) = [(x1 − x̄)·(y1 − ȳ) + ... + (x5 − x̄)·(y5 − ȳ)] / (N − 1).
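A short loop reproduces all of these covariances at once. This is a sketch, with the nine columns transcribed from Table 13.4 and a hypothetical `sample_cov` helper:

```python
# Sample covariance with the N-1 divisor.
def sample_cov(xs, ys):
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (n - 1)

# Columns of Table 13.4 (Years 1 through 5).
series = {
    "A": [-5, 6, 3, -1, 7],
    "C": [-4, 18, 12, 4, 20],
    "D": [1, -4.5, -3, -1, -5],
    "E": [-10, -9, 2, 12, 0],
    "F": [5, 5, 5, 5, 5],
    "G": [2, 4, 3, 2, 4],
    "H": [-10, -9, 10, -8, 12],
    "I": [5, 3, 4, 5, 3],
    "J": [12, 14, -10, 9, -10],
}

for name in "CDEFGHIJ":
    print(f"Cov(A, {name}) = {sample_cov(series['A'], series[name])}")
```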

Having computed the covariances, look at Figure 13.3. A and D, A and I, and A and J covary negatively on average; A and C, A and G, and A and H covary positively; and A and E, and A and F have zero covariation. Now take a look at the A vs. H figure (also in Figure 13.2) again: there is one lonely point in the lower-right quadrant, marked with an arrow. It tries to pull the line into a negative direction. In the A vs. H covariance computation, this point is the (+4)·(−8) term, which is the only negative component in the overall sum. If this one point had not been in the data, the association between A and H would have been more strongly positive than 32.

Covariance gives the right sign, but not much more. It is often abbreviated as a sigma with two subscripts.
The covariance tells you the sign (whether a relationship is positive or negative), but its magnitude is difficult to interpret, just as you could not really interpret the magnitude of the variance. Indeed, the covariance not only shares the weird squared-units problem with the variance, but the covariance of a variable with itself is the variance! This can be seen from the definitions: both multiply each historical outcome deviation by itself, and then divide by the same number,

Cov(X, X) = [(x1 − x̄)·(x1 − x̄) + ... + (xN − x̄)·(xN − x̄)] / (N − 1)
          = [(x1 − x̄)² + ... + (xN − x̄)²] / (N − 1) = Var(X).    (13.10)
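Equation 13.10 is easy to confirm numerically. A minimal sketch, using series A from Table 13.4 and a hypothetical `sample_cov` helper:

```python
# Sample covariance with the N-1 divisor.
def sample_cov(xs, ys):
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (n - 1)

A = [-5, 6, 3, -1, 7]

# The covariance of a series with itself is its variance (Equation 13.10).
print(sample_cov(A, A))   # 25.0, matching Var(A) in Table 13.4
```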


And, just like the variance is needed to compute the standard deviation, the covariance is needed to compute the next two covariation measures (correlation and beta). The covariance statistic is so important and used so often that the Greek letter sigma (σ) with two subscripts has become the standard abbreviation:

Covariance between X and Y:   Cov(X, Y) = σX,Y ;
Variance of X:                Var(X) = σX,X = σX².    (13.11)

These are sigmas with two subscripts. If you use only one subscript, you mean the standard deviation:

Standard Deviation of X:      Sdv(X) = σX.    (13.12)

This is easy to remember if you think of two subscripts as the equivalent of multiplication (squaring), and of one subscript as the equivalent of square-rooting.

Correlation

Correlation is closely related to, but easier to interpret than, covariance.
To better interpret the covariance, you need to somehow normalize it. A first normalization of the covariance gives the correlation. It divides the covariance by the standard deviations of both variables. Applying this formula, you can compute

Correlation(A, C) = Cov(A, C) / [Sdv(A)·Sdv(C)] = +50 / (5 · 10) = +1.00 ;

Correlation(A, D) = Cov(A, D) / [Sdv(A)·Sdv(D)] = −12.5 / (5 · 2.5) = −1.00 ;

Correlation(A, E) = Cov(A, E) / [Sdv(A)·Sdv(E)] = 0 / (5 · 9) = ±0.00 ;

Correlation(A, F) = Cov(A, F) / [Sdv(A)·Sdv(F)] = 0 / (5 · 0) = not defined ;

Correlation(A, G) = Cov(A, G) / [Sdv(A)·Sdv(G)] = 4.75 / (5 · 1) = +0.95 ;    (13.13)

Correlation(A, H) = Cov(A, H) / [Sdv(A)·Sdv(H)] = 32 / (5 · 11) = +0.58 ;

Correlation(A, I) = Cov(A, I) / [Sdv(A)·Sdv(I)] = −4.75 / (5 · 1) = −0.95 ;

Correlation(A, J) = Cov(A, J) / [Sdv(A)·Sdv(J)] = −28.75 / (5 · 12) = −0.48 ;

Correlation(X, Y) = Cov(X, Y) / [Sdv(X)·Sdv(Y)].
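The same normalization in code, as a self-contained sketch (series values from Table 13.4; the helper names are my own, and the results match Equation 13.13):

```python
# Sample covariance with the N-1 divisor.
def sample_cov(xs, ys):
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (n - 1)

def sdv(xs):
    return sample_cov(xs, xs) ** 0.5

def correlation(xs, ys):
    # Covariance divided by both standard deviations (Equation 13.13).
    return sample_cov(xs, ys) / (sdv(xs) * sdv(ys))

A = [-5, 6, 3, -1, 7]
G = [2, 4, 3, 2, 4]
H = [-10, -9, 10, -8, 12]

print(round(correlation(A, G), 2))   # 0.95
print(round(correlation(A, H), 2))   # 0.58
```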

The correlation measures the reliability of the relationship between two variables. A higher absolute correlation means more reliability, regardless of the strength of the relationship (slope). The nice thing about the correlation is that it is always between −100% and +100%. Two variables that have a correlation of +100% always move perfectly in the same direction, two variables that have a correlation of −100% always move perfectly in the opposite direction, and two variables that are independent have a correlation of 0%. This makes the correlation very easy to interpret. The correlation is unit-less, regardless of the units of the original variables themselves, and is often abbreviated with the Greek letter rho (ρ). The perfect correlations between A and C or D tell you that all points lie on straight lines. (Verify this visually in Figure 13.3!) The correlations between A and G (95%) and between A and I (−95%) are almost as strong: the points almost lie on a line. The correlations between A and H and between A and J are weaker: the points do not seem to lie on a straight line, and knowing A does not permit you to perfectly predict H or J.


Side Note: If two variables always act identically, they have a correlation of 1. Therefore, you can determine the maximum covariance between two variables:

1 = Cov(X̃, Ỹ) / [Sdv(X̃)·Sdv(Ỹ)]  ⇔  Cov(X̃, Ỹ) = Sdv(X̃)·Sdv(Ỹ).    (13.14)

It is mathematically impossible for the absolute value of the covariance to exceed the product of the two standard deviations.
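This bound can be checked directly. A sketch using the perfectly correlated pair A and C, for which the covariance attains its maximum (helper names are my own):

```python
# Sample covariance with the N-1 divisor.
def sample_cov(xs, ys):
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (n - 1)

def sdv(xs):
    return sample_cov(xs, xs) ** 0.5

A = [-5, 6, 3, -1, 7]
C = [-4, 18, 12, 4, 20]   # C = 2*(A+3), so the correlation is exactly +1

# Covariance equals the product of the standard deviations only at correlation 1:
print(sample_cov(A, C), sdv(A) * sdv(C))   # 50.0 50.0
```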

Beta

The correlation cannot tell you that A has more pronounced in¬‚uence on C than on D: although Beta is yet another

covariation measure,

both correlations are perfect, if A is higher by 1, your prediction of C is higher by 2; but if A

and has an interesting

is higher by 1, your prediction of D is lower by only ’0.5. You need a measure for the slope of graphical interpretation.

the best-¬tting line that you would draw through the points. Your second normalization of the

covariance does this: it gives you this slope, the beta. It divides the covariance by the variance

of the X variable (here, A), i.e., instead of one dividing by the standard deviation of Y (as in the

correlation), it divides a second time by a standard deviation of X:

βC,A = Cov(A, C) / [Sdv(A)·Sdv(A)] = +50 / (5 · 5) = +2.00 ;

βD,A = Cov(A, D) / [Sdv(A)·Sdv(A)] = −12.5 / (5 · 5) = −0.50 ;

βE,A = Cov(A, E) / [Sdv(A)·Sdv(A)] = 0 / (5 · 5) = ±0.00 ;

βF,A = Cov(A, F) / [Sdv(A)·Sdv(A)] = 0 / (5 · 5) = ±0.00 ;

βG,A = Cov(A, G) / [Sdv(A)·Sdv(A)] = 4.75 / (5 · 5) = +0.19 ;    (13.15)

βH,A = Cov(A, H) / [Sdv(A)·Sdv(A)] = 32 / (5 · 5) = +1.28 ;

βI,A = Cov(A, I) / [Sdv(A)·Sdv(A)] = −4.75 / (5 · 5) = −0.19 ;

βJ,A = Cov(A, J) / [Sdv(A)·Sdv(A)] = −28.75 / (5 · 5) = −1.15 ;

βY,X = Cov(X, Y) / [Sdv(X)·Sdv(X)] = Cov(X, Y) / Var(X).
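In code, beta differs from correlation only in the divisor. A self-contained sketch matching Equation 13.15 (series values from Table 13.4; helper names are my own):

```python
# Sample covariance with the N-1 divisor.
def sample_cov(xs, ys):
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (n - 1)

def beta(ys, xs):
    # beta_{Y,X}: covariance divided by the variance of the X variable.
    return sample_cov(xs, ys) / sample_cov(xs, xs)

A = [-5, 6, 3, -1, 7]
H = [-10, -9, 10, -8, 12]
D = [1, -4.5, -3, -1, -5]

print(beta(H, A))   # 1.28
print(beta(D, A))   # -0.5
```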

The first subscript on beta denotes the variable on the Y axis, while the second subscript on beta denotes the variable on the X axis. It is the latter that provides the variance (the divisor). Beta got its name from the fact that the most common way to write down the formula for a line is y = α + β·x, and beta is exactly the slope of the best-fitting line. Unlike correlations, betas are not limited to any range. Beta values of +1 and −1 denote the diagonal lines; beta values of 0 and ∞ denote the horizontal and vertical lines. Inspection of Figure 13.3 shows that the slope of A vs. C is 2 to 1, while the slope of A vs. D is a shallower −1 to 2. This is exactly what beta tells us: βC,A is 2.0, while βD,A is −0.5. Unlike the correlation, beta cannot tell you whether your line fits perfectly or imperfectly. But, unlike the correlation, beta can tell you how much you should change your prediction of Y if the X values change. And unlike correlation and covariance, the order of the two variables matters in computing beta. For example, βG,A = 0.19 is not βA,G:

βA,G = Cov(A, G) / [Sdv(G)·Sdv(G)] = 4.75 / (1 · 1) = 4.75.    (13.16)
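The asymmetry is easy to see in code. A self-contained sketch on the same hypothetical data, showing that βG,A and βA,G differ only through the divisor:

```python
# Sample covariance with the N-1 divisor.
def sample_cov(xs, ys):
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (n - 1)

def beta(ys, xs):
    return sample_cov(xs, ys) / sample_cov(xs, xs)   # divide by Var of the X variable

A = [-5, 6, 3, -1, 7]   # Var(A) = 25
G = [2, 4, 3, 2, 4]     # Var(G) = 1

print(beta(G, A))   # 0.19  (= 4.75 / 25)
print(beta(A, G))   # 4.75  (= 4.75 / 1)
```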


Digging Deeper: In a statistical package, beta can be obtained either by computing covariances and variances and then dividing the two, or by running a linear regression in which the dependent variable is the Y variable and the independent variable is the X variable:

Y = α + β·X + ε.    (13.17)

Both methods yield the same answer.

When it comes to stock returns, you really want to know the future slope, although you usually only have the historical slope.
And, as with all other statistical measures, please keep in mind that you are usually computing a historical beta (line slope), although you are usually really interested in the future beta (line slope)!

Summary of Covariation Measures

Table 13.6 summarizes the three covariation measures.

Table 13.6. Comparison of Covariation Measures

                                                     Order
             Units      Magnitude                    of Variables   Computed as         Measures
Covariance   squared    practically meaningless      irrelevant     σX,Y                No intuition
Correlation  unit-less  between −1 and +1            irrelevant     σX,Y / (σX · σY)    Reliability
beta (Y,X)   unit-less  meaningful (slope)           important      σX,Y / σX,X         Slope
beta (X,Y)   unit-less  meaningful (slope)           important      σX,Y / σY,Y         Slope

All covariation measures share the same sign: if one is positive (negative), so are all the others. Recall that σX,X = σX², which must be positive.

Figure 13.4 summarizes everything that you have learned about the covariation of your series. It plots the data points, the quadrants, the best-fitting lines, and a text description of the three measures of covariation.

13·4.C. Computing Covariation Statistics For The Annual Returns Data

Applying the covariance formula to the historical data.
Now back to work! It is time to compute the covariance, correlation, and beta for your three investment choices: S&P500, IBM, and Sony. Return to the deviations from the means in Table 13.2. As you know, to compute the covariances, you add the products of the demeaned observations and divide by (T − 1), which is tedious, but not difficult, work:

Cov(r̃S&P500, r̃IBM) = [(+0.162)·(−0.366) + ... + (−0.335)·(−0.511)] / 11 = 0.0330 ;

Cov(r̃S&P500, r̃Sony) = [(+0.162)·(−0.345) + ... + (−0.335)·(−0.323)] / 11 = 0.0477 ;

Cov(r̃IBM, r̃Sony) = [(−0.366)·(−0.345) + ... + (−0.511)·(−0.323)] / 11 = 0.0218 ;

Cov(r̃i, r̃j) = [(r̃i,s=1 − E(r̃i))·(r̃j,s=1 − E(r̃j)) + ... + (r̃i,s=T − E(r̃i))·(r̃j,s=T − E(r̃j))] / (T − 1).    (13.18)

All three covariance measures are positive. You know from the discussion on Page 317 that, aside from their signs, the covariances are almost impossible to interpret. Therefore, now


Figure 13.4. C Through J as Functions of A, with Lines and Text

[The eight scatterplots again, each annotated with its three covariation statistics:
C: covar(A,C) = 50, correl(A,C) = 1, beta(A,C) = 2 (Perfect Correlation).
D: covar(A,D) = −12.5, correl(A,D) = −1, beta(A,D) = −0.5 (Perfect Correlation).
E: covar(A,E) = 0, correl(A,E) = 0, beta(A,E) = 0 (Zero Covariation).
F: covar(A,F) = 0, correl(A,F) = NA, beta(A,F) = 0 (Zero Covariation).
G: covar(A,G) = 4.75, correl(A,G) = 0.95, beta(A,G) = 0.19 (Positive Covariation; Reliable Association, Shallow Slope).
H: covar(A,H) = 32, correl(A,H) = 0.58, beta(A,H) = 1.28 (Positive Covariation; Unreliable Association, Steep Slope; one point "pulls the other way").
I: covar(A,I) = −4.75, correl(A,I) = −0.95, beta(A,I) = −0.19 (Negative Covariation).
J: covar(A,J) = −28.75, correl(A,J) = −0.48, beta(A,J) = −1.15 (Negative Covariation; one point "pulls the other way").]

The five observations are marked by circles. The areas north, south, east, and west of the X and Y means are now marked. A cross with arm lengths equal to one standard deviation is also placed on each figure.


compute the correlations, your measure of how well the best-fitting line fits the data. The correlations are the covariances divided by the two standard deviations:

Correlation(r̃S&P500, r̃IBM) = 3.30% / (19.0% · 38.8%) = 44.7% ;

Correlation(r̃S&P500, r̃Sony) = 4.77% / (19.0% · 90.3%) = 27.8% ;

Correlation(r̃IBM, r̃Sony) = 2.18% / (38.8% · 90.3%) = 6.2% ;    (13.19)

Correlation(r̃i, r̃j) = Cov(r̃i, r̃j) / [Sdv(r̃i)·Sdv(r̃j)].

So, the S&P500 has correlated much more with IBM than the S&P500 has correlated with Sony (or IBM with Sony). This makes intuitive sense: both the S&P500 and IBM are U.S. investments, while Sony is a stock that trades in an entirely different market.

Applying the beta formula to the historical data.
Finally, you might want to compute the beta of r̃Sony with respect to r̃S&P500 (i.e., divide the covariance of r̃Sony with r̃S&P500 by the variance of r̃S&P500), and the beta of r̃IBM with respect to r̃S&P500. Although you should really write βr̃IBM,r̃S&P500, no harm is done if you omit the r̃ for convenience and just elevate the subscripts when there is no risk of confusion. Thus, you can just write βIBM,S&P500 instead.

βIBM,S&P500 = 3.30% / (19.0%)² = 0.91 ;

βSony,S&P500 = 4.77% / (19.0%)² = 1.31 ;    (13.20)

βi,j = Cov(r̃i, r̃j) / [Sdv(r̃j)·Sdv(r̃j)] = Cov(r̃i, r̃j) / Var(r̃j).
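As a check on Equations 13.19 and 13.20, a sketch plugging in the covariances and standard deviations quoted in the text (as decimals rather than percent; small rounding differences against the printed figures are expected):

```python
# Covariances and standard deviations as quoted in the text.
cov_sp_ibm, cov_sp_sony = 0.0330, 0.0477
sdv_sp, sdv_ibm, sdv_sony = 0.190, 0.388, 0.903

corr_sp_ibm = cov_sp_ibm / (sdv_sp * sdv_ibm)   # roughly 0.45 (44.7% in the text)
beta_ibm = cov_sp_ibm / sdv_sp ** 2             # roughly 0.91
beta_sony = cov_sp_sony / sdv_sp ** 2           # roughly 1.31-1.32, depending on rounding

print(round(beta_ibm, 2), round(beta_sony, 2))
```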

Beta is the slope of the best-fitting line when the rate of return on the S&P500 is on the X axis and the rate of return on IBM (or Sony) is on the Y axis. Note that although Sony was correlated less with the S&P500 than IBM was, it is Sony that has the steeper slope. Correlation and beta do measure different things. The next chapters will elaborate more on the importance of beta in finance.

The Marquis De Sade would not have been happy. OK, neither would Mary Poppins.
You have now computed all the statistics that this book will use: means, variances, standard deviations, covariances, correlations, and betas. Only modestly painful, I hope. The next chapter will use no new statistics, but it will show how they work in the context of portfolios.

Solve Now!

Q 13.3 What is the correlation of a random variable with itself?

Q 13.4 What is the slope (beta) of a random variable with itself?

Q 13.5 Return to the historical rates of return on the DAX from Question 13.2 (Page 311). Compute the covariances, correlations, and betas for the DAX with respect to each of the other three investment securities.

Q 13.6 Very advanced question: Compute the annual rates of return on a portfolio of 1/3 IBM and 2/3 Sony. Then compute the beta of this portfolio with respect to the S&P500. How does this compare to the beta of IBM with respect to the S&P500, and the beta of Sony with respect to the S&P500?


13·5. Summary

The chapter covered the following major points:

• Finance often uses statistics based on historical rates of return as stand-ins to predict statistics for future rates of return. This is a leap of faith that is often, but not always, correct.

• Tildes denote random variables: a distribution of possible future outcomes. In practice, the distribution is often given by historical data.

• The expected rate of return is a measure of the reward. It is often forecast from the historical mean.

• The standard deviation, and its intermediate input, the variance, are measures of the risk. The standard deviation is (practically) the square root of the average squared deviation from the mean.

• Covariation measures how two variables move together. Causality induces covariation, but not vice versa. So two variables can covary even if neither variable is the cause of the other.

• Like variance, the covariance is difficult to interpret. Thus, covariance is often only an intermediate number on the way to more intuitive statistics.

• The correlation is the covariance divided by the standard deviations of both variables. It measures how reliable a relationship between two variables is. The order of variables does not matter. The correlation is always between −1 and +1.

• The beta is the covariance divided by the standard deviation of the variable on the X axis squared (which is the variance). It measures how steep a relationship between two variables is. The order of variables matters: βA,B is different from βB,A.

13·6. Advanced Appendix: More Statistical Theory

13·6.A. Historical and Future Statistics

Physical processes often have known properties. Stock returns do not.
The theory usually assumes that although you do not know the outcome of a random variable, you do know the statistics (such as mean and standard deviation) for the outcomes. That is, you can estimate or judge a random variable's unknown mean, standard deviation, covariance (beta), etc. Alas, while this is easy for the throw of a coin or a die, where you know the physical properties of what determines the random outcome, it is not so easy for stock returns. For example, what is the standard deviation of next month's rate of return on PepsiCo?

Use historical statistics as estimators of future statistics.
You just do not have a better alternative than to assume that PepsiCo's returns are manifestations of the same statistical process over time. So, if you want to know the standard deviation of PepsiCo's next month's rate of return, you typically must assume that each historical monthly rate of return, at least over the last couple of years, was a draw from the same distribution. Therefore, you can use the historical rates of return, assuming each one was an equally likely outcome, to estimate the future standard deviation. Analogously, the mechanics of the computation for obtaining the estimated future standard deviation are exactly the same as those you used to obtain an actual historical standard deviation.

This works well for standard deviations and covariation statistics, but not for means.
But using historical statistics and then arguing that they are representative of the future is a bit of a leap. Empirical observation has taught us that doing so works well for standard deviations and covariation measures: that is, the historical statistics obtained from monthly rates of return over the last 3 to 5 years appear to be fairly decent predictors of future betas and standard deviations. Unfortunately, the historical mean rates of return are fairly unreliable predictors of the future rates of return.

Q 13.7 When predicting the outcome of a die, why do you not use historical statistics on die throws as predictors of future die throws?

Q 13.8 Are historical financial securities' mean rates of return good predictors of future mean rates of return?

Q 13.9 Are historical financial securities' standard deviations and correlations of rates of return good predictors of their future equivalents?

13·6.B. Improving Future Estimates From Historical Estimates

Extreme outcomes have more of two components: a higher expected outcome and a higher error term (which will not repeat).
The principal remaining problem in the reliability of historical estimates of covariances for prediction is what statisticians call "regression to the mean." That is, the most extreme historical estimates are likely caused not only by the underlying true values, but even more so by chance. For example, if all securities had a true standard deviation of 30% per annum, over a particular year some might show a standard deviation of 40%, while others might show a standard deviation of 20%. Those with the high 40% historical standard deviations are most likely to come in lower than their historical standard deviations (dropping back to 30%). Those with the low 20% historical standard deviations are most likely to come in higher than their historical standard deviations (increasing back to 30%). This can also manifest itself in market beta estimates. Predicting future betas by running a regression with historical rate of return data is too naïve. The reason is that a stock that happened to have a really high return on one day will show too high a beta if the overall stock market happened to have gone up that day, and too low a beta if the overall stock market happened to have gone down that day. Such extreme observations tend not to repeat under the same market conditions in the future.


Shrinkage just reduces the estimate, hoping to adjust for extremes' errors.
Statisticians handle such problems with a technique called "shrinkage." The historical estimates are reduced ("shrunk") towards a more common mean. Naturally, the exact amount by which historical estimates should be shrunk, and what number they should be shrunk towards, is a very complex technical problem, and doing it well can make millions of dollars. This book is definitely not able to cover this subject appropriately. Still, reading this book, you might wonder if there is something both quick-and-dirty and reasonable that you can do to obtain better estimates of future mean returns, better estimates of future standard deviations, and better estimates of future betas.

Advice: take the average of the market historical statistic and your individual stock historical statistic. It probably predicts better.
The answer is yes. Here is a two-minute, non-formal heuristic estimation job: to predict a portfolio statistic, average the historical statistic on your particular portfolio with the historical statistic on the overall stock market. There are better and more sophisticated methods, but this averaging is likely to predict better than the historical statistic of the particular portfolio by itself. (With more time and statistical expertise, you could use other information, such as beta, the industry historical rate of return, or the average P/E ratio of the portfolio, to produce even better guestimates of future portfolio behavior.) For example, the market beta for the overall market is 1.0, so my prescription is to average the estimated beta and 1.0. Commercial vendors of market beta estimates do something similar, too. Bloomberg computes the weighted average of the estimated market beta and the number one, with weights of 0.67 and 0.33, respectively; Value Line reverses these two weights. Ibbotson Associates, however, does something more sophisticated, shrinking beta not towards one, but towards a "peer group" market beta.

An example of shrinking.

Let us apply some shrinking to the statistics in Table 14.7 on Page 353. If you were asked to guestimate an expected annual rate of return for Wal-Mart over the next year, you would not quote Wal-Mart's historical 31.5% as your estimate of Wal-Mart's future rate of return. Instead, you could quote an average of 31.5% and 6.3% (the historical rate of return on the market from 1997 to 2002), or about 20% per annum. (This assumes that you are not permitted to use more sophisticated models, such as the CAPM.) You would also guestimate Wal-Mart's risk to be the average of 31.1% and 18.7%, or about 25% per year. Finally, you would guestimate Wal-Mart's market beta to be about 0.95. The specific market index to which you shrink matters little (the Dow-Jones 30 or the S&P500), but it does matter that you do shrink somehow! An even better target to shrink towards would be the industry average statistics. (Some researchers go as far as to estimate only industry betas, and forego computing the individual firm beta altogether! This is shrinking to a very large degree.) However, good shrinking targets are beyond the scope of this book. Would you like to bet that the historical statistics are better guestimates than the shrunk statistics? (If so, feel free to invest your money into Wal-Mart, and deceive yourself that you will likely earn a mean return of 31.5%! Good luck!)
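The averaging arithmetic in this example can be sketched in a few lines. This is a minimal illustration of the equal-weight averaging rule using the numbers quoted above (31.5% and 31.1% for Wal-Mart, 6.3% and 18.7% for the market); the function name `shrink` and the 50/50 weights are this sketch's assumptions, not a calibrated estimator.

```python
# A minimal sketch of the "shrinking" heuristic: pull a noisy
# stock-level statistic toward the corresponding market-wide statistic.
# Equal weights are assumed here for illustration.

def shrink(stock_stat, market_stat, weight_on_stock=0.5):
    """Weighted average of a stock statistic and a market statistic."""
    return weight_on_stock * stock_stat + (1 - weight_on_stock) * market_stat

# Expected rate of return: average Wal-Mart's 31.5% with the market's 6.3%.
mean_guess = shrink(0.315, 0.063)   # about 0.19, i.e., roughly 20% per annum

# Risk (standard deviation): average 31.1% with the market's 18.7%.
sd_guess = shrink(0.311, 0.187)     # about 0.25, i.e., roughly 25% per year

print(f"shrunk mean: {mean_guess:.1%}, shrunk sd: {sd_guess:.1%}")
```

A vendor-style shrinkage simply changes the weight: Bloomberg's rule corresponds to `shrink(historical_beta, 1.0, weight_on_stock=0.67)`.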

What works reasonably well, what does not.

Here is a summary of some recommendations. Based on regressions using five years of historical monthly data, to predict one-year-ahead statistics, you can use reasonable shrinkage methods for large stocks (e.g., members of the S&P500) as follows:

Mean  Nothing works too well (i.e., predicting the future from the past).

Market-Model Alpha  Nothing works too well.

Market-Model Beta  Average the historical beta with the number "1." For example, if the regression coefficient (covariance/variance) is 4, use a beta of 2.5.

Standard Deviation  Average the historical standard deviation of the stock and the historical standard deviation of the S&P500. Then increase by 30%, because, historically, for unknown reasons, volatility has been increasing.

Recall that the market model is the linear regression in which the x variable is the rate of return

on the S&P500, and the y variable is the rate of return on the stock in which you are interested.
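The two mechanical rules in the summary can be sketched as follows. The function names are illustrative; the equal weights match the text's own beta example ((4 + 1)/2 = 2.5), and the 1.3 factor is the recommended 30% increase in the averaged standard deviation.

```python
# Sketch of the shrinkage recommendations for large stocks.

def predict_beta(historical_beta):
    # Average the historical market-model beta with the number 1.
    return 0.5 * historical_beta + 0.5 * 1.0

def predict_sd(stock_sd, sp500_sd):
    # Average the two historical standard deviations, then raise the
    # result by 30% (volatility has historically been increasing).
    return 1.3 * 0.5 * (stock_sd + sp500_sd)

print(predict_beta(4.0))                  # the text's example: (4 + 1)/2 = 2.5
print(round(predict_sd(0.311, 0.187), 3)) # about 0.32
```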


13·6.C. Other Measures of Spread

There are measures of risk other than the variance and standard deviation, but they are obscure

enough to deserve your ignorance (at least until an advanced investments course). One such

measure is the mean absolute deviation (MAD) from the mean. For the example of a rate of

return of either +25% or −25%,

MAD = 1/2 · |(−25%)| + 1/2 · |(+25%)| = 1/2 · 25% + 1/2 · 25% = 25% .    (13.21)

In this case, the outcome happens to be the same as the standard deviation, but this is not

generally the case. The MAD gives less weight than the standard deviation to observations far

from the mean. For example, if you had three returns, −50%, −50%, and +100%, the mean would

be 0%, the standard deviation 70.7%, and the MAD 66.7%.
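The three-return comparison above can be verified in a few lines, assuming the standard deviation is computed as the root of the average squared deviation (as elsewhere in this chapter):

```python
# Compare the standard deviation and the mean absolute deviation (MAD)
# for the text's three returns: -50%, -50%, and +100%.

returns = [-0.50, -0.50, 1.00]
n = len(returns)
mean = sum(returns) / n                                   # 0.0

# Standard deviation: root of the average squared deviation from the mean.
sd = (sum((r - mean) ** 2 for r in returns) / n) ** 0.5

# MAD: average absolute distance from the mean; it weights the far-away
# +100% observation less heavily than the standard deviation does.
mad = sum(abs(r - mean) for r in returns) / n

print(f"sd = {sd:.1%}, MAD = {mad:.1%}")   # sd = 70.7%, MAD = 66.7%
```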

Another measure of risk is the semivariance (SV), which relies only on observations below zero or below the mean. That is, all positive returns (or deviations from the mean) are simply ignored. For the example of +25% or −25%,

SV = 1/2 · (−25%)² + 1/2 · (0) = 1/2 · 0.0625 = 0.03125 .    (13.22)

The idea is that investors fear only realizations that are negative (or below the mean).
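Formula 13.22 can be sketched as a small function; the name `semivariance` and the list-based interface are this sketch's assumptions.

```python
# Semivariance: square only the negative outcomes (deviations below
# zero), treat positive outcomes as zero, and average.

def semivariance(outcomes):
    return sum(min(x, 0.0) ** 2 for x in outcomes) / len(outcomes)

# The two equally likely returns from the text, +25% and -25%:
print(semivariance([0.25, -0.25]))   # 0.03125, matching Formula 13.22
```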

Finally, note that the correlation has another nice interpretation: the correlation squared is the

R² in a bivariate OLS regression with a constant.

13·6.D. Translating Mean and Variance Statistics Into Probabilities

Translating standard deviations into more intuitive risk assessments.

Although you now know enough to compute a measure of risk, you have not bothered to explore how likely outcomes are. For example, if a portfolio's expected rate of return is 12.6% per year, and its standard deviation is 22% per year, what is the probability that you will lose money (earn below 0%)? What is the probability that you will earn 15% or more? 20% or more?

Stock returns often assume a normal (bell-shaped) distribution.

It turns out that if the underlying distribution looks like a bell curve (and many common portfolio return distributions have this shape), there is an easy procedure to translate mean and standard deviation into the probability that the return will end up being less than x. In fact, this probability correspondence is the only advantage that bell-shaped distributions provide! Everything else works regardless of the actual shape of the distribution.

An example: Z-score and probability.

For concreteness' sake, assume you want to determine the probability that the rate of return on this portfolio is less than +5%:

Step 1 Subtract the mean from 5%. In the example, with the expected rate of return of 12.6%, the result is 5% − 12.6% = −7.6%.

Step 2 Divide this number by the standard deviation. In this example, this is −7.6% divided by 22%, which comes to −0.35. This number is called the Score or Z-score.

Step 3 Look up the probability for this Score in the Cumulative Normal Distribution Table in

Table B.1 (Page 796). For the score of −0.35, this probability is about 0.36.

In sum, you have determined that if returns are drawn from a distribution with a mean of 12.6%

and a standard deviation of 22%, then the probability of observing a single rate of return of

+5% or less is about 36%. It also follows that the probability that a return is greater than +5%

must be 100% − 36% = 64%.
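The three steps can be sketched with the standard normal cumulative distribution function standing in for the printed Table B.1; the helper `normal_cdf` built from `math.erf` is this sketch's assumption, not the book's own procedure.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal at z, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, threshold = 0.126, 0.22, 0.05

z = (threshold - mean) / sd   # Steps 1 and 2: about -0.35
prob = normal_cdf(z)          # Step 3: about 0.36

print(f"Z-score: {z:.2f}, probability of a return below +5%: about {prob:.2f}")
```

The same helper answers the "will I lose money?" question below: `normal_cdf((0.0 - 0.126) / 0.22)` comes to roughly 0.28.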

Side Note: In the real world, this works well enough, but not perfectly. So, do not get fooled by theoretical

pseudo-accuracy. Anything between 30% and 40% is a reasonable prediction here.


How well does the Z-score fit in the history?

Now recall portfolio P in Table 14.1. P had a mean of 12.6% and a standard deviation of 22%. You have just computed that about one third of the 12 annual portfolio returns should be below +5%. 1991, 1993, 1994, 1995, 1996, 1997, 1998, and 1999 performed better than +5%; 1992, 2000, 2001, and 2002 performed worse. So, as predicted by applying the normal distribution table, about one third of the annual returns were 5% or less.

How likely is it that you will lose money?

A common question is "what is the probability that the return will be negative?" Use the same technique:

Step 1 Subtracting the mean from 0% yields 0.0% − 12.6% = −12.6%.

Step 2 Dividing −12.6% by the standard deviation of 22% yields the score of −0.57.

Step 3 For this score of −0.57, the probability is about 28%.

In words, the probability that the rate of return will be negative is around 25% to 30%. And,

therefore, the probability that the return will be positive is around 70% to 75%. The table shows

that 4 out of the 12 annual rates of return are negative. This is most likely sampling error:

with only 12 annual rates of return, it was impossible for the distribution of data to accurately

follow a bell shape.

Digging Deeper: Many portfolio returns have what is called "fat tails." This means that the probability of extreme outcomes, especially extreme negative outcomes, is often higher than suggested by the normal distribution table. For example, if the mean return were 30% (e.g., for a multi-year return) and the standard deviation were 10%, the score for a value of 0 is −3. The table therefore suggests that the probability of drawing a negative return should be 0.135%, or about once in a thousand periods. Long experience with financial data suggests that this is often much too overconfident for the real world. In some contexts, the true probability of even the most negative possible outcome (−100%) may be as high as 1%, even if the Z-score suggests 0.0001%!

13·6.E. Correlation and Causation

Correlation does not imply Causation.

A warning: covariation is related to, but not the same as, causation. If one variable "causes" another, then the two variables will be correlated. But the opposite does not hold. For example, snow and depression are positively correlated, but neither causes the other. Instead, there is another variable (winter) that has an influence on both snow and depression.

Solve Now!

Q 13.10 If the mean is 20 and the standard deviation is 15, what is the probability that the value

will turn out to be less than 0?