\[
= n^{-1}(a^\top X^\top H^\top)(H X a) \quad \text{since } H^\top H = H,
\]
\[
= n^{-1} y^\top y = n^{-1} \sum_{j=1}^{n} y_j^2 \ge 0
\]
for $y = H X a$. It is well known from the one-dimensional case that $n^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ as an estimate of the variance exhibits a bias of the order $n^{-1}$ (Breiman, 1973). In the multidimensional case, $S_u = \frac{n}{n-1}\, S$ is an unbiased estimate of the true covariance. (This will be shown in Example 4.15.)

The sample correlation coefficient between the $i$-th and $j$-th variables is $r_{X_i X_j}$, see (3.8). If $D = \operatorname{diag}(s_{X_i X_i})$, then the correlation matrix is
\[
R = D^{-1/2} S D^{-1/2}, \tag{3.21}
\]
where $D^{-1/2}$ is a diagonal matrix with elements $(s_{X_i X_i})^{-1/2}$ on its main diagonal.

EXAMPLE 3.8 The empirical covariances are calculated for the pullover data set. The vector of the means of the four variables in the data set is $\bar{x} = (172.7, 104.6, 104.0, 93.8)^\top$. The sample covariance matrix is
\[
S = \begin{pmatrix}
1037.2 & -80.2 & 1430.7 & 271.4 \\
-80.2 & 219.8 & 92.1 & -91.6 \\
1430.7 & 92.1 & 2624.0 & 210.3 \\
271.4 & -91.6 & 210.3 & 177.4
\end{pmatrix}.
\]

The unbiased estimate of the variance ($n = 10$) is equal to
\[
S_u = \frac{10}{9}\, S = \begin{pmatrix}
1152.5 & -88.9 & 1589.7 & 301.6 \\
-88.9 & 244.3 & 102.3 & -101.8 \\
1589.7 & 102.3 & 2915.6 & 233.7 \\
301.6 & -101.8 & 233.7 & 197.1
\end{pmatrix}.
\]

94 3 Moving to Higher Dimensions

The sample correlation matrix is
\[
R = \begin{pmatrix}
1 & -0.17 & 0.87 & 0.63 \\
-0.17 & 1 & 0.12 & -0.46 \\
0.87 & 0.12 & 1 & 0.31 \\
0.63 & -0.46 & 0.31 & 1
\end{pmatrix}.
\]
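The individual pullover observations are not listed in this section, so the numbers above cannot be recomputed from the text alone; as a sketch, the same formulas (mean vector, centering matrix $H$, $S$, $S_u$, and the correlation matrix from (3.21)) can be applied to a small made-up data matrix:

```python
import numpy as np

# Made-up (5 x 3) data matrix; the actual pullover data are not reproduced here.
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.5, 1.0],
              [3.0, 3.5, 0.0],
              [4.0, 2.5, 2.0],
              [5.0, 4.0, 1.5]])
n, p = X.shape

x_bar = X.T @ np.ones(n) / n             # mean vector x_bar = n^{-1} X' 1_n
H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H
S = X.T @ H @ X / n                      # empirical covariance S = n^{-1} X' H X
S_u = n / (n - 1) * S                    # unbiased estimate S_u = n/(n-1) S
D_inv_sqrt = np.diag(np.diag(S) ** -0.5)
R = D_inv_sqrt @ S @ D_inv_sqrt          # correlation matrix, (3.21)
```

Note that $S_u$ agrees with NumPy's `np.cov` (which divides by $n-1$), and $R$ with `np.corrcoef`, since the factor $n^{-1}$ cancels in the correlation.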

Linear Transformation

In many practical applications we need to study linear transformations of the original data.

This motivates the question of how to calculate summary statistics after such linear transformations.

Let $A$ be a $(q \times p)$ matrix and consider the transformed data matrix
\[
Y = X A^\top = (y_1, \ldots, y_n)^\top. \tag{3.22}
\]
The row $y_i = (y_{i1}, \ldots, y_{iq})^\top \in \mathbb{R}^q$ can be viewed as the $i$-th observation of a $q$-dimensional random variable $Y = AX$. In fact we have $y_i^\top = x_i^\top A^\top$. We immediately obtain the mean and the empirical covariance of the variables (columns) forming the data matrix $Y$:
\[
\bar{y} = \frac{1}{n}\, Y^\top 1_n = \frac{1}{n}\, A X^\top 1_n = A \bar{x} \tag{3.23}
\]
\[
S_Y = \frac{1}{n}\, Y^\top H Y = \frac{1}{n}\, A X^\top H X A^\top = A S_X A^\top. \tag{3.24}
\]

Note that if the linear transformation is nonhomogeneous, i.e.,
\[
y_i = A x_i + b \quad \text{where } b \text{ is } (q \times 1),
\]
only (3.23) changes: $\bar{y} = A\bar{x} + b$. The formulas (3.23) and (3.24) are useful in the particular case of $q = 1$, i.e., $y = X a \Leftrightarrow y_i = a^\top x_i$; $i = 1, \ldots, n$:
\[
\bar{y} = a^\top \bar{x}
\]
\[
S_y = a^\top S_X a.
\]
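As a quick numerical check of (3.23) and (3.24), the sketch below applies a nonhomogeneous transformation $y_i = A x_i + b$ to a randomly generated data matrix (the data and the matrix $A$ are arbitrary choices for illustration, not from the text) and compares both sides of the formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))               # arbitrary (n x p) data matrix
A = np.array([[1.0, 0.0, 2.0, -1.0],
              [0.0, 3.0, 0.0, 1.0]])       # arbitrary (q x p) transformation
b = np.array([5.0, -2.0])                  # shift vector b, (q x 1)

Y = X @ A.T + b                            # rows y_i = A x_i + b
n = X.shape[0]
H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
S_X = X.T @ H @ X / n                      # empirical covariance of X
S_Y = Y.T @ H @ Y / n                      # empirical covariance of Y
y_bar = Y.mean(axis=0)                     # equals A x_bar + b, cf. (3.23)
```

The shift $b$ drops out of the covariance, so $S_Y = A S_X A^\top$ holds exactly as in the homogeneous case.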

EXAMPLE 3.9 Suppose that $X$ is the pullover data set. The manager wants to compute his mean expenses for advertisement ($X_3$) and sales assistant ($X_4$). Suppose that the sales assistant charges an hourly wage of 10 EUR. Then the shop manager calculates the expenses $Y$ as $Y = X_3 + 10 X_4$. Formula (3.22) says that this is equivalent to defining the $(1 \times 4)$ matrix $A$ as:
\[
A = (0, 0, 1, 10).
\]

Using formulas (3.23) and (3.24), it is now computationally very easy to obtain the sample mean $\bar{y}$ and the sample variance $S_y$ of the overall expenses:
\[
\bar{y} = A\bar{x} = (0, 0, 1, 10) \begin{pmatrix} 172.7 \\ 104.6 \\ 104.0 \\ 93.8 \end{pmatrix} = 1042.0
\]


\[
S_Y = A S_X A^\top = (0, 0, 1, 10) \begin{pmatrix}
1152.5 & -88.9 & 1589.7 & 301.6 \\
-88.9 & 244.3 & 102.3 & -101.8 \\
1589.7 & 102.3 & 2915.6 & 233.7 \\
301.6 & -101.8 & 233.7 & 197.1
\end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 10 \end{pmatrix}
\]
\[
= 2915.6 + 4674 + 19710 = 27299.6.
\]
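These numbers can be reproduced directly from the covariance matrix and mean vector given in Example 3.8; the sketch below is just that arithmetic in NumPy:

```python
import numpy as np

a = np.array([0.0, 0.0, 1.0, 10.0])              # A = (0, 0, 1, 10)
x_bar = np.array([172.7, 104.6, 104.0, 93.8])    # mean vector from Example 3.8
S = np.array([[1152.5,  -88.9, 1589.7,  301.6],  # unbiased covariance estimate
              [ -88.9,  244.3,  102.3, -101.8],
              [1589.7,  102.3, 2915.6,  233.7],
              [ 301.6, -101.8,  233.7,  197.1]])

y_bar = a @ x_bar     # sample mean of the expenses: 1042.0
S_y = a @ S @ a       # sample variance a' S a:     27299.6
```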

Mahalanobis Transformation

A special case of this linear transformation is
\[
z_i = S^{-1/2}(x_i - \bar{x}), \quad i = 1, \ldots, n. \tag{3.25}
\]
Note that for the transformed data matrix $Z = (z_1, \ldots, z_n)^\top$,
\[
S_Z = n^{-1} Z^\top H Z = I_p. \tag{3.26}
\]
So the Mahalanobis transformation eliminates the correlation between the variables and standardizes the variance of each variable. If we apply (3.24) using $A = S^{-1/2}$, we obtain the identity covariance matrix as indicated in (3.26).
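A sketch of (3.25)–(3.26): the inverse square root $S^{-1/2}$ can be obtained from the eigendecomposition of $S$. The data below are randomly generated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
B = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.7]])
X = rng.normal(size=(50, 3)) @ B           # arbitrary correlated data
n, p = X.shape

H = np.eye(n) - np.ones((n, n)) / n
S = X.T @ H @ X / n                        # empirical covariance

# S^{-1/2} from the eigendecomposition S = G diag(l) G'
l, G = np.linalg.eigh(S)
S_inv_sqrt = G @ np.diag(l ** -0.5) @ G.T

Z = (X - X.mean(axis=0)) @ S_inv_sqrt      # rows z_i = S^{-1/2}(x_i - x_bar), (3.25)
S_Z = Z.T @ H @ Z / n                      # identity matrix, cf. (3.26)
```

Since $S^{-1/2}$ is symmetric, right-multiplying the centered rows is the same as applying it to each $x_i - \bar{x}$.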

Summary

↪ The center of gravity of a data matrix is given by its mean vector $\bar{x} = n^{-1} X^\top 1_n$.

↪ The dispersion of the observations in a data matrix is given by the empirical covariance matrix $S = n^{-1} X^\top H X$.

↪ The empirical correlation matrix is given by $R = D^{-1/2} S D^{-1/2}$.

↪ A linear transformation $Y = X A^\top$ of a data matrix $X$ has mean $A\bar{x}$ and empirical covariance $A S_X A^\top$.

↪ The Mahalanobis transformation is a linear transformation $z_i = S^{-1/2}(x_i - \bar{x})$ which gives a standardized, uncorrelated data matrix $Z$.

3.4 Linear Model for Two Variables

We have looked many times now at downward- and upward-sloping scatterplots. What does the eye define here as slope? Suppose that we can construct a line corresponding to the general direction of the cloud. The sign of the slope of this line would correspond to the upward and downward directions. Call the variable on the vertical axis $Y$ and the one on the horizontal axis $X$. A slope line is a linear relationship between $X$ and $Y$:

\[
y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \ldots, n. \tag{3.27}
\]
Here, $\alpha$ is the intercept and $\beta$ is the slope of the line. The errors (or deviations from the line) are denoted as $\varepsilon_i$ and are assumed to have zero mean and finite variance $\sigma^2$. The task of finding $(\alpha, \beta)$ in (3.27) is referred to as a linear adjustment.

In Section 3.6 we shall derive estimators for $\alpha$ and $\beta$ more formally, as well as accurately describe what a "good" estimator is. For now, one may try to find a "good" estimator $(\hat{\alpha}, \hat{\beta})$ via graphical techniques. A very common numerical and statistical technique is to use those $\hat{\alpha}$ and $\hat{\beta}$ that minimize:
\[
(\hat{\alpha}, \hat{\beta}) = \arg\min_{(\alpha,\beta)} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2. \tag{3.28}
\]

The solutions to this task are the estimators:
\[
\hat{\beta} = \frac{s_{XY}}{s_{XX}} \tag{3.29}
\]
\[
\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}. \tag{3.30}
\]
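The estimators (3.29)–(3.30) can be computed directly from the empirical moments. The $(x, y)$ values below are invented for illustration, and the result can be checked against NumPy's `polyfit` least-squares routine:

```python
import numpy as np

# Invented sample; any paired observations work here.
x = np.array([80.0, 85.0, 90.0, 95.0, 100.0, 105.0])
y = np.array([200.0, 190.0, 185.0, 170.0, 160.0, 150.0])
n = len(x)

s_XX = ((x - x.mean()) ** 2).sum() / n               # empirical variance of X
s_XY = ((x - x.mean()) * (y - y.mean())).sum() / n   # empirical covariance

beta_hat = s_XY / s_XX                       # slope, (3.29)
alpha_hat = y.mean() - beta_hat * x.mean()   # intercept, (3.30)
```

Note that the common factor $n^{-1}$ cancels in the ratio $s_{XY}/s_{XX}$, so the biased and unbiased moment conventions give the same $\hat{\beta}$.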

The variance of $\hat{\beta}$ is:
\[
\operatorname{Var}(\hat{\beta}) = \frac{\sigma^2}{n \cdot s_{XX}}. \tag{3.31}
\]
The standard error (SE) of the estimator is the square root of (3.31),
\[
SE(\hat{\beta}) = \{\operatorname{Var}(\hat{\beta})\}^{1/2} = \frac{\sigma}{(n \cdot s_{XX})^{1/2}}. \tag{3.32}
\]

We can use this formula to test the hypothesis that $\beta = 0$. In an application the variance $\sigma^2$ has to be estimated by an estimator $\hat{\sigma}^2$ that will be given below. Under a normality assumption of the errors, the t-test for the hypothesis $\beta = 0$ works as follows.

One computes the statistic
\[
t = \frac{\hat{\beta}}{SE(\hat{\beta})} \tag{3.33}
\]
and rejects the hypothesis at a 5% significance level if $|t| \ge t_{0.975;\,n-2}$, where the 97.5% quantile of the Student's $t_{n-2}$ distribution is clearly the 95% critical value for the two-sided test. For $n \ge 30$, this can be replaced by 1.96, the 97.5% quantile of the normal distribution.

An estimator $\hat{\sigma}^2$ of $\sigma^2$ will be given in the following.
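Anticipating the estimator $\hat{\sigma}^2 = RSS/(n-2)$ given below, the t statistic (3.33) can be sketched as follows. The sample is invented; with only $n = 8$ observations the exact critical value $t_{0.975;6} \approx 2.447$ applies rather than 1.96.

```python
import numpy as np

# Invented, nearly linear sample for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8])
n = len(x)

s_XX = ((x - x.mean()) ** 2).sum() / n
s_XY = ((x - x.mean()) * (y - y.mean())).sum() / n
beta_hat = s_XY / s_XX                          # (3.29)
alpha_hat = y.mean() - beta_hat * x.mean()      # (3.30)

residuals = y - (alpha_hat + beta_hat * x)
sigma2_hat = (residuals ** 2).sum() / (n - 2)   # RSS/(n - 2), given below in the text
se_beta = np.sqrt(sigma2_hat / (n * s_XX))      # estimated SE, cf. (3.32)
t = beta_hat / se_beta                          # (3.33)
```

Here $|t|$ far exceeds 2.447, so the hypothesis $\beta = 0$ would be rejected at the 5% level.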


Figure 3.5. Regression of sales ($X_1$) on price ($X_2$) of pullovers. MVAregpull.xpl

EXAMPLE 3.10 Let us apply the linear regression model (3.27) to the "classic blue" pullovers. The sales manager believes that there is a strong dependence of the number of sales on the price. He computes the regression line as shown in Figure 3.5.

How good is this fit? This can be judged via goodness-of-fit measures. Define
\[
\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i, \tag{3.34}
\]
as the predicted value of $y$ as a function of $x$. With $\hat{y}$ the textile shop manager in the above example can predict sales as a function of prices $x$. The variation in the response variable is:
\[
n s_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2. \tag{3.35}
\]


The variation explained by the linear regression (3.27) with the predicted values (3.34) is:
\[
\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2. \tag{3.36}
\]
The residual sum of squares, the minimum in (3.28), is given by:
\[
RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2. \tag{3.37}
\]
An unbiased estimator $\hat{\sigma}^2$ of $\sigma^2$ is given by $RSS/(n-2)$.

The following relation holds between (3.35)–(3.37):
\[
\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{3.38}
\]
total variation = explained variation + unexplained variation.

The coefficient of determination is $r^2$:
\[
r^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\text{explained variation}}{\text{total variation}}. \tag{3.39}
\]

The coefficient of determination increases with the proportion of explained variation by the linear relation (3.27). In the extreme case where $r^2 = 1$, all of the variation is explained by the linear regression (3.27). The other extreme, $r^2 = 0$, is where the empirical covariance is $s_{XY} = 0$. The coefficient of determination can be rewritten as

\[
r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}. \tag{3.40}
\]

From (3.39), it can be seen that in the linear regression (3.27), $r^2 = r_{XY}^2$ is the square of the correlation between $X$ and $Y$.
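The decomposition (3.38) and the identity $r^2 = r_{XY}^2$ can be verified numerically; the $(x, y)$ sample below is invented for illustration.

```python
import numpy as np

# Invented sample for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])

beta_hat = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x                 # predicted values, (3.34)

total = ((y - y.mean()) ** 2).sum()              # total variation, (3.35)
explained = ((y_hat - y.mean()) ** 2).sum()      # explained variation, (3.36)
rss = ((y - y_hat) ** 2).sum()                   # residual sum of squares, (3.37)

r2 = explained / total                           # coefficient of determination, (3.39)
```

The decomposition (3.38) holds exactly because the least-squares fit includes an intercept, which makes the residuals orthogonal to both the constant and the fitted values.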

EXAMPLE 3.11 For the above pullover example, we estimate
\[
\hat{\alpha} = 210.774 \quad \text{and} \quad \hat{\beta} = -0.364.
\]
The coefficient of determination is
\[
r^2 = 0.028.
\]

The textile shop manager concludes that sales are not influenced very much by the price (in a linear way).