
Summary (continued)
→ The eigenvectors belonging to the largest eigenvalues indicate the "main direction" of the data.
→ The Jordan decomposition allows one to easily compute the power of a symmetric matrix A: $A^\alpha = \Gamma \Lambda^\alpha \Gamma^\top$.
→ The singular value decomposition (SVD) is a generalization of the Jordan decomposition.

A quadratic form Q(x) is built from a symmetric matrix A(p × p) and a vector x ∈ ℝ^p:
$$Q(x) = x^\top A x = \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij} x_i x_j. \qquad (2.21)$$

Definiteness of Quadratic Forms and Matrices
Q(x) > 0 for all x ≠ 0: positive definite
Q(x) ≥ 0 for all x ≠ 0: positive semidefinite
A matrix A is called positive definite (semidefinite) if the corresponding quadratic form Q(·) is positive definite (semidefinite). We write A > 0 (A ≥ 0).
Quadratic forms can always be diagonalized, as the following result shows.

THEOREM 2.3 If A is symmetric and Q(x) = x⊤Ax is the corresponding quadratic form, then there exists a transformation x ↦ Γ⊤x = y such that
$$x^\top A x = \sum_{i=1}^{p} \lambda_i y_i^2,$$
where λ_i are the eigenvalues of A.

Proof:
A = Γ Λ Γ⊤. By Theorem 2.1 and y = Γ⊤x we have that x⊤Ax = x⊤ΓΛΓ⊤x = y⊤Λy = Σ_{i=1}^p λ_i y_i². □

Positive definiteness of quadratic forms can be deduced from positive eigenvalues.
66 2 A Short Excursion into Matrix Algebra

THEOREM 2.4 A > 0 if and only if all λ_i > 0, i = 1, …, p.

Proof:
0 < λ₁y₁² + ··· + λ_p y_p² = x⊤Ax for all x ≠ 0 by Theorem 2.3. □

COROLLARY 2.1 If A > 0, then A⁻¹ exists and |A| > 0.

EXAMPLE 2.6 The quadratic form Q(x) = x₁² + x₂² corresponds to the matrix $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ with eigenvalues λ₁ = λ₂ = 1 and is thus positive definite. The quadratic form Q(x) = (x₁ − x₂)² corresponds to the matrix $A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ with eigenvalues λ₁ = 2, λ₂ = 0 and is positive semidefinite. The quadratic form Q(x) = x₁² − x₂² with eigenvalues λ₁ = 1, λ₂ = −1 is indefinite.
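The eigenvalue criterion of Theorem 2.4 is easy to check numerically. A minimal sketch in Python (NumPy assumed available; the book itself uses XploRe quantlets), applied to the three matrices of Example 2.6:

```python
import numpy as np

def definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues (Theorem 2.4)."""
    lam = np.linalg.eigvalsh(A)          # eigenvalues of a symmetric matrix, ascending
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    return "indefinite"

# the three quadratic forms of Example 2.6
print(definiteness(np.array([[1.0, 0.0], [0.0, 1.0]])))    # positive definite
print(definiteness(np.array([[1.0, -1.0], [-1.0, 1.0]])))  # positive semidefinite
print(definiteness(np.array([[1.0, 0.0], [0.0, -1.0]])))   # indefinite
```

The tolerance guards against eigenvalues that are exactly zero in theory but tiny nonzero numbers in floating point.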

In the statistical analysis of multivariate data, we are interested in maximizing quadratic
forms given some constraints.

THEOREM 2.5 If A and B are symmetric and B > 0, then the maximum of x⊤Ax under the constraint x⊤Bx = 1 is given by the largest eigenvalue of B⁻¹A. More generally,
$$\max_{\{x:\, x^\top B x = 1\}} x^\top A x = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p = \min_{\{x:\, x^\top B x = 1\}} x^\top A x,$$
where λ₁, …, λ_p denote the eigenvalues of B⁻¹A. The vector which maximizes (minimizes) x⊤Ax under the constraint x⊤Bx = 1 is the eigenvector of B⁻¹A which corresponds to the largest (smallest) eigenvalue of B⁻¹A.

Proof:
By definition, $B^{1/2} = \Gamma_B \Lambda_B^{1/2} \Gamma_B^\top$. Set $y = B^{1/2} x$; then
$$\max_{\{x:\, x^\top B x = 1\}} x^\top A x = \max_{\{y:\, y^\top y = 1\}} y^\top B^{-1/2} A B^{-1/2} y. \qquad (2.22)$$
From Theorem 2.1, let
$$B^{-1/2} A B^{-1/2} = \Gamma \Lambda \Gamma^\top$$
be the spectral decomposition of $B^{-1/2} A B^{-1/2}$. Set
$$z = \Gamma^\top y \;\Rightarrow\; z^\top z = y^\top \Gamma \Gamma^\top y = y^\top y.$$
Thus (2.22) is equivalent to
$$\max_{\{z:\, z^\top z = 1\}} z^\top \Lambda z = \max_{\{z:\, z^\top z = 1\}} \sum_{i=1}^{p} \lambda_i z_i^2.$$
But
$$\max_{z^\top z = 1} \sum_i \lambda_i z_i^2 \le \lambda_1 \max_{z^\top z = 1} \sum_i z_i^2 = \lambda_1.$$
The maximum is thus obtained by z = (1, 0, …, 0)⊤, i.e.,
$$y = \gamma_1 \;\Rightarrow\; x = B^{-1/2} \gamma_1.$$
Since B⁻¹A and $B^{-1/2} A B^{-1/2}$ have the same eigenvalues, the proof is complete. □

EXAMPLE 2.7 Consider the following matrices
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
We calculate
$$B^{-1} A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}.$$
The biggest eigenvalue of the matrix B⁻¹A is 2 + √5. This means that the maximum of x⊤Ax under the constraint x⊤Bx = 1 is 2 + √5.
Notice that the constraint x⊤Bx = 1 corresponds, with our choice of B, to the points which lie on the unit circle x₁² + x₂² = 1.
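The claim of Theorem 2.5 for this example can be checked numerically; a sketch assuming NumPy, comparing the eigenvalue of B⁻¹A with a brute-force search over the unit circle:

```python
import numpy as np

# Theorem 2.5 applied to Example 2.7: with B = I_2, the maximum of x'Ax over
# the unit circle {x : x'Bx = 1} is the largest eigenvalue of B^{-1}A.
A = np.array([[1.0, 2.0], [2.0, 3.0]])
B = np.eye(2)

lam = np.linalg.eigvals(np.linalg.solve(B, A))
lam_max = lam.real.max()
print(np.isclose(lam_max, 2 + np.sqrt(5)))      # True: largest eigenvalue is 2 + sqrt(5)

# brute-force check over points on the unit circle
theta = np.linspace(0, 2 * np.pi, 100_000)
X = np.vstack([np.cos(theta), np.sin(theta)])   # each column satisfies x'Bx = 1
q = np.sum(X * (A @ X), axis=0)                 # x'Ax for every column
print(np.isclose(q.max(), lam_max, atol=1e-6))  # True
```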

Summary
→ A quadratic form can be described by a symmetric matrix A.
→ Quadratic forms can always be diagonalized.
→ Positive definiteness of a quadratic form is equivalent to positiveness of the eigenvalues of the matrix A.
→ The maximum and minimum of a quadratic form given some constraints can be expressed in terms of eigenvalues.

2.4 Derivatives
For later sections of this book, it will be useful to introduce matrix notation for derivatives of a scalar function of a vector x with respect to x. Consider f : ℝ^p → ℝ and a (p × 1) vector x. Then ∂f(x)/∂x is the column vector of the partial derivatives ∂f(x)/∂x_j, j = 1, …, p, and ∂f(x)/∂x⊤ is the row vector of the same derivatives (∂f(x)/∂x is called the gradient of f).
We can also introduce second-order derivatives: ∂²f(x)/∂x∂x⊤ is the (p × p) matrix with elements ∂²f(x)/∂x_i∂x_j, i = 1, …, p and j = 1, …, p (∂²f(x)/∂x∂x⊤ is called the Hessian of f).

Suppose that a is a (p × 1) vector and that A = A⊤ is a (p × p) matrix. Then
$$\frac{\partial a^\top x}{\partial x} = \frac{\partial x^\top a}{\partial x} = a, \qquad (2.23)$$
$$\frac{\partial x^\top A x}{\partial x} = 2 A x. \qquad (2.24)$$
The Hessian of the quadratic form Q(x) = x⊤Ax is:
$$\frac{\partial^2 x^\top A x}{\partial x\, \partial x^\top} = 2 A. \qquad (2.25)$$
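Formulas (2.23) and (2.24) can be verified against a central-difference approximation of the gradient; a minimal sketch assuming NumPy, with arbitrary sizes and seed:

```python
import numpy as np

# A finite-difference check of (2.23) and (2.24); the sizes and random seed
# are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
p = 4
a = rng.normal(size=p)
M = rng.normal(size=(p, p))
A = (M + M.T) / 2                        # a symmetric matrix A = A'
x = rng.normal(size=p)

def grad_fd(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(np.allclose(grad_fd(lambda v: a @ v, x), a, atol=1e-5))              # (2.23)
print(np.allclose(grad_fd(lambda v: v @ A @ v, x), 2 * A @ x, atol=1e-4))  # (2.24)
```

Because both functions are at most quadratic, the central-difference formula is exact up to floating-point roundoff.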
EXAMPLE 2.8 Consider the matrix
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}.$$
From formulas (2.24) and (2.25) it immediately follows that the gradient of Q(x) = x⊤Ax is
$$\frac{\partial x^\top A x}{\partial x} = 2 A x = 2 \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} x = \begin{pmatrix} 2x_1 + 4x_2 \\ 4x_1 + 6x_2 \end{pmatrix}$$
and the Hessian is
$$\frac{\partial^2 x^\top A x}{\partial x\, \partial x^\top} = 2 A = 2 \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} = \begin{pmatrix} 2 & 4 \\ 4 & 6 \end{pmatrix}.$$

2.5 Partitioned Matrices
Very often we will have to consider certain groups of rows and columns of a matrix A(n × p). In the case of two groups, we have
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
where A_{ij} is (n_i × p_j), i, j = 1, 2, with n₁ + n₂ = n and p₁ + p₂ = p.

If B(n × p) is partitioned accordingly, we have:
$$A + B = \begin{pmatrix} A_{11} + B_{11} & A_{12} + B_{12} \\ A_{21} + B_{21} & A_{22} + B_{22} \end{pmatrix}$$
$$B^\top = \begin{pmatrix} B_{11}^\top & B_{21}^\top \\ B_{12}^\top & B_{22}^\top \end{pmatrix}$$
$$A B^\top = \begin{pmatrix} A_{11} B_{11}^\top + A_{12} B_{12}^\top & A_{11} B_{21}^\top + A_{12} B_{22}^\top \\ A_{21} B_{11}^\top + A_{22} B_{12}^\top & A_{21} B_{21}^\top + A_{22} B_{22}^\top \end{pmatrix}.$$

An important particular case is the square matrix A(p × p), partitioned such that A₁₁ and A₂₂ are both square matrices (i.e., n_j = p_j, j = 1, 2). It can be verified that when A is non-singular (AA⁻¹ = I_p):
$$A^{-1} = \begin{pmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{pmatrix} \qquad (2.26)$$
where
$$\begin{cases} A^{11} = (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} \overset{\text{def}}{=} (A_{11\cdot 2})^{-1} \\ A^{12} = -(A_{11\cdot 2})^{-1} A_{12} A_{22}^{-1} \\ A^{21} = -A_{22}^{-1} A_{21} (A_{11\cdot 2})^{-1} \\ A^{22} = A_{22}^{-1} + A_{22}^{-1} A_{21} (A_{11\cdot 2})^{-1} A_{12} A_{22}^{-1}. \end{cases}$$

An alternative expression can be obtained by reversing the positions of A₁₁ and A₂₂ in the original matrix.
The following results will be useful if A₁₁ is non-singular:
$$|A| = |A_{11}|\, |A_{22} - A_{21} A_{11}^{-1} A_{12}| = |A_{11}|\, |A_{22\cdot 1}|. \qquad (2.27)$$
If A₂₂ is non-singular, we have that:
$$|A| = |A_{22}|\, |A_{11} - A_{12} A_{22}^{-1} A_{21}| = |A_{22}|\, |A_{11\cdot 2}|. \qquad (2.28)$$

A useful formula is derived from the alternative expressions for the inverse and the determinant. For instance let
$$B = \begin{pmatrix} 1 & b^\top \\ a & A \end{pmatrix}$$
where a and b are (p × 1) vectors and A is non-singular. We then have:
$$|B| = |A - a b^\top| = |A|\, |1 - b^\top A^{-1} a| \qquad (2.29)$$
and equating the two expressions for B^{22}, we obtain the following:
$$(A - a b^\top)^{-1} = A^{-1} + \frac{A^{-1} a b^\top A^{-1}}{1 - b^\top A^{-1} a}. \qquad (2.30)$$
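Identity (2.30) — a rank-one update of an inverse — is cheap to verify numerically; a sketch assuming NumPy, with arbitrary A, a, b (A is pushed away from singularity by adding a multiple of the identity):

```python
import numpy as np

# Numerical sketch of (2.30): (A - ab')^{-1} = A^{-1} + A^{-1}ab'A^{-1} / (1 - b'A^{-1}a).
rng = np.random.default_rng(1)
p = 5
A = rng.normal(size=(p, p)) + p * np.eye(p)   # well-conditioned, non-singular
a = rng.normal(size=(p, 1))
b = rng.normal(size=(p, 1))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A - a @ b.T)
rhs = Ainv + (Ainv @ a @ b.T @ Ainv) / (1.0 - (b.T @ Ainv @ a).item())
print(np.allclose(lhs, rhs))   # True (provided 1 - b'A^{-1}a is not zero)
```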

EXAMPLE 2.9 Let's consider the matrix
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 2 \end{pmatrix}.$$
We can use formula (2.26) to calculate the inverse of a partitioned matrix, i.e., A^{11} = −1, A^{12} = A^{21} = 1, A^{22} = −1/2. The inverse of A is
$$A^{-1} = \begin{pmatrix} -1 & 1 \\ 1 & -0.5 \end{pmatrix}.$$
It is also easy to calculate the determinant of A:
$$|A| = |1|\, |2 - 4| = -2.$$

Let A(n × p) and B(p × n) be any two matrices and suppose that n ≥ p. From (2.27) and (2.28) we can conclude that
$$\begin{vmatrix} -\lambda I_n & -A \\ B & I_p \end{vmatrix} = (-\lambda)^{n-p}\, |B A - \lambda I_p| = |A B - \lambda I_n|. \qquad (2.31)$$
Since both determinants on the right-hand side of (2.31) are polynomials in λ, we find that the n eigenvalues of AB yield the p eigenvalues of BA plus the eigenvalue 0, n − p times.
The relationship between the eigenvectors is described in the next theorem.

THEOREM 2.6 For A(n × p) and B(p × n), the non-zero eigenvalues of AB and BA are the same and have the same multiplicity. If x is an eigenvector of AB for an eigenvalue λ ≠ 0, then y = Bx is an eigenvector of BA.

COROLLARY 2.2 For A(n × p), B(q × n), a(p × 1), and b(q × 1) we have
$$\operatorname{rank}(A a b^\top B) \le 1.$$
The non-zero eigenvalue, if it exists, equals b⊤BAa (with eigenvector Aa).

Proof:
Theorem 2.6 asserts that the eigenvalues of Aab⊤B are the same as those of b⊤BAa. Note that the matrix b⊤BAa is a scalar and hence it is its own eigenvalue λ₁.
Applying Aab⊤B to Aa yields
$$(A a b^\top B)(A a) = (A a)(b^\top B A a) = \lambda_1\, A a. \qquad\Box$$
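Corollary 2.2 can be illustrated numerically; a sketch assuming NumPy, with arbitrary dimensions and seed:

```python
import numpy as np

# Corollary 2.2: Aab'B has rank at most 1 and its only non-zero eigenvalue
# is the scalar b'BAa. Dimensions and seed are arbitrary for this sketch.
rng = np.random.default_rng(2)
n, p, q = 4, 3, 2
A = rng.normal(size=(n, p))
B = rng.normal(size=(q, n))
a = rng.normal(size=(p, 1))
b = rng.normal(size=(q, 1))

M = A @ a @ b.T @ B                     # (n x n) matrix of rank at most 1
lam1 = (b.T @ B @ A @ a).item()         # predicted non-zero eigenvalue
eigs = np.linalg.eigvals(M)
largest = eigs[np.argmax(np.abs(eigs))]

print(np.linalg.matrix_rank(M))         # 1
print(np.isclose(largest.real, lam1))   # True
```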

Figure 2.1. Distance d.

2.6 Geometrical Aspects

Distance

Let x, y ∈ ℝ^p. A distance d is defined as a function d : ℝ^{2p} → ℝ₊ which fulfills
$$\begin{cases} d(x, y) > 0 & \forall\, x \ne y \\ d(x, y) = 0 & \text{if and only if } x = y \\ d(x, y) \le d(x, z) + d(z, y) & \forall\, x, y, z. \end{cases}$$
A Euclidean distance d between two points x and y is defined as
$$d^2(x, y) = (x - y)^\top A (x - y) \qquad (2.32)$$
where A is a positive definite matrix (A > 0). A is called a metric.

EXAMPLE 2.10 A particular case is when A = I_p, i.e.,
$$d^2(x, y) = \sum_{i=1}^{p} (x_i - y_i)^2. \qquad (2.33)$$
Figure 2.1 illustrates this definition for p = 2.

Note that the sets E_d = {x ∈ ℝ^p | (x − x₀)⊤(x − x₀) = d²}, i.e., the spheres with radius d and center x₀, are the Euclidean I_p iso-distance curves from the point x₀ (see Figure 2.2).
The more general distance (2.32) with a positive definite matrix A (A > 0) leads to the iso-distance curves
$$E_d = \{x \in \mathbb{R}^p \mid (x - x_0)^\top A (x - x_0) = d^2\}, \qquad (2.34)$$
i.e., ellipsoids with center x₀, matrix A and constant d (see Figure 2.3).
Let γ₁, γ₂, …, γ_p be the orthonormal eigenvectors of A corresponding to the eigenvalues λ₁ ≥ λ₂ ≥ ··· ≥ λ_p. The resulting observations are given in the next theorem.
Figure 2.2. Iso-distance sphere.

Figure 2.3. Iso-distance ellipsoid.

THEOREM 2.7 (i) The principal axes of E_d are in the direction of γ_i, i = 1, …, p.
(ii) The half-lengths of the axes are $\sqrt{d^2/\lambda_i}$, i = 1, …, p.
(iii) The rectangle surrounding the ellipsoid E_d is defined by the following inequalities:
$$x_{0i} - \sqrt{d^2 a^{ii}} \le x_i \le x_{0i} + \sqrt{d^2 a^{ii}}, \quad i = 1, \ldots, p,$$
where a^{ii} is the (i, i) element of A⁻¹. By the rectangle surrounding the ellipsoid E_d we mean the rectangle whose sides are parallel to the coordinate axes.

It is easy to find the coordinates of the tangency points between the ellipsoid and its surrounding rectangle parallel to the coordinate axes. Let us find the coordinates of the tangency point that are in the direction of the j-th coordinate axis (positive direction).
For ease of notation, we suppose the ellipsoid is centered around the origin (x₀ = 0). If not, the rectangle will be shifted by the value of x₀.
The coordinate of the tangency point is given by the solution to the following problem:
$$x = \arg\max_{x^\top A x = d^2} e_j^\top x \qquad (2.35)$$
where e_j is the j-th column of the identity matrix I_p. The coordinate of the tangency point in the negative direction would correspond to the solution of the min problem: by symmetry, it is the opposite value of the former.
The solution is computed via the Lagrangian L = e_j⊤x − λ(x⊤Ax − d²), which by (2.23) leads to the following system of equations:
$$\frac{\partial L}{\partial x} = e_j - 2 \lambda A x = 0 \qquad (2.36)$$
$$\frac{\partial L}{\partial \lambda} = x^\top A x - d^2 = 0. \qquad (2.37)$$
This gives $x = \frac{1}{2\lambda} A^{-1} e_j$, or componentwise
$$x_i = \frac{1}{2\lambda}\, a^{ij}, \quad i = 1, \ldots, p \qquad (2.38)$$
where a^{ij} denotes the (i, j)-th element of A⁻¹.
where aij denotes the (i, j)-th element of Aв€’1 .
Premultiplying (2.36) by x⊤, we have from (2.37):
$$x_j = 2 \lambda d^2.$$
Comparing this to the value obtained by (2.38), for i = j we obtain $2\lambda = \sqrt{a^{jj}/d^2}$. We choose the positive value of the square root because we are maximizing e_j⊤x. A minimum would correspond to the negative value. Finally, we have the coordinates of the tangency point between the ellipsoid and its surrounding rectangle in the positive direction of the j-th axis:
$$x_i = \sqrt{\frac{d^2}{a^{jj}}}\, a^{ij}, \quad i = 1, \ldots, p. \qquad (2.39)$$
The particular case where i = j provides statement (iii) in Theorem 2.7.
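Formula (2.39) can be checked numerically against a brute-force parametrization of the ellipse; a sketch assuming NumPy, for a hypothetical 2 × 2 metric A (not from the book), d = 1 and j = 1 (Python index 0):

```python
import numpy as np

# Check of the tangency-point formula (2.39) for a hypothetical metric A and d = 1.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
Ainv = np.linalg.inv(A)
d, j = 1.0, 0

x_tan = np.sqrt(d**2 / Ainv[j, j]) * Ainv[:, j]   # formula (2.39)
print(np.isclose(x_tan @ A @ x_tan, d**2))        # True: the point lies on E_d

# parametrize E_d: with Ainv = LL' (Cholesky), x = d*L*u maps the unit circle
# onto {x : x'Ax = d^2}; the tangency point maximizes the j-th coordinate
L = np.linalg.cholesky(Ainv)
t = np.linspace(0, 2 * np.pi, 200_000)
u = np.vstack([np.cos(t), np.sin(t)])
X = d * (L @ u)
print(np.isclose(X[j].max(), x_tan[j], atol=1e-6))  # True
```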

Remark: usefulness of Theorem 2.7

Theorem 2.7 will prove to be particularly useful in many subsequent chapters. First, it provides a helpful tool for graphing an ellipse in two dimensions. Indeed, knowing the slope of the principal axes of the ellipse, their half-lengths and drawing the rectangle inscribing the ellipse allows one to quickly draw a rough picture of the shape of the ellipse.
In Chapter 7, it is shown that the confidence region for the vector μ of a multivariate normal population is given by a particular ellipsoid whose parameters depend on sample characteristics. The rectangle inscribing the ellipsoid (which is much easier to obtain) will provide the simultaneous confidence intervals for all of the components in μ.
In addition it will be shown that the contour surfaces of the multivariate normal density are provided by ellipsoids whose parameters depend on the mean vector and on the covariance matrix. We will see that the tangency points between the contour ellipsoids and the surrounding rectangle are determined by regressing one component on the (p − 1) other components. For instance, in the direction of the j-th axis, the tangency points are given by the intersections of the ellipsoid contours with the regression line of the vector of (p − 1) variables (all components except the j-th) on the j-th component.

Norm of a Vector

Consider a vector x ∈ ℝ^p. The norm or length of x (with respect to the metric I_p) is defined as
$$\|x\| = d(0, x) = \sqrt{x^\top x}.$$
If ‖x‖ = 1, x is called a unit vector. A more general norm can be defined with respect to the metric A:
$$\|x\|_A = \sqrt{x^\top A x}.$$
Figure 2.4. Angle between vectors.

Angle between two Vectors

Consider two vectors x and y ∈ ℝ^p. The angle θ between x and y is defined by the cosine of θ:
$$\cos\theta = \frac{x^\top y}{\|x\|\,\|y\|}, \qquad (2.40)$$
see Figure 2.4. Indeed for p = 2, x = (x₁, x₂)⊤ and y = (y₁, y₂)⊤, we have
$$\|x\| \cos\theta_1 = x_1, \quad \|y\| \cos\theta_2 = y_1; \qquad \|x\| \sin\theta_1 = x_2, \quad \|y\| \sin\theta_2 = y_2, \qquad (2.41)$$
therefore,
$$\cos\theta = \cos\theta_1 \cos\theta_2 + \sin\theta_1 \sin\theta_2 = \frac{x_1 y_1 + x_2 y_2}{\|x\|\,\|y\|} = \frac{x^\top y}{\|x\|\,\|y\|}.$$

REMARK 2.1 If x⊤y = 0, then the angle θ is equal to π/2. From trigonometry, we know that the cosine of θ equals the length of the base of a triangle (‖p_x‖) divided by the length of the hypotenuse (‖x‖). Hence, we have
$$\|p_x\| = \|x\|\, |\cos\theta| = \frac{|x^\top y|}{\|y\|}, \qquad (2.42)$$
76 2 A Short Excursion into Matrix Algebra

В¦ ВЈВЎ ВўВЎ ВЎ
ВЎВЎ
ВЁ ВЎВЎ
В¤В  ВҐ В§В


Figure 2.5. Projection.

where p_x is the projection of x on y (which is defined below). It is the coordinate of x on the y vector, see Figure 2.5.
The angle can also be defined with respect to a general metric A:
$$\cos\theta = \frac{x^\top A y}{\|x\|_A\, \|y\|_A}. \qquad (2.43)$$
If cos θ = 0 then x is orthogonal to y with respect to the metric A.

EXAMPLE 2.11 Assume that there are two centered (i.e., zero mean) data vectors. The cosine of the angle between them is equal to their correlation (defined in (3.8))! Indeed for x and y with x̄ = ȳ = 0 we have
$$r_{XY} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2\, \sum_i y_i^2}} = \cos\theta$$
according to formula (2.40).
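This identity is immediate to verify on simulated data; a sketch assuming NumPy (the data are arbitrary, only the centering matters):

```python
import numpy as np

# Example 2.11 numerically: for centered data vectors the empirical
# correlation equals the cosine of the angle between them (formula (2.40)).
rng = np.random.default_rng(3)
x = rng.normal(size=20)
y = rng.normal(size=20)
x -= x.mean()                                    # center both vectors
y -= y.mean()

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
r_xy = np.corrcoef(x, y)[0, 1]
print(np.isclose(cos_theta, r_xy))               # True
```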

Rotations

When we consider a point x ∈ ℝ^p, we generally use a p-coordinate system to obtain its geometric representation, like in Figure 2.1 for instance. There will be situations in multivariate techniques where we will want to rotate this system of coordinates by the angle θ.
Consider for example the point P with coordinates x = (x₁, x₂)⊤ in ℝ² with respect to a given set of orthogonal axes. Let Γ be a (2 × 2) orthogonal matrix where
$$\Gamma = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \qquad (2.44)$$
If the axes are rotated about the origin through an angle θ in a clockwise direction, the new coordinates of P will be given by the vector y
$$y = \Gamma x, \qquad (2.45)$$
and a rotation through the same angle in a counterclockwise direction gives the new coordinates as
$$y = \Gamma^\top x. \qquad (2.46)$$
More generally, premultiplying a vector x by an orthogonal matrix Γ geometrically corresponds to a rotation of the system of axes, so that the first new axis is determined by the first row of Γ. This geometric point of view will be exploited in Chapters 9 and 10.
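A small sketch of (2.44)-(2.45), assuming NumPy, with θ = π/2 so the new coordinates are easy to read off:

```python
import numpy as np

# The rotation matrix (2.44) and the coordinate change (2.45), for theta = pi/2.
def gamma(theta):
    return np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])

G = gamma(np.pi / 2)
x = np.array([1.0, 0.0])

print(np.allclose(G @ G.T, np.eye(2)))   # True: Gamma is orthogonal
print(np.round(G @ x, 12))               # [ 0. -1.]: coordinates of x in the rotated system
```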

Column Space and Null Space of a Matrix

Define for X(n × p)
$$\operatorname{Im}(X) \overset{\text{def}}{=} C(X) = \{x \in \mathbb{R}^n \mid \exists\, a \in \mathbb{R}^p \text{ so that } X a = x\},$$
the space generated by the columns of X, or the column space of X. Note that C(X) ⊆ ℝⁿ and dim{C(X)} = rank(X) = r ≤ min(n, p).
$$\operatorname{Ker}(X) \overset{\text{def}}{=} N(X) = \{y \in \mathbb{R}^p \mid X y = 0\}$$
is the null space of X. Note that N(X) ⊆ ℝ^p and that dim{N(X)} = p − r.

REMARK 2.2 N(X⊤) is the orthogonal complement of C(X) in ℝⁿ, i.e., given a vector b ∈ ℝⁿ it will hold that x⊤b = 0 for all x ∈ C(X), if and only if b ∈ N(X⊤).
EXAMPLE 2.12 Let $X = \begin{pmatrix} 2 & 3 & 5 \\ 4 & 6 & 7 \\ 6 & 8 & 6 \\ 8 & 2 & 4 \end{pmatrix}$. It is easy to show (e.g. by calculating the determinant of a (3 × 3) submatrix of X) that rank(X) = 3. Hence, the column space of X has dimension dim{C(X)} = 3.
The null space of X contains only the zero vector (0, 0, 0)⊤ and its dimension is equal to p − rank(X) = 3 − 3 = 0.
For $X = \begin{pmatrix} 2 & 3 & 1 \\ 4 & 6 & 2 \\ 6 & 8 & 3 \\ 8 & 2 & 4 \end{pmatrix}$, the third column is a multiple of the first one and the matrix X cannot be of full rank. Noticing that the first two columns of X are independent, we see that rank(X) = 2. In this case, the dimension of the column space is 2 and the dimension of the null space is 1.
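Both rank computations of Example 2.12 are easy to reproduce; a sketch assuming NumPy, with the null space of the second matrix extracted from its SVD:

```python
import numpy as np

# The two matrices of Example 2.12: rank, dim C(X) and dim N(X).
X1 = np.array([[2, 3, 5], [4, 6, 7], [6, 8, 6], [8, 2, 4]], dtype=float)
X2 = np.array([[2, 3, 1], [4, 6, 2], [6, 8, 3], [8, 2, 4]], dtype=float)

for X in (X1, X2):
    r = np.linalg.matrix_rank(X)
    print("rank:", r, " dim C(X):", r, " dim N(X):", X.shape[1] - r)

# a basis vector of N(X2): the right-singular vector for the zero singular value
_, _, Vt = np.linalg.svd(X2)
y = Vt[-1]
print(np.allclose(X2 @ y, 0.0))   # True: y spans the one-dimensional null space
```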

Projection Matrix

A matrix P(n × n) is called an (orthogonal) projection matrix in ℝⁿ if and only if P = P⊤ = P² (P is idempotent). Let b ∈ ℝⁿ. Then a = Pb is the projection of b on C(P).

Projection on C(X)

Consider X(n × p) and let
$$P = X (X^\top X)^{-1} X^\top \qquad (2.47)$$
and Q = I_n − P. It's easy to check that P and Q are idempotent and that
$$P X = X \quad\text{and}\quad Q X = 0. \qquad (2.48)$$
Since the columns of X are projected onto themselves, the projection matrix P projects any vector b ∈ ℝⁿ onto C(X). Similarly, the projection matrix Q projects any vector b ∈ ℝⁿ onto the orthogonal complement of C(X).

THEOREM 2.8 Let P be the projection (2.47) and Q its orthogonal complement. Then:

(i) x = Pb ⇒ x ∈ C(X),

(ii) y = Qb ⇒ y⊤x = 0 ∀ x ∈ C(X).

Proof:
(i) holds, since x = X(X⊤X)⁻¹X⊤b = Xa, where a = (X⊤X)⁻¹X⊤b ∈ ℝ^p.
(ii) follows from y = b − Pb and x = Xa ⇒ y⊤x = b⊤Xa − b⊤X(X⊤X)⁻¹X⊤Xa = 0. □
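The defining properties (2.47)-(2.48) and the orthogonal split of Theorem 2.8 can be checked numerically; a sketch assuming NumPy, on an arbitrary X (seed and sizes are illustrative only):

```python
import numpy as np

# The projector (2.47) and its complement Q = I - P, checked numerically.
rng = np.random.default_rng(4)
n, p = 6, 2
X = rng.normal(size=(n, p))

P = X @ np.linalg.inv(X.T @ X) @ X.T
Q = np.eye(n) - P

print(np.allclose(P @ P, P), np.allclose(P, P.T))      # idempotent and symmetric
print(np.allclose(P @ X, X), np.allclose(Q @ X, 0.0))  # (2.48)

# any b splits into orthogonal pieces: Pb in C(X), Qb in its complement
b = rng.normal(size=n)
print(np.isclose((P @ b) @ (Q @ b), 0.0))              # True
```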

REMARK 2.3 Let x, y ∈ ℝⁿ and consider p_x ∈ ℝⁿ, the projection of x on y (see Figure 2.5). With X = y we have from (2.47)
$$p_x = y (y^\top y)^{-1} y^\top x = \frac{y^\top x}{\|y\|^2}\, y \qquad (2.49)$$
and we can easily verify that
$$\|p_x\| = \sqrt{p_x^\top p_x} = \frac{|y^\top x|}{\|y\|}.$$
See again Remark 2.1.

Summary
→ A distance between two p-dimensional points x and y is a quadratic form (x − y)⊤A(x − y) in the vectors of differences (x − y). A distance defines the norm of a vector.
→ Iso-distance curves of a point x₀ are all those points that have the same distance from x₀. Iso-distance curves are ellipsoids whose principal axes are determined by the direction of the eigenvectors of A. The half-length of principal axes is proportional to the inverse of the roots of the eigenvalues of A.
→ The angle between two vectors x and y is given by cos θ = x⊤Ay / (‖x‖_A ‖y‖_A) w.r.t. the metric A.
→ For the Euclidean distance with A = I the correlation between two centered data vectors x and y is given by the cosine of the angle between them, i.e., cos θ = r_XY.
→ The projection P = X(X⊤X)⁻¹X⊤ is the projection onto the column space C(X) of X.
→ The projection of x ∈ ℝⁿ on y ∈ ℝⁿ is given by p_x = (y⊤x/‖y‖²) y.

2.7 Exercises

EXERCISE 2.1 Compute the determinant for a (3 × 3) matrix.

EXERCISE 2.2 Suppose that |A| = 0. Is it possible that all eigenvalues of A are positive?

EXERCISE 2.3 Suppose that all eigenvalues of some (square) matrix A are different from zero. Does the inverse A⁻¹ of A exist?

EXERCISE 2.4 Write a program that calculates the Jordan decomposition of the matrix
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 2 \\ 3 & 2 & 1 \end{pmatrix}.$$
Check Theorem 2.1 numerically.

EXERCISE 2.5 Prove (2.23), (2.24) and (2.25).

EXERCISE 2.6 Show that a projection matrix only has eigenvalues in {0, 1}.

EXERCISE 2.7 Draw some iso-distance ellipsoids for the metric A = Σ⁻¹ of Example 3.13.

EXERCISE 2.8 Find a formula for |A + aa⊤| and for (A + aa⊤)⁻¹. (Hint: use the inverse partitioned matrix with $B = \begin{pmatrix} 1 & -a^\top \\ a & A \end{pmatrix}$.)

EXERCISE 2.9 Prove the Binomial inverse theorem for two non-singular matrices A(p × p) and B(p × p): (A + B)⁻¹ = A⁻¹ − A⁻¹(A⁻¹ + B⁻¹)⁻¹A⁻¹. (Hint: use (2.26) with $C = \begin{pmatrix} A & I_p \\ -I_p & B^{-1} \end{pmatrix}$.)
3 Moving to Higher Dimensions

We have seen in the previous chapters how very simple graphical devices can help in under-
standing the structure and dependency of data. The graphical tools were based on either
univariate (bivariate) data representations or on "slick" transformations of multivariate information perceivable by the human eye. Most of the tools are extremely useful in a modelling
step, but unfortunately, do not give the full picture of the data set. One reason for this is
that the graphical tools presented capture only certain dimensions of the data and do not
necessarily concentrate on those dimensions or subparts of the data under analysis that carry
the maximum structural information. In Part III of this book, powerful tools for reducing
the dimension of a data set will be presented. In this chapter, as a starting point, simple and
basic tools are used to describe dependency. They are constructed from elementary facts of
probability theory and introductory statistics (for example, the covariance and correlation
between two variables).
Sections 3.1 and 3.2 show how to handle these concepts in a multivariate setup and how a
simple test on correlation between two variables can be derived. Since linear relationships
are involved in these measures, Section 3.4 presents the simple linear model for two variables
and recalls the basic t-test for the slope. In Section 3.5, a simple example of one-factorial
analysis of variance introduces the notations for the well known F -test.
Due to the power of matrix notation, all of this can easily be extended to a more general
multivariate setup. Section 3.3 shows how matrix operations can be used to define summary
statistics of a data set and for obtaining the empirical moments of linear transformations of
the data. These results will prove to be very useful in most of the chapters in Part III.
Finally, matrix notation allows us to introduce the flexible multiple linear model, where more
general relationships among variables can be analyzed. In Section 3.6, the least squares
adjustment of the model and the usual test statistics are presented with their geometric
interpretation. Using these notations, the ANOVA model is just a particular case of the
multiple linear model.

3.1 Covariance

Covariance is a measure of dependency between random variables. Given two (random) variables X and Y the (theoretical) covariance is defined by:
$$\sigma_{XY} = \operatorname{Cov}(X, Y) = E(XY) - (EX)(EY). \qquad (3.1)$$
The precise definition of expected values is given in Chapter 4. If X and Y are independent of each other, the covariance Cov(X, Y) is necessarily equal to zero, see Theorem 3.1. The converse is not true. The covariance of X with itself is the variance:
$$\sigma_{XX} = \operatorname{Var}(X) = \operatorname{Cov}(X, X).$$
If the variable X is p-dimensional multivariate, e.g., X = (X₁, …, X_p)⊤, then the theoretical covariances among all the elements are put into matrix form, i.e., the covariance matrix:
$$\Sigma = \begin{pmatrix} \sigma_{X_1 X_1} & \ldots & \sigma_{X_1 X_p} \\ \vdots & \ddots & \vdots \\ \sigma_{X_p X_1} & \ldots & \sigma_{X_p X_p} \end{pmatrix}.$$
Properties of covariance matrices will be detailed in Chapter 4. Empirical versions of these quantities are:
$$s_{XY} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \qquad (3.2)$$
$$s_{XX} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2. \qquad (3.3)$$
For small n, say n ≤ 20, we should replace the factor 1/n in (3.2) and (3.3) by 1/(n−1) in order to correct for a small bias. For a p-dimensional random variable, one obtains the empirical covariance matrix (see Section 3.3 for properties and details)
$$S = \begin{pmatrix} s_{X_1 X_1} & \ldots & s_{X_1 X_p} \\ \vdots & \ddots & \vdots \\ s_{X_p X_1} & \ldots & s_{X_p X_p} \end{pmatrix}.$$
For a scatterplot of two variables the covariances measure "how close the scatter is to a line". Mathematical details follow but it should already be understood here that in this sense covariance measures only "linear dependence".

EXAMPLE 3.1 If X is the entire bank data set, one obtains the covariance matrix S as indicated below:
$$S = \begin{pmatrix} 0.14 & 0.03 & 0.02 & -0.10 & -0.01 & 0.08 \\ 0.03 & 0.12 & 0.10 & 0.21 & 0.10 & -0.21 \\ 0.02 & 0.10 & 0.16 & 0.28 & 0.12 & -0.24 \\ -0.10 & 0.21 & 0.28 & 2.07 & 0.16 & -1.03 \\ -0.01 & 0.10 & 0.12 & 0.16 & 0.64 & -0.54 \\ 0.08 & -0.21 & -0.24 & -1.03 & -0.54 & 1.32 \end{pmatrix}. \qquad (3.4)$$
The empirical covariance between X₄ and X₅, i.e., s_{X₄X₅}, is found in row 4 and column 5. The value is s_{X₄X₅} = 0.16. Is it obvious that this value is positive? In Exercise 3.1 we will discuss this question further.
If X_f denotes the counterfeit bank notes, we obtain:
$$S_f = \begin{pmatrix} 0.123 & 0.031 & 0.024 & -0.099 & 0.019 & 0.011 \\ 0.031 & 0.064 & 0.046 & -0.024 & -0.012 & -0.005 \\ 0.024 & 0.046 & 0.088 & -0.018 & 0.000 & 0.034 \\ -0.099 & -0.024 & -0.018 & 1.268 & -0.485 & 0.236 \\ 0.019 & -0.012 & 0.000 & -0.485 & 0.400 & -0.022 \\ 0.011 & -0.005 & 0.034 & 0.236 & -0.022 & 0.308 \end{pmatrix}. \qquad (3.5)$$
For the genuine, X_g, we have:
$$S_g = \begin{pmatrix} 0.149 & 0.057 & 0.057 & 0.056 & 0.014 & 0.005 \\ 0.057 & 0.131 & 0.085 & 0.056 & 0.049 & -0.043 \\ 0.057 & 0.085 & 0.125 & 0.058 & 0.030 & -0.024 \\ 0.056 & 0.056 & 0.058 & 0.409 & -0.261 & -0.000 \\ 0.014 & 0.049 & 0.030 & -0.261 & 0.417 & -0.074 \\ 0.005 & -0.043 & -0.024 & -0.000 & -0.074 & 0.198 \end{pmatrix}. \qquad (3.6)$$

Note that the covariance between X₄ (distance of the frame to the lower border) and X₅ (distance of the frame to the upper border) is negative in both (3.5) and (3.6)! Why would this happen? In Exercise 3.2 we will discuss this question in more detail.
At first sight, the matrices S_f and S_g look different, but they create almost the same scatterplots (see the discussion in Section 1.4). Similarly, the common principal component analysis in Chapter 9 suggests a joint analysis of the covariance structure as in Flury and Riedwyl (1988).
Scatterplots with point clouds that are "upward-sloping", like the one in the upper left of Figure 1.14, show variables with positive covariance. Scatterplots with "downward-sloping" structure have negative covariance. In Figure 3.1 we show the scatterplot of X₄ vs. X₅ of the entire bank data set. The point cloud is upward-sloping. However, the two sub-clouds of counterfeit and genuine bank notes are downward-sloping.
Figure 3.1. Scatterplot of variables X₄ vs. X₅ of the entire bank data set. MVAscabank45.xpl

EXAMPLE 3.2 A textile shop manager is studying the sales of "classic blue" pullovers over 10 different periods. He observes the number of pullovers sold (X₁), variation in price (X₂, in EUR), the advertisement costs in local newspapers (X₃, in EUR) and the presence of a sales assistant (X₄, in hours per period). Over the periods, he observes the following data matrix:
$$X = \begin{pmatrix} 230 & 125 & 200 & 109 \\ 181 & 99 & 55 & 107 \\ 165 & 97 & 105 & 98 \\ 150 & 115 & 85 & 71 \\ 97 & 120 & 0 & 82 \\ 192 & 100 & 150 & 103 \\ 181 & 80 & 85 & 111 \\ 189 & 90 & 120 & 93 \\ 172 & 95 & 110 & 86 \\ 170 & 125 & 130 & 78 \end{pmatrix}.$$
Figure 3.2. Scatterplot of variables X₂ vs. X₁ of the pullovers data set. MVAscapull1.xpl

He is convinced that the price must have a large influence on the number of pullovers sold. So he makes a scatterplot of X₂ vs. X₁, see Figure 3.2. A rough impression is that the cloud is somewhat downward-sloping. A computation of the empirical covariance (3.2) yields
$$s_{X_1 X_2} = \frac{1}{10} \sum_{i=1}^{10} \left(x_{1i} - \bar{x}_1\right)\left(x_{2i} - \bar{x}_2\right) = -80.02,$$
a negative value as expected.
Note: The covariance function is scale dependent. Thus, if the prices in this example were in Japanese Yen (JPY), we would obtain a different answer (see Exercise 3.16). A measure of (linear) dependence independent of the scale is the correlation, which we introduce in the next section.
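The computation above can be reproduced directly from the data matrix of Example 3.2; a sketch assuming NumPy:

```python
import numpy as np

# Example 3.2 recomputed: the empirical covariance (1/n version, as in (3.2))
# of sales X1 and price X2 from the pullover data.
sales = np.array([230, 181, 165, 150, 97, 192, 181, 189, 172, 170], dtype=float)
price = np.array([125, 99, 97, 115, 120, 100, 80, 90, 95, 125], dtype=float)

s_x1x2 = np.mean((sales - sales.mean()) * (price - price.mean()))
print(round(s_x1x2, 2))    # -80.02, matching the downward-sloping cloud
```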

Summary
→ The covariance is a measure of dependence.
→ Covariance measures only linear dependence.
→ Covariance is scale dependent.
→ There are nonlinear dependencies that have zero covariance.
→ Zero covariance does not imply independence.
→ Independence implies zero covariance.
→ Negative covariance corresponds to downward-sloping scatterplots.
→ Positive covariance corresponds to upward-sloping scatterplots.
→ The covariance of a variable with itself is its variance Cov(X, X) = σ_XX = σ_X².
→ For small n, we should replace the factor 1/n in the computation of the covariance by 1/(n−1).

3.2 Correlation

The correlation between two variables X and Y is defined from the covariance as follows:
$$
\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}\,. \tag{3.7}
$$
The advantage of the correlation is that it is independent of the scale, i.e., changing the
variables' scale of measurement does not change the value of the correlation. Therefore, the
correlation is more useful as a measure of association between two random variables than
the covariance. The empirical version of $\rho_{XY}$ is as follows:
$$
r_{XY} = \frac{s_{XY}}{\sqrt{s_{XX}\, s_{YY}}}\,. \tag{3.8}
$$
The correlation is in absolute value always less than 1. It is zero if the covariance is zero,
and vice versa. For p-dimensional vectors $(X_1, \ldots, X_p)^\top$ we have the theoretical correlation
matrix
$$
\mathcal{P} = \begin{pmatrix}
\rho_{X_1 X_1} & \cdots & \rho_{X_1 X_p} \\
\vdots & \ddots & \vdots \\
\rho_{X_p X_1} & \cdots & \rho_{X_p X_p}
\end{pmatrix},
$$
and its empirical version, the empirical correlation matrix, which can be calculated from the
observations,
$$
\mathcal{R} = \begin{pmatrix}
r_{X_1 X_1} & \cdots & r_{X_1 X_p} \\
\vdots & \ddots & \vdots \\
r_{X_p X_1} & \cdots & r_{X_p X_p}
\end{pmatrix}.
$$
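For a data matrix, the empirical correlation matrix can be obtained in one call. Below is a hedged sketch using NumPy's `corrcoef` on the pullover data of Example 3.2; the tool is our own choice, not the text's.

```python
import numpy as np

# Pullover data matrix from Example 3.2 (rows = periods, columns = X1..X4).
X = np.array([
    [230, 125, 200, 109], [181,  99,  55, 107], [165,  97, 105,  98],
    [150, 115,  85,  71], [ 97, 120,   0,  82], [192, 100, 150, 103],
    [181,  80,  85, 111], [189,  90, 120,  93], [172,  95, 110,  86],
    [170, 125, 130,  78],
])

# Empirical correlation matrix R: r_{jk} = s_{jk} / sqrt(s_{jj} s_{kk}).
R = np.corrcoef(X, rowvar=False)
# R is symmetric with unit diagonal, and every entry lies in [-1, 1].
```

In particular, `R[0, 1]` is negative, matching the negative covariance between sales and price found in Example 3.2, and `R[0, 3]` comes out close to the value 0.633 quoted later in Example 3.6.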
EXAMPLE 3.3 We obtain the following correlation matrix for the genuine bank notes:
$$
\mathcal{R}_g = \begin{pmatrix}
1.00 & 0.41 & 0.41 & 0.22 & 0.05 & 0.03 \\
0.41 & 1.00 & 0.66 & 0.24 & 0.20 & -0.25 \\
0.41 & 0.66 & 1.00 & 0.25 & 0.13 & -0.14 \\
0.22 & 0.24 & 0.25 & 1.00 & -0.63 & -0.00 \\
0.05 & 0.20 & 0.13 & -0.63 & 1.00 & -0.25 \\
0.03 & -0.25 & -0.14 & -0.00 & -0.25 & 1.00
\end{pmatrix}, \tag{3.9}
$$
and for the counterfeit bank notes:
$$
\mathcal{R}_f = \begin{pmatrix}
1.00 & 0.35 & 0.24 & -0.25 & 0.08 & 0.06 \\
0.35 & 1.00 & 0.61 & -0.08 & -0.07 & -0.03 \\
0.24 & 0.61 & 1.00 & -0.05 & 0.00 & 0.20 \\
-0.25 & -0.08 & -0.05 & 1.00 & -0.68 & 0.37 \\
0.08 & -0.07 & 0.00 & -0.68 & 1.00 & -0.06 \\
0.06 & -0.03 & 0.20 & 0.37 & -0.06 & 1.00
\end{pmatrix}. \tag{3.10}
$$
As noted before for Cov(X4, X5), the correlation between X4 (distance of the frame to the
lower border) and X5 (distance of the frame to the upper border) is negative. This is natural,
since the covariance and correlation always have the same sign (see also Exercise 3.17).

Why is the correlation an interesting statistic to study? It is related to independence of
random variables, which we shall define more formally later on. For the moment we may
think of independence as the fact that one variable has no influence on another.
THEOREM 3.1 If X and Y are independent, then $\rho(X, Y) = \operatorname{Cov}(X, Y) = 0$.

In general, the converse is not true, as the following example shows.
EXAMPLE 3.4 Consider a standard normally-distributed random variable X and a random
variable $Y = X^2$, which is surely not independent of X. Here we have
$$
\operatorname{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(X^3) = 0
$$
(because $E(X) = 0$ and $E(X^2) = 1$). Therefore $\rho(X, Y) = 0$ as well. This example
also shows that correlations and covariances measure only linear dependence. The quadratic
dependence of $Y = X^2$ on X is not reflected by these measures of dependence.
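A short simulation makes the point concrete. The sketch below (our own; sample size and seed are arbitrary choices) draws X from N(0, 1), sets Y = X², and finds a sample correlation near zero despite the exact functional dependence:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.standard_normal(100_000)   # X ~ N(0, 1)
y = x ** 2                         # Y is a deterministic function of X

r = np.corrcoef(x, y)[0, 1]
# r is close to 0: correlation misses the purely quadratic dependence.
```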
REMARK 3.1 For two normal random variables, the converse of Theorem 3.1 is true: zero
covariance for two normally-distributed random variables implies independence. This will be
shown later in Corollary 5.2.

Theorem 3.1 enables us to check for independence between the components of a bivariate
normal random variable. That is, we can use the correlation and test whether it is zero. The
distribution of $r_{XY}$ for an arbitrary (X, Y) is unfortunately complicated. The distribution
of $r_{XY}$ will be more accessible if (X, Y) are jointly normal (see Chapter 5). If we transform
the correlation by Fisher's Z-transformation,
$$
W = \frac{1}{2} \log\left(\frac{1 + r_{XY}}{1 - r_{XY}}\right), \tag{3.11}
$$

we obtain a variable that has a more accessible distribution. Under the hypothesis that
$\rho = 0$, W has an asymptotic normal distribution. Approximations of the expectation and
variance of W are given by the following:
$$
E(W) \approx \frac{1}{2} \log\left(\frac{1 + \rho_{XY}}{1 - \rho_{XY}}\right), \qquad
\operatorname{Var}(W) \approx \frac{1}{n - 3}\,. \tag{3.12}
$$
The distribution is given in Theorem 3.2.

THEOREM 3.2
$$
Z = \frac{W - E(W)}{\sqrt{\operatorname{Var}(W)}} \;\xrightarrow{\;\mathcal{L}\;}\; N(0, 1). \tag{3.13}
$$
The symbol "$\xrightarrow{\mathcal{L}}$" denotes convergence in distribution, which will be explained in more
detail in Chapter 4.
Theorem 3.2 allows us to test different hypotheses on correlation. We can fix the level of
significance $\alpha$ (the probability of rejecting a true hypothesis) and reject the hypothesis if the
difference between the hypothetical value and the calculated value of Z is greater than the
corresponding critical value of the normal distribution. The following example illustrates
the procedure.
EXAMPLE 3.5 Let's study the correlation between mileage (X2) and weight (X8) for the
car data set (B.3), where n = 74. We have $r_{X_2 X_8} = -0.823$. Our conclusion from the
boxplot in Figure 1.3 ("Japanese cars generally have better mileage than the others") needs
to be revised. From Figure 3.3 and $r_{X_2 X_8}$, we can see that mileage is highly correlated with
weight, and that the Japanese cars in the sample are in fact all lighter than the others!
If we want to know whether $\rho_{X_2 X_8}$ is significantly different from $\rho_0 = 0$, we apply Fisher's
Z-transform (3.11). This gives us
$$
w = \frac{1}{2} \log\left(\frac{1 + r_{X_2 X_8}}{1 - r_{X_2 X_8}}\right) = -1.166
\quad\text{and}\quad
z = \frac{-1.166 - 0}{\sqrt{\frac{1}{71}}} = -9.825,
$$
i.e., a highly significant value, so we reject the hypothesis that $\rho = 0$ (the 2.5% and 97.5%
quantiles of the normal distribution are $-1.96$ and $1.96$, respectively). If we want to test the
hypothesis that, say, $\rho_0 = -0.75$, we obtain:
$$
z = \frac{-1.166 - (-0.973)}{\sqrt{\frac{1}{71}}} = -1.627.
$$
This is a nonsignificant value at the $\alpha = 0.05$ level, since z lies between the critical values
at the 5% significance level (i.e., $-1.96 < z < 1.96$).
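The two test statistics of Example 3.5 can be retraced in a few lines of plain Python. This is a sketch under the stated values (r = −0.823, n = 74); no statistics library is assumed.

```python
import math

r, n = -0.823, 74

w = 0.5 * math.log((1 + r) / (1 - r))        # Fisher's Z-transform (3.11)
se = math.sqrt(1.0 / (n - 3))                # sqrt(Var(W)) from (3.12)

z0 = (w - 0.0) / se                          # test rho_0 = 0
w75 = 0.5 * math.log((1 - 0.75) / (1 + 0.75))
z75 = (w - w75) / se                         # test rho_0 = -0.75

# z0 is about -9.83 (reject rho = 0); z75 is about -1.63 (do not reject).
```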
EXAMPLE 3.6 Let us consider again the pullovers data set from Example 3.2. Consider the
correlation between the presence of the sales assistants (X4) and the number of pullovers sold
(X1) (see Figure 3.4). Here we compute the correlation as
$$
r_{X_1 X_4} = 0.633.
$$
The Z-transform of this value is
$$
w = \frac{1}{2} \log\left(\frac{1 + r_{X_1 X_4}}{1 - r_{X_1 X_4}}\right) = 0.746. \tag{3.14}
$$
The sample size is n = 10, so for the hypothesis $\rho_{X_1 X_4} = 0$, the statistic to consider is
$$
z = \sqrt{7}\,(0.746 - 0) = 1.974, \tag{3.15}
$$
which is just statistically significant at the 5% level (i.e., 1.974 is just a little larger than
1.96).
REMARK 3.2 The normalizing and variance-stabilizing properties of W are asymptotic. In
addition, the use of W in small samples (for $n \leq 25$) is improved by Hotelling's transform
(Hotelling, 1953):
$$
W^* = W - \frac{3W + \tanh(W)}{4(n - 1)}
\quad\text{with}\quad
\operatorname{Var}(W^*) = \frac{1}{n - 1}\,.
$$
The transformed variable $W^*$ is asymptotically normally distributed.
[Figure 3.3. Mileage (X2) vs. weight (X8) of U.S. (star), European (plus signs) and Japanese (circle) cars. MVAscacar.xpl]
EXAMPLE 3.7 From the preceding remark, we obtain $w^* = 0.6663$ and $\sqrt{10 - 1}\, w^* = 1.9989$
for Example 3.6. This value is significant at the 5% level.
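Hotelling's correction from Remark 3.2 is easy to retrace numerically; the sketch below (our own, in plain Python) plugs in the values of Example 3.6 (r = 0.633, n = 10):

```python
import math

r, n = 0.633, 10

w = 0.5 * math.log((1 + r) / (1 - r))            # Fisher's Z (3.11)
w_star = w - (3 * w + math.tanh(w)) / (4 * (n - 1))
z = math.sqrt(n - 1) * w_star                    # standardize, Var(W*) = 1/(n-1)

# w_star is about 0.6663 and z about 1.999, matching Example 3.7.
```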
REMARK 3.3 Note that Fisher's Z-transform is the inverse of the hyperbolic tangent
function: $W = \tanh^{-1}(r_{XY})$; equivalently, $r_{XY} = \tanh(W) = \frac{e^{2W} - 1}{e^{2W} + 1}$.
REMARK 3.4 Under the assumption that X and Y are normal, we may test their indepen-
dence ($\rho_{XY} = 0$) using the exact t-distribution of the statistic
$$
T = r_{XY} \sqrt{\frac{n - 2}{1 - r_{XY}^2}} \;\overset{\rho_{XY} = 0}{\sim}\; t_{n-2}.
$$
Setting the Type I error probability to $\alpha$, we reject the null hypothesis $\rho_{XY} = 0$ if
$|T| \geq t_{1-\alpha/2;\, n-2}$.
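For illustration, a minimal sketch applying this exact t-test to the values of Example 3.6 (r = 0.633, n = 10); the critical value $t_{0.975;8} \approx 2.306$ is quoted from standard tables rather than computed:

```python
import math

r, n = 0.633, 10
t = r * math.sqrt((n - 2) / (1 - r ** 2))   # T ~ t_{n-2} under rho = 0

t_crit = 2.306      # t_{0.975; 8} from a t-table (alpha = 0.05, two-sided)
reject = abs(t) >= t_crit
# t is about 2.31 > 2.306: independence is rejected, but only just --
# consistent with the borderline z = 1.974 found via Fisher's Z above.
```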
[Figure 3.4. Hours of sales assistants (X4) vs. sales (X1) of pullovers. MVAscapull2.xpl]
Summary
→ The correlation is a standardized measure of dependence.
→ The absolute value of the correlation is always less than one.
→ Correlation measures only linear dependence.
→ There are nonlinear dependencies that have zero correlation.
→ Zero correlation does not imply independence.
→ Independence implies zero correlation.
→ Negative correlation corresponds to downward-sloping scatterplots.
→ Positive correlation corresponds to upward-sloping scatterplots.
Summary (continued)
→ Fisher's Z-transform helps us in testing hypotheses on correlation.
→ For small samples, Fisher's Z-transform can be improved by the transformation $W^* = W - \frac{3W + \tanh(W)}{4(n-1)}$.
3.3 Summary Statistics

This section focuses on the representation of basic summary statistics (means, covariances
and correlations) in matrix notation, since we often apply linear transformations to data.
The matrix notation allows us to derive instantaneously the corresponding characteristics of
the transformed variables. The Mahalanobis transformation is a prominent example of such
linear transformations.
Assume that we have observed n realizations of a p-dimensional random variable; we have a
data matrix $\mathcal{X}\,(n \times p)$:
$$
\mathcal{X} = \begin{pmatrix}
x_{11} & \cdots & x_{1p} \\
\vdots & & \vdots \\
x_{n1} & \cdots & x_{np}
\end{pmatrix}. \tag{3.16}
$$
The rows $x_i = (x_{i1}, \ldots, x_{ip}) \in \mathbb{R}^p$ denote the i-th observation of a p-dimensional random
variable $X \in \mathbb{R}^p$.
The statistics that were briefly introduced in Sections 3.1 and 3.2 can be rewritten in matrix
form as follows. The "center of gravity" of the n observations in $\mathbb{R}^p$ is given by the vector $\bar{x}$
of the means $\bar{x}_j$ of the p variables:
$$
\bar{x} = \begin{pmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{pmatrix} = n^{-1} \mathcal{X}^\top 1_n. \tag{3.17}
$$
The dispersion of the n observations can be characterized by the covariance matrix of the
p variables. The empirical covariances defined in (3.2) and (3.3) are the elements of the
following matrix:
$$
S = n^{-1} \mathcal{X}^\top \mathcal{X} - \bar{x}\,\bar{x}^\top
  = n^{-1} \left( \mathcal{X}^\top \mathcal{X} - n^{-1} \mathcal{X}^\top 1_n 1_n^\top \mathcal{X} \right). \tag{3.18}
$$
Note that this matrix is equivalently defined by
$$
S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top.
$$
The covariance formula (3.18) can be rewritten as $S = n^{-1} \mathcal{X}^\top H \mathcal{X}$ with the centering matrix
$$
H = I_n - n^{-1} 1_n 1_n^\top. \tag{3.19}
$$
Note that the centering matrix is symmetric and idempotent. Indeed,
$$
\begin{aligned}
H^2 &= (I_n - n^{-1} 1_n 1_n^\top)(I_n - n^{-1} 1_n 1_n^\top) \\
    &= I_n - n^{-1} 1_n 1_n^\top - n^{-1} 1_n 1_n^\top + (n^{-1} 1_n 1_n^\top)(n^{-1} 1_n 1_n^\top) \\
    &= I_n - n^{-1} 1_n 1_n^\top = H.
\end{aligned}
$$
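These identities are easy to verify numerically. The following sketch (our own, using NumPy and the pullover data of Example 3.2) checks that H is symmetric and idempotent and that $n^{-1}\mathcal{X}^\top H \mathcal{X}$ agrees with the elementwise definition of S:

```python
import numpy as np

X = np.array([
    [230, 125, 200, 109], [181,  99,  55, 107], [165,  97, 105,  98],
    [150, 115,  85,  71], [ 97, 120,   0,  82], [192, 100, 150, 103],
    [181,  80,  85, 111], [189,  90, 120,  93], [172,  95, 110,  86],
    [170, 125, 130,  78],
], dtype=float)
n = X.shape[0]

H = np.eye(n) - np.ones((n, n)) / n    # centering matrix (3.19)
S = X.T @ H @ X / n                    # covariance matrix via (3.18)

Xc = X - X.mean(axis=0)                # centered data
S_direct = Xc.T @ Xc / n               # n^{-1} sum (x_i - xbar)(x_i - xbar)'

# H = H', H @ H = H, S == S_direct, and S is positive semidefinite.
```

The entry `S[0, 1]` reproduces the covariance −80.02 between sales and price from Example 3.2.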
As a consequence, S is positive semidefinite, i.e.,
$$
S \geq 0. \tag{3.20}
$$
Indeed, for all $a \in \mathbb{R}^p$,