The empirical covariance matrix $S$ is positive semidefinite, since for any vector $a \in \mathbb{R}^p$
$$a^{\top} S a = n^{-1} a^{\top} X^{\top} H X a = n^{-1} (a^{\top} X^{\top} H^{\top})(H X a) \quad \text{since } H^{\top} H = H,$$
$$= n^{-1} y^{\top} y = n^{-1} \sum_{j=1}^{n} y_j^2 \ge 0$$
for $y = H X a$. It is well known from the one-dimensional case that $n^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, as an estimate of the variance, exhibits a bias of the order $n^{-1}$ (Breiman, 1973). In the multidimensional case, $S_u = \frac{n}{n-1}\, S$ is an unbiased estimate of the true covariance. (This will be shown in Example 4.15.)
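Both facts are easy to check numerically. The following minimal Python sketch (assuming NumPy and using simulated data, not the pullover data) verifies the positive semidefiniteness of $S$ through its eigenvalues and compares $S_u$ with NumPy's unbiased covariance estimator.

```python
import numpy as np

# Simulated data stand in for a real data set here.
rng = np.random.default_rng(0)
n, p = 10, 4
X = rng.normal(size=(n, p))

# Centering matrix H = I_n - n^{-1} 1_n 1_n^T
H = np.eye(n) - np.ones((n, n)) / n

S = X.T @ H @ X / n            # empirical covariance, biased by the factor (n - 1)/n
S_u = n / (n - 1) * S          # unbiased estimate S_u = n/(n-1) S

# a^T S a >= 0 for every a, i.e. all eigenvalues of S are nonnegative
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)

# S_u coincides with NumPy's unbiased covariance estimator
assert np.allclose(S_u, np.cov(X, rowvar=False))
```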
The sample correlation coefficient between the $i$-th and $j$-th variables is $r_{X_i X_j}$, see (3.8). If $D = \mathrm{diag}(s_{X_i X_i})$, then the correlation matrix is
$$R = D^{-1/2} S D^{-1/2}, \qquad (3.21)$$
where $D^{-1/2}$ is a diagonal matrix with elements $(s_{X_i X_i})^{-1/2}$ on its main diagonal.

EXAMPLE 3.8 The empirical covariances are calculated for the pullover data set. The vector of the means of the four variables in the data set is $\bar{x} = (172.7,\ 104.6,\ 104.0,\ 93.8)^{\top}$.

The sample covariance matrix is
$$S = \begin{pmatrix} 1037.2 & -80.2 & 1430.7 & 271.4 \\ -80.2 & 219.8 & 92.1 & -91.6 \\ 1430.7 & 92.1 & 2624.0 & 210.3 \\ 271.4 & -91.6 & 210.3 & 177.4 \end{pmatrix}.$$

The unbiased estimate of the variance ($n = 10$) is equal to
$$S_u = \frac{10}{9}\, S = \begin{pmatrix} 1152.5 & -88.9 & 1589.7 & 301.6 \\ -88.9 & 244.3 & 102.3 & -101.8 \\ 1589.7 & 102.3 & 2915.6 & 233.7 \\ 301.6 & -101.8 & 233.7 & 197.1 \end{pmatrix}.$$

The sample correlation matrix is
$$R = \begin{pmatrix} 1 & -0.17 & 0.87 & 0.63 \\ -0.17 & 1 & 0.12 & -0.46 \\ 0.87 & 0.12 & 1 & 0.31 \\ 0.63 & -0.46 & 0.31 & 1 \end{pmatrix}.$$
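The statistics of this example can be reproduced with a few lines of code. The sketch below assumes a hypothetical $(n \times 4)$ NumPy array `pullover` holding the data; the helper name `summary_statistics` is ours, not part of any library.

```python
import numpy as np

def summary_statistics(X):
    """Mean vector, covariance S, unbiased S_u and correlation matrix R of a data matrix."""
    n = X.shape[0]
    x_bar = X.mean(axis=0)              # x_bar = n^{-1} X^T 1_n
    Xc = X - x_bar                      # centered data HX
    S = Xc.T @ Xc / n                   # S = n^{-1} X^T H X
    S_u = n / (n - 1) * S               # unbiased covariance estimate
    d = np.sqrt(np.diag(S))             # standard deviations (s_{X_i X_i})^{1/2}
    R = S / np.outer(d, d)              # R = D^{-1/2} S D^{-1/2}
    return x_bar, S, S_u, R

# Usage, assuming `pullover` is an (n x 4) array with columns
# sales (X1), price (X2), advertisement (X3), assistant hours (X4):
# x_bar, S, S_u, R = summary_statistics(pullover)
```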


Linear Transformation

In many practical applications we need to study linear transformations of the original data. This motivates the question of how to calculate summary statistics after such linear transformations.

Let $A$ be a $(q \times p)$ matrix and consider the transformed data matrix
$$Y = X A^{\top} = (y_1, \ldots, y_n)^{\top}. \qquad (3.22)$$

The row $y_i^{\top} = (y_{i1}, \ldots, y_{iq})$ can be viewed as the $i$-th observation of a $q$-dimensional random variable $Y = AX$. In fact we have $y_i = A x_i$. We immediately obtain the mean and the empirical covariance of the variables (columns) forming the data matrix $Y$:
$$\bar{y} = \frac{1}{n}\, Y^{\top} 1_n = \frac{1}{n}\, A X^{\top} 1_n = A \bar{x} \qquad (3.23)$$
$$S_Y = \frac{1}{n}\, Y^{\top} H Y = \frac{1}{n}\, A X^{\top} H X A^{\top} = A S_X A^{\top}. \qquad (3.24)$$
Note that if the linear transformation is nonhomogeneous, i.e.,
$$y_i = A x_i + b \quad \text{where } b \text{ is } (q \times 1),$$
only (3.23) changes: $\bar{y} = A\bar{x} + b$. Formulas (3.23) and (3.24) are useful in the particular case of $q = 1$, i.e., $y = X a \Leftrightarrow y_i = a^{\top} x_i$, $i = 1, \ldots, n$:
$$\bar{y} = a^{\top} \bar{x}$$
$$S_y = a^{\top} S_X a.$$

EXAMPLE 3.9 Suppose that $X$ is the pullover data set. The manager wants to compute his mean expenses for advertisement ($X_3$) and sales assistant ($X_4$).

Suppose that the sales assistant charges an hourly wage of 10 EUR. Then the shop manager calculates the expenses $Y$ as $Y = X_3 + 10 X_4$. Formula (3.22) says that this is equivalent to defining the $(1 \times 4)$ matrix $A$ as:
$$A = (0,\ 0,\ 1,\ 10).$$
Using formulas (3.23) and (3.24), it is now computationally very easy to obtain the sample mean $\bar{y}$ and the sample variance $S_y$ of the overall expenses:
$$\bar{y} = A\bar{x} = (0,\ 0,\ 1,\ 10) \begin{pmatrix} 172.7 \\ 104.6 \\ 104.0 \\ 93.8 \end{pmatrix} = 1042.0$$

$$S_Y = A S_X A^{\top} = (0,\ 0,\ 1,\ 10) \begin{pmatrix} 1152.5 & -88.9 & 1589.7 & 301.6 \\ -88.9 & 244.3 & 102.3 & -101.8 \\ 1589.7 & 102.3 & 2915.6 & 233.7 \\ 301.6 & -101.8 & 233.7 & 197.1 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 10 \end{pmatrix}$$
$$= 2915.6 + 4674.0 + 19710.0 = 27299.6.$$
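As a small sketch of formulas (3.23) and (3.24), the computation of Example 3.9 can be reproduced from the sample statistics given above; the function name `transform_statistics` is ours and merely illustrates the matrix algebra.

```python
import numpy as np

def transform_statistics(x_bar, S_X, A, b=None):
    """Mean and covariance of Y = X A^T (+ b) from the statistics of X, cf. (3.23)-(3.24)."""
    y_bar = A @ x_bar if b is None else A @ x_bar + b
    S_Y = A @ S_X @ A.T        # the shift b does not affect the covariance
    return y_bar, S_Y

# Sample mean and (unbiased) covariance of the pullover data, as given above:
x_bar = np.array([172.7, 104.6, 104.0, 93.8])
S_X = np.array([[1152.5,  -88.9, 1589.7,  301.6],
                [ -88.9,  244.3,  102.3, -101.8],
                [1589.7,  102.3, 2915.6,  233.7],
                [ 301.6, -101.8,  233.7,  197.1]])
A = np.array([[0.0, 0.0, 1.0, 10.0]])          # (1 x 4)
y_bar, S_Y = transform_statistics(x_bar, S_X, A)
# y_bar -> [1042.0], S_Y -> [[27299.6]]
```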


Mahalanobis Transformation

A special case of this linear transformation is
$$z_i = S^{-1/2}(x_i - \bar{x}), \qquad i = 1, \ldots, n. \qquad (3.25)$$
Note that for the transformed data matrix $Z = (z_1, \ldots, z_n)^{\top}$,
$$S_Z = n^{-1} Z^{\top} H Z = I_p. \qquad (3.26)$$
So the Mahalanobis transformation eliminates the correlation between the variables and standardizes the variance of each variable. If we apply (3.24) using $A = S^{-1/2}$, we obtain the identity covariance matrix as indicated in (3.26).
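As an illustration, one possible NumPy implementation of (3.25) uses the symmetric inverse square root of $S$ obtained from its eigendecomposition; this is a sketch under the assumption that $S$ has full rank, and the final check on simulated data confirms (3.26).

```python
import numpy as np

def mahalanobis_transform(X):
    """z_i = S^{-1/2} (x_i - x_bar); the transformed data have covariance I_p, cf. (3.26)."""
    n = X.shape[0]
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    S = Xc.T @ Xc / n
    # Symmetric inverse square root of S via its eigendecomposition (requires full rank)
    eigval, eigvec = np.linalg.eigh(S)
    S_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return Xc @ S_inv_sqrt          # rows are z_i^T

# Sanity check on simulated, correlated data: S_Z should be (numerically) the identity.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) @ np.array([[2.0, 0.5, 0.0],
                                         [0.0, 1.0, 0.3],
                                         [0.0, 0.0, 0.7]])
Z = mahalanobis_transform(X)
S_Z = Z.T @ Z / Z.shape[0]
assert np.allclose(S_Z, np.eye(3))
```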




Summary
- The center of gravity of a data matrix is given by its mean vector $\bar{x} = n^{-1} X^{\top} 1_n$.
- The dispersion of the observations in a data matrix is given by the empirical covariance matrix $S = n^{-1} X^{\top} H X$.
- The empirical correlation matrix is given by $R = D^{-1/2} S D^{-1/2}$.
- A linear transformation $Y = X A^{\top}$ of a data matrix $X$ has mean $A\bar{x}$ and empirical covariance $A S_X A^{\top}$.
- The Mahalanobis transformation is a linear transformation $z_i = S^{-1/2}(x_i - \bar{x})$ which gives a standardized, uncorrelated data matrix $Z$.




3.4 Linear Model for Two Variables

We have looked many times now at downward- and upward-sloping scatterplots. What does
the eye define here as slope? Suppose that we can construct a line corresponding to the


general direction of the cloud. The sign of the slope of this line would correspond to the upward and downward directions. Call the variable on the vertical axis $Y$ and the one on the horizontal axis $X$. A slope line is a linear relationship between $X$ and $Y$:
$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, \ldots, n. \qquad (3.27)$$
Here, $\alpha$ is the intercept and $\beta$ is the slope of the line. The errors (or deviations from the line) are denoted as $\varepsilon_i$ and are assumed to have zero mean and finite variance $\sigma^2$. The task of finding $(\alpha, \beta)$ in (3.27) is referred to as a linear adjustment.
In Section 3.6 we shall derive estimators for $\alpha$ and $\beta$ more formally, as well as accurately describe what a "good" estimator is. For now, one may try to find a "good" estimator $(\hat\alpha, \hat\beta)$ via graphical techniques. A very common numerical and statistical technique is to use those $\hat\alpha$ and $\hat\beta$ that minimize:
$$(\hat\alpha, \hat\beta) = \arg\min_{(\alpha,\beta)} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2. \qquad (3.28)$$

The solutions to this task are the estimators:
$$\hat\beta = \frac{s_{XY}}{s_{XX}} \qquad (3.29)$$
$$\hat\alpha = \bar{y} - \hat\beta \bar{x}. \qquad (3.30)$$
The variance of $\hat\beta$ is:
$$\mathrm{Var}(\hat\beta) = \frac{\sigma^2}{n \cdot s_{XX}}. \qquad (3.31)$$
The standard error (SE) of the estimator is the square root of (3.31),
$$\mathrm{SE}(\hat\beta) = \{\mathrm{Var}(\hat\beta)\}^{1/2} = \frac{\sigma}{(n \cdot s_{XX})^{1/2}}. \qquad (3.32)$$
We can use this formula to test the hypothesis that $\beta = 0$. In an application the variance $\sigma^2$ has to be estimated by an estimator $\hat\sigma^2$ that will be given below. Under a normality assumption on the errors, the $t$-test for the hypothesis $\beta = 0$ works as follows.
One computes the statistic
$$t = \frac{\hat\beta}{\mathrm{SE}(\hat\beta)} \qquad (3.33)$$
and rejects the hypothesis at a 5% significance level if $|t| \ge t_{0.975;\, n-2}$, the 97.5% quantile of Student's $t_{n-2}$ distribution, which is the critical value for the two-sided test. For $n \ge 30$, this quantile can be replaced by 1.96, the 97.5% quantile of the normal distribution. An estimator $\hat\sigma^2$ of $\sigma^2$ will be given in the following.
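A compact sketch of the estimators (3.29)-(3.33) in Python follows; `simple_ols` is a hypothetical helper, and the column layout of the pullover array in the usage comment is an assumption.

```python
import numpy as np

def simple_ols(x, y):
    """Least-squares fit of y = alpha + beta * x, cf. (3.29)-(3.33)."""
    n = len(x)
    s_xx = np.mean((x - x.mean()) ** 2)                 # s_XX
    s_xy = np.mean((x - x.mean()) * (y - y.mean()))     # s_XY
    beta = s_xy / s_xx                                  # (3.29)
    alpha = y.mean() - beta * x.mean()                  # (3.30)
    residuals = y - (alpha + beta * x)
    sigma2_hat = np.sum(residuals ** 2) / (n - 2)       # unbiased estimate of sigma^2
    se_beta = np.sqrt(sigma2_hat / (n * s_xx))          # (3.32) with sigma replaced by its estimate
    t_stat = beta / se_beta                             # (3.33)
    return alpha, beta, se_beta, t_stat

# Usage on the pullover data (hypothetical column layout: x = price, y = sales):
# alpha, beta, se, t = simple_ols(pullover[:, 1], pullover[:, 0])
# Reject H0: beta = 0 at the 5% level if |t| >= t_{0.975; n-2}.
```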



Figure 3.5. Regression of sales ($X_1$) on price ($X_2$) of pullovers. MVAregpull.xpl


EXAMPLE 3.10 Let us apply the linear regression model (3.27) to the "classic blue" pullovers. The sales manager believes that the number of sales depends strongly on the price. He computes the regression line as shown in Figure 3.5.


How good is this fit? This can be judged via goodness-of-fit measures. Define
$$\hat y_i = \hat\alpha + \hat\beta x_i, \qquad (3.34)$$
as the predicted value of $y$ as a function of $x$. With $\hat y$ the textile shop manager in the above example can predict sales as a function of prices $x$. The variation in the response variable is:
$$n s_{YY} = \sum_{i=1}^{n} (y_i - \bar y)^2. \qquad (3.35)$$


The variation explained by the linear regression (3.27) with the predicted values (3.34) is:
$$\sum_{i=1}^{n} (\hat y_i - \bar y)^2. \qquad (3.36)$$
The residual sum of squares, the minimum in (3.28), is given by:
$$RSS = \sum_{i=1}^{n} (y_i - \hat y_i)^2. \qquad (3.37)$$
An unbiased estimator $\hat\sigma^2$ of $\sigma^2$ is given by $RSS/(n-2)$.
The following relation holds between (3.35)-(3.37):
$$\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2, \qquad (3.38)$$
$$\text{total variation} = \text{explained variation} + \text{unexplained variation}.$$

The coefficient of determination is $r^2$:
$$r^2 = \frac{\sum_{i=1}^{n} (\hat y_i - \bar y)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2} = \frac{\text{explained variation}}{\text{total variation}}. \qquad (3.39)$$

The coefficient of determination increases with the proportion of variation explained by the linear relation (3.27). In the extreme case where $r^2 = 1$, all of the variation is explained by the linear regression (3.27). The other extreme, $r^2 = 0$, occurs when the empirical covariance is $s_{XY} = 0$. The coefficient of determination can be rewritten as
$$r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2}. \qquad (3.40)$$
From (3.39), it can be seen that in the linear regression (3.27), $r^2 = r^2_{XY}$ is the square of the correlation between $X$ and $Y$.
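The identity $r^2 = r^2_{XY}$ is easy to verify numerically. The sketch below (simulated data, not the pullover data) computes $r^2$ via (3.40) and compares it with the squared sample correlation.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of the regression of y on x, cf. (3.39)-(3.40)."""
    beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)    # (3.29)
    alpha = y.mean() - beta * x.mean()                   # (3.30)
    y_hat = alpha + beta * x                             # predicted values (3.34)
    rss = np.sum((y - y_hat) ** 2)                       # unexplained variation (3.37)
    tss = np.sum((y - y.mean()) ** 2)                    # total variation (3.35)
    return 1.0 - rss / tss                               # (3.40)

# r^2 equals the squared correlation between X and Y:
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)
assert np.isclose(r_squared(x, y), np.corrcoef(x, y)[0, 1] ** 2)
```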

EXAMPLE 3.11 For the above pullover example, we estimate
$$\hat\alpha = 210.774 \quad \text{and} \quad \hat\beta = -0.364.$$
The coefficient of determination is
$$r^2 = 0.028.$$
The textile shop manager concludes that sales are not influenced very much by the price (in a linear way).
