Figure A1.3 Geometric interpretation of partial derivatives

As with functions of a single variable, the extrema of a differentiable function can be
determined by means of two conditions.

• The first-order (necessary) condition states that if x^(0) is an extremum of f , then all
the partial derivatives of f are zero at x^(0):

$$f_{x_i}(x^{(0)}) = 0 \qquad (i = 1, \ldots, n)$$

Referring to the geometric interpretation of the partial derivatives of a function
of two variables, at this type of point (x0, y0), called a stationary point, the tangents
to the curves Cx and Cy are therefore horizontal.
• The second-order (sufficient) condition allows the stationary points to be 'sorted'
according to their nature, but first and foremost requires the definition of the Hessian
matrix of the function f at the point x, made up of the second partial derivatives of f :
$$H(f(x_1, \ldots, x_n)) = \begin{pmatrix} f_{x_1x_1}(x) & f_{x_1x_2}(x) & \cdots & f_{x_1x_n}(x) \\ f_{x_2x_1}(x) & f_{x_2x_2}(x) & \cdots & f_{x_2x_n}(x) \\ \vdots & \vdots & & \vdots \\ f_{x_nx_1}(x) & f_{x_nx_2}(x) & \cdots & f_{x_nx_n}(x) \end{pmatrix}$$

If x^(0) is a stationary point of f and H(f(x)) is p.d. at x^(0) or s.p. in a neighbourhood
of x^(0), we have a minimum. In the opposite situation, if H(f(x)) is n.d. at x^(0) or s.n.
in a neighbourhood of x^(0), we have a maximum.3

1.2.1.3 Extrema under constraint(s)
This is a similar concept, but one in which the analysis of the problem of extrema is
restricted to those x values that obey one or more constraints.

3 These notions are explained in Section 1.3.2.1 in this Appendix.
Mathematical Concepts 331
The point (x1^(0), …, xn^(0)) is a local maximum (resp. minimum) of the function f under
the constraints

$$g_1(x) = 0, \quad \ldots, \quad g_r(x) = 0$$

if x^(0) verifies the constraints itself and

$$f(x_1^{(0)}, \ldots, x_n^{(0)}) \ge f(x_1, \ldots, x_n) \qquad [\text{resp. } f(x_1^{(0)}, \ldots, x_n^{(0)}) \le f(x_1, \ldots, x_n)]$$

for any (x1, …, xn) in a neighbourhood of (x1^(0), …, xn^(0)) satisfying the r constraints.

Solving this problem involves considering the Lagrangian function of the problem. This
is a function of the (n + r) variables (x1, …, xn; m1, …, mr), the last r of
which – known as Lagrangian multipliers – each correspond to a constraint:

$$L(x_1, \ldots, x_n; m_1, \ldots, m_r) = f(x) + m_1 g_1(x) + \cdots + m_r g_r(x)$$

We will not go into the technical details of solving this problem. We will, however, point
out an essential result: if the point (x^(0); m^(0)) is such that x^(0) verifies the constraints and
(x^(0); m^(0)) is an extremum (without constraint) of the Lagrangian function, then x^(0) is an
extremum for the problem of extrema under constraints.
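As a minimal numerical illustration (the function, constraint and solution below are made up, not taken from the text), consider maximising f(x, y) = xy under the single constraint g(x, y) = x + y − 4 = 0. Setting the partial derivatives of the Lagrangian L = xy + m(x + y − 4) to zero gives y + m = 0, x + m = 0, x + y − 4 = 0, hence x = y = 2 and m = −2; a finite-difference check confirms the stationarity:

```python
# Hypothetical example: maximise f(x, y) = x*y subject to x + y - 4 = 0.
# The Lagrangian of the problem, with one multiplier m for the one constraint.
def lagrangian(x, y, m):
    return x * y + m * (x + y - 4)

def partial(f, args, i, h=1e-6):
    """Central finite-difference partial derivative of f at args."""
    lo, hi = list(args), list(args)
    lo[i] -= h
    hi[i] += h
    return (f(*hi) - f(*lo)) / (2 * h)

point = (2.0, 2.0, -2.0)   # candidate unconstrained extremum of L

# All three partial derivatives of L vanish at the candidate point.
grads = [partial(lagrangian, point, i) for i in range(3)]
assert all(abs(g) < 1e-6 for g in grads)

# f at (2, 2) beats nearby feasible points on the constraint x + y = 4.
f = lambda x, y: x * y
assert f(2, 2) >= f(2.5, 1.5) and f(2, 2) >= f(1.0, 3.0)
```

The multiplier m plays no role in f itself; it only enforces the constraint inside L.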

1.2.2 TaylorвЂ™s formula
Taylor's formula also generalises to n-variable functions, but the degree-1 term,
which involves the first derivative, is replaced by n terms involving the n partial derivatives:

$$f_{x_i}(x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}) \qquad i = 1, 2, \ldots, n$$

In the same way, the degree-2 term, the coefficient of which constitutes the second
derivative, here becomes a set of n² terms in which the various second partial derivatives
are involved:

$$f_{x_ix_j}(x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}) \qquad i, j = 1, 2, \ldots, n$$

Thus, by limiting the expansion to the degree-2 terms, Taylor's formula is written as follows:

$$f(x_1^{(0)} + h_1, x_2^{(0)} + h_2, \ldots, x_n^{(0)} + h_n) \approx f(x^{(0)}) + \frac{1}{1!}\sum_{i=1}^{n} f_{x_i}(x^{(0)})\,h_i + \frac{1}{2!}\sum_{i=1}^{n}\sum_{j=1}^{n} f_{x_ix_j}(x^{(0)})\,h_i h_j + \cdots$$
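As an illustrative check (the function, point and step are made up), the degree-2 expansion can be compared against the exact value for a function of two variables, using its analytic partial derivatives:

```python
import math

# f(x, y) = exp(x) * sin(y) and its first and second partial derivatives.
f   = lambda x, y:  math.exp(x) * math.sin(y)
fx  = lambda x, y:  math.exp(x) * math.sin(y)
fy  = lambda x, y:  math.exp(x) * math.cos(y)
fxx = fx
fxy = fy
fyy = lambda x, y: -math.exp(x) * math.sin(y)

x0, y0 = 0.3, 0.5
h1, h2 = 0.01, -0.02

# Degree-2 Taylor expansion around (x0, y0): the n = 2 gradient terms
# plus the n^2 = 4 second-order terms (the mixed term fxy appears twice).
taylor2 = (f(x0, y0)
           + fx(x0, y0) * h1 + fy(x0, y0) * h2
           + 0.5 * (fxx(x0, y0) * h1 * h1
                    + 2 * fxy(x0, y0) * h1 * h2
                    + fyy(x0, y0) * h2 * h2))

exact = f(x0 + h1, y0 + h2)
assert abs(taylor2 - exact) < 1e-5   # remainder is of order |h|^3
```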
332 Asset and Risk Management

1.3 MATRIX CALCULUS
1.3.1 Definitions
1.3.1.1 Matrices and vectors
The term n-order matrix is given to a set of n2 real numbers making up a square table
consisting of n rows and n columns.4 A matrix is generally represented by a capital letter
(such as A), and its elements by the corresponding lower-case letter (a) with two allocated
indices representing the row and column to which the element belongs: aij is the element
of matrix A located at the intersection of row i and column j within A. Matrix A can
therefore be written generally as follows:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nn} \end{pmatrix}$$

In the same way, an n-dimensional vector is a set of n real numbers forming a single-column
table. The elements of a vector are its components and are referred to by a single index:
$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{pmatrix}$$

1.3.1.2 Specific matrices
The diagonal elements in a matrix are the elements a11, a22, …, ann. They are located
on the diagonal of the table that starts from the upper left-hand corner; this is known as
the principal diagonal.
A matrix is defined as symmetrical if the elements symmetrical with respect to the
principal diagonal are equal: aij = aji. Here is an example:

$$A = \begin{pmatrix} 2 & -3 & 0 \\ -3 & 1 & \sqrt{2} \\ 0 & \sqrt{2} & 0 \end{pmatrix}$$

4 More generally, a matrix is a rectangular table with the format (m, n): m rows and n columns. We will, however, only
be looking at square matrices here.

An upper triangular matrix is a matrix in which the elements located underneath the
principal diagonal are zero: aij = 0 when i > j. For example:

$$A = \begin{pmatrix} 0 & 2 & -1 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}$$

The concept of a lower triangular matrix is of course defined in a similar way.
Finally, a diagonal matrix is one that is both upper triangular and lower triangular. Its
only non-zero elements are the diagonal elements: aij = 0 when i and j are different.
Generally, this type of matrix will be represented by:
$$A = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix} = \mathrm{diag}(a_1, a_2, \ldots, a_n)$$

1.3.1.3 Operations
The sum of two matrices, as well as the multiplication of a matrix by a scalar, are
completely natural operations: the operation in question is carried out for each of the
elements. Thus:

$$(A + B)_{ij} = a_{ij} + b_{ij}, \qquad (\lambda A)_{ij} = \lambda a_{ij}$$

These definitions are also valid for vectors:

$$(X + Y)_i = x_i + y_i, \qquad (\lambda X)_i = \lambda x_i$$

The product of two matrices A and B is a matrix of the same order as A and B, in which
the element (i, j) is obtained by calculating the sum of the products of the elements in
line i of A with the corresponding elements in column j of B:

$$(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}$$

We will have, for example:

$$\begin{pmatrix} 2 & 0 & -1 \\ 3 & -2 & 1 \\ -3 & 2 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & 5 & -2 \\ 3 & -1 & 0 \\ 2 & 0 & -1 \end{pmatrix} = \begin{pmatrix} -2 & 10 & -3 \\ -4 & 17 & -7 \\ 6 & -17 & 6 \end{pmatrix}$$

Despite its apparently complex definition, the matrix product has a number of classical
properties; it is associative and distributive with respect to addition. However, it needs to
be handled with care as it lacks one classical property: it is not commutative. AB
does not generally equal BA!
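Both the "lines by columns" rule and the failure of commutativity are quick to check; the 2 × 2 matrices below are made up for the illustration:

```python
def matmul(A, B):
    """'Lines by columns' product: (AB)_ij = sum_k a_ik * b_kj."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Illustrative 2x2 matrices (not from the text).
A = [[1, 2],
     [0, 1]]
B = [[0, 1],
     [1, 0]]

AB = matmul(A, B)
BA = matmul(B, A)

print(AB)  # [[2, 1], [1, 0]]
print(BA)  # [[0, 1], [1, 2]]
assert AB != BA   # the product is not commutative
```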

The product of a matrix by a vector is defined using the same "lines by columns"
procedure:

$$(AX)_i = \sum_{k=1}^{n} a_{ik}x_k$$

The transposition of a matrix A is the matrix Aᵗ, obtained by permuting the symmetrical
elements with respect to the principal diagonal or, which amounts to the same thing, by
permuting the roles of the lines and columns of matrix A:

$$(A^t)_{ij} = a_{ji}$$

A matrix is thus symmetrical if, and only if, it is equal to its transposition. In addition,
this operation, applied to a vector, gives the corresponding line vector as its result.
The inverse of matrix A is the matrix A⁻¹, if it exists, such that: AA⁻¹ = A⁻¹A
= diag(1, …, 1) = I.
For example, it is easy to verify that:

$$\begin{pmatrix} 1 & 0 & 1 \\ -2 & 1 & -3 \\ 0 & 1 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 3 & 1 & -1 \\ 0 & 0 & 1 \\ -2 & -1 & 1 \end{pmatrix}$$
Finally, let us define the trace of a matrix. The trace is the sum of the matrix's
diagonal elements:

$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}$$
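Using a small 3 × 3 matrix and its inverse (restated inside the code so the sketch is self-contained), the defining property AA⁻¹ = A⁻¹A = I and the trace can be checked in exact rational arithmetic:

```python
from fractions import Fraction as F

def matmul(A, B):
    """'Lines by columns' matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    """Sum of the diagonal elements."""
    return sum(A[i][i] for i in range(len(A)))

M = [[F(1), F(0), F(1)],
     [F(-2), F(1), F(-3)],
     [F(0), F(1), F(0)]]
M_inv = [[F(3), F(1), F(-1)],
         [F(0), F(0), F(1)],
         [F(-2), F(-1), F(1)]]

I = [[F(int(i == j)) for j in range(3)] for i in range(3)]

# M * M_inv and M_inv * M both give diag(1, 1, 1) = I.
assert matmul(M, M_inv) == I
assert matmul(M_inv, M) == I
assert trace(M) == 2   # 1 + 1 + 0
```

Fractions avoid any floating-point rounding, so the equality tests are exact.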

1.3.2.1 Quadratic form and class of symmetrical matrix
A quadratic form is a polynomial function of n variables containing only second-degree
terms:

$$Q(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j$$

If we construct a matrix A from the coefficients aij (i, j = 1, …, n) and the vector X from
the variables xi (i = 1, …, n), we can give a matrix expression to the quadratic form:
Q(X) = XᵗAX.
In fact, by expanding the right-hand member, we obtain:

$$X^tAX = \sum_{i=1}^{n} x_i(AX)_i = \sum_{i=1}^{n}\sum_{j=1}^{n} x_ia_{ij}x_j = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j$$

A quadratic form can always be associated with a matrix A, and vice versa. The matrix,
however, is not unique. In fact, the quadratic form Q(x1, x2) = 3x1² − 4x1x2 can be associated
with the matrices

$$A = \begin{pmatrix} 3 & -2 \\ -2 & 0 \end{pmatrix} \qquad B = \begin{pmatrix} 3 & -4 \\ 0 & 0 \end{pmatrix} \qquad C = \begin{pmatrix} 3 & -6 \\ 2 & 0 \end{pmatrix}$$

as well as an infinite number of others. Amongst all these matrices, only one is symmetrical (A in
the example given). There is therefore a bijection between the set of quadratic forms and the set
of symmetrical matrices.
The class of a symmetrical matrix is defined on the basis of the sign of the associated
quadratic form. Thus, the non-zero matrix A is said to be positive definite (p.d.) if XᵗAX >
0 for any X not equal to 0, and semi-positive (s.p.) when:

XᵗAX ≥ 0 for any X ≠ 0, and there is at least one Y ≠ 0 such that YᵗAY = 0.

A matrix is negative definite (n.d.) or semi-negative (s.n.) according to the reverse inequalities,
and the term non-definite is given to a symmetrical matrix for which there exist X
and Y ≠ 0 such that XᵗAX > 0 and YᵗAY < 0.
The symmetrical matrix

$$A = \begin{pmatrix} 5 & -3 & -4 \\ -3 & 10 & 2 \\ -4 & 2 & 8 \end{pmatrix}$$

is thus p.d., as the associated quadratic form can be written as:

$$Q(x, y, z) = 5x^2 + 10y^2 + 8z^2 - 6xy - 8xz + 4yz = (x - 3y)^2 + (2x - 2z)^2 + (y + 2z)^2$$

This form will never be negative, and only cancels out when:

x − 3y = 0,  2x − 2z = 0,  y + 2z = 0

that is, when x = y = z = 0.
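The sign condition can be checked directly; the sketch below (an illustration, with the example matrix restated in the code) compares the quadratic form computed from the matrix with its sum-of-squares decomposition on a grid of integer points:

```python
# The symmetrical matrix of the example and its quadratic form Q(X) = X^t A X.
A = [[5, -3, -4],
     [-3, 10, 2],
     [-4, 2, 8]]

def Q(x, y, z):
    X = (x, y, z)
    return sum(A[i][j] * X[i] * X[j] for i in range(3) for j in range(3))

def Q_squares(x, y, z):
    """Sum-of-squares decomposition quoted in the text."""
    return (x - 3 * y) ** 2 + (2 * x - 2 * z) ** 2 + (y + 2 * z) ** 2

# The two expressions agree at every grid point, and Q > 0 away from the
# origin, which is exactly the positive-definiteness condition.
for x in range(-3, 4):
    for y in range(-3, 4):
        for z in range(-3, 4):
            assert Q(x, y, z) == Q_squares(x, y, z)
            if (x, y, z) != (0, 0, 0):
                assert Q(x, y, z) > 0
```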

1.3.2.2 Linear equation system
A system of n linear equations with n unknowns is a set of relations of the following type:
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \quad\cdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n \end{cases}$$

In it, the aij, xj and bi are respectively the coefficients, the unknowns and the right-hand
members. They are written naturally in matrix and vector form: A, X and B.
Using this notation, the system is written in an equivalent but more condensed way:

AX = B

For example, the system of equations

2x + 3y = 4
4x − y = −2

can also be written as:

$$\begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 4 \\ -2 \end{pmatrix}$$

If the inverse of matrix A exists, it can easily be seen that the system admits one and only
one solution, given by X = A⁻¹B.
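The 2 × 2 example can be solved as X = A⁻¹B using the closed-form inverse of a 2 × 2 matrix (a minimal sketch in exact arithmetic):

```python
from fractions import Fraction as F

# Solve  2x + 3y = 4,  4x - y = -2  via X = A^{-1} B, using the closed-form
# inverse of a 2x2 matrix: A^{-1} = (1/det) * [[d, -b], [-c, a]].
a, b, c, d = F(2), F(3), F(4), F(-1)
b1, b2 = F(4), F(-2)

det = a * d - b * c          # 2*(-1) - 3*4 = -14, non-zero so A^{-1} exists
x = (d * b1 - b * b2) / det  # first row of A^{-1} applied to B
y = (a * b2 - c * b1) / det  # second row of A^{-1} applied to B

print(x, y)  # -1/7 10/7

# The solution satisfies both original equations.
assert 2 * x + 3 * y == 4
assert 4 * x - y == -2
```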

1.3.2.3 Case of variance–covariance matrix5

The matrix

$$V = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix}$$

of the variances and covariances of a number of random variables X1, X2, …, Xn is a
matrix that is either p.d. or s.p.
In effect, regardless of what the numbers λ1, λ2, …, λn (not all zero) making
up the vector Λ may be, we have:

$$\Lambda^tV\Lambda = \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i\lambda_j\sigma_{ij} = \mathrm{var}\left(\sum_{i=1}^{n} \lambda_iX_i\right) \ge 0$$

It can even be said, according to this result, that the variance–covariance matrix V is
p.d. except when there are coefficients λ1, λ2, …, λn, not all zero, such that the
random variable $\lambda_1X_1 + \cdots + \lambda_nX_n = \sum_{i=1}^{n}\lambda_iX_i$ is degenerate, in which case V will be
s.p. This degeneration may occur, for example, when:

• one of the variables is degenerate;
• some variables are perfectly correlated;
• the matrix V is obtained on the basis of a number of observations strictly lower than
the number of variables.

It is evident, then, that the variance–covariance matrix can be expressed in matrix form
through the relation:

$$V = E[(X - \mu)(X - \mu)^t]$$
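As an illustrative sketch (the data below are made up), the identity ΛᵗVΛ = var(Σ λᵢXᵢ) can be checked on a small sample, using the population covariance:

```python
# Population variance-covariance matrix of three short (made-up) series,
# and the identity  L^t V L = var(l1*X1 + l2*X2 + l3*X3).
data = [
    [1.0, 2.0, 4.0, 3.0],   # X1
    [2.0, 1.0, 5.0, 2.0],   # X2
    [0.5, 0.5, 1.5, 1.5],   # X3
]
n_obs = len(data[0])
means = [sum(row) / n_obs for row in data]

def cov(i, j):
    return sum((data[i][k] - means[i]) * (data[j][k] - means[j])
               for k in range(n_obs)) / n_obs

V = [[cov(i, j) for j in range(3)] for i in range(3)]

lam = [2.0, -1.0, 3.0]
quad = sum(lam[i] * V[i][j] * lam[j] for i in range(3) for j in range(3))

# Variance of the combined series 2*X1 - X2 + 3*X3, computed directly.
comb = [sum(lam[i] * data[i][k] for i in range(3)) for k in range(n_obs)]
m = sum(comb) / n_obs
var_comb = sum((c - m) ** 2 for c in comb) / n_obs

assert abs(quad - var_comb) < 1e-9
assert quad >= 0   # V is p.d. or s.p., never n.d.
```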

1.3.2.4 Choleski factorisation
Consider a positive definite symmetrical matrix A. It can be demonstrated that there exists
a lower triangular matrix L with strictly positive diagonal elements such that A = LLᵗ.

5 The concepts necessary for an understanding of this example are shown in Appendix 2.

This factorisation process is known as a Choleski factorisation. We will not be demonstrating
this property, but will show, using the previous example, how the matrix L
is found:
$$LL^t = \begin{pmatrix} a & 0 & 0 \\ b & c & 0 \\ d & f & g \end{pmatrix}\begin{pmatrix} a & b & d \\ 0 & c & f \\ 0 & 0 & g \end{pmatrix} = \begin{pmatrix} a^2 & ab & ad \\ ab & b^2 + c^2 & bd + cf \\ ad & bd + cf & d^2 + f^2 + g^2 \end{pmatrix} = A = \begin{pmatrix} 5 & -3 & -4 \\ -3 & 10 & 2 \\ -4 & 2 & 8 \end{pmatrix}$$

It is then sufficient to work through the last equality in order to find a, b, c, d, f and g in
succession, which gives the following for matrix L:

$$L = \begin{pmatrix} \sqrt{5} & 0 & 0 \\ -\dfrac{3\sqrt{5}}{5} & \dfrac{\sqrt{205}}{5} & 0 \\ -\dfrac{4\sqrt{5}}{5} & -\dfrac{2\sqrt{205}}{205} & \dfrac{14\sqrt{41}}{41} \end{pmatrix}$$
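The element-by-element procedure can be written as a small routine (a minimal implementation for illustration, not production code), applied to the matrix of the example:

```python
import math

def choleski(A):
    """Choleski factorisation A = L L^t for a symmetric p.d. matrix,
    solving the equalities element by element as sketched in the text."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)   # strictly positive diagonal
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

A = [[5, -3, -4],
     [-3, 10, 2],
     [-4, 2, 8]]
L = choleski(A)

# First column reproduces a = sqrt(5), b = -3/sqrt(5), d = -4/sqrt(5).
assert abs(L[0][0] - math.sqrt(5)) < 1e-12
assert abs(L[1][0] + 3 / math.sqrt(5)) < 1e-12

# L * L^t recovers A.
for i in range(3):
    for j in range(3):
        rebuilt = sum(L[i][k] * L[j][k] for k in range(3))
        assert abs(rebuilt - A[i][j]) < 1e-9
```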
Appendix 2
Probabilistic Concepts1

2.1 RANDOM VARIABLES
2.1.1 Random variables and probability law
2.1.1.1 Definitions
Let us consider a fortuitous phenomenon, that is, a phenomenon that under given initial
conditions corresponds to several possible outcomes. A numerical magnitude that depends
on the observed result is known as a random variable or r.v.
In addition, probabilities are associated with the various possible results or events defined
in the context of the fortuitous phenomenon. It is therefore interesting to find out the
probabilities of the various events defined on the basis of the r.v. What we are looking at
here is the concept of the law of probability of the r.v. Thus, if the r.v. is termed X, the law
of probability of X is defined by the range of the following probabilities: Pr[X ∈ A], for
every subset A of R.
The aim of the concept of probability law is a bold one: the subsets A of R are
too numerous for all the probabilities to be known. For this reason, we are content to
work with just the sets ]−∞; t]. This therefore defines a function of the variable t,
the cumulative distribution function, or more simply the distribution function (d.f.), of the random
variable: F(t) = Pr[X ≤ t].
It can be demonstrated that this function, defined on R, is increasing, that it lies between 0
and 1, that it admits the ordinates 0 and 1 as horizontal asymptotes (lim_{t→−∞} F(t) = 0
and lim_{t→+∞} F(t) = 1), and that it is right-continuous: lim_{s→t⁺} F(s) = F(t).
These properties are summarised in Figure A2.1.
In addition, despite its simplicity, the d.f. allows almost the whole of the probability
law for X to be found, thus:

Pr[s < X ≤ t] = F(t) − F(s)
Pr[X = t] = F(t) − F(t−)

2.1.1.2 Quantile
Sometimes there is a need to solve the opposite problem: given a probability
level u, determine the value of t such that F(t) = Pr[X ≤ t] = u.
This value is known as the quantile of the r.v. X at point u, and its definition is shown
in Figure A2.2.

1 Readers wishing to find out more about these concepts should read: Baxter M. and Rennie A., Financial Calculus,
Cambridge University Press, 1996. Feller W., An Introduction to Probability Theory and its Applications (2 volumes), John
Wiley and Sons, Inc., 1968. Grimmett G. and Stirzaker D., Probability and Random Processes, Oxford University Press,
1992. Roger P., Les outils de la modélisation financière, Presses Universitaires de France, 1991. Ross S. M., Initiation aux
probabilités, Presses Polytechniques et Universitaires Romandes, 1994.
Figure A2.1 Distribution function

Figure A2.2 Quantile

Figure A2.3 Quantile in jump scenario

In two cases, however, the definition that we have just given is unsuitable and needs to
be adapted. First of all, if the d.f. of X shows a jump that straddles the ordinate u, no
abscissa corresponds to it, and the abscissa of the jump is naturally chosen
(see Figure A2.3).
Next, if the ordinate u corresponds to a plateau on the d.f. graph, there is an infinite
number of abscissas to choose from (see Figure A2.4).
In this case, an abscissa defined by the relation Q(u) = um + (1 − u)M, where m and M
are the ends of the plateau, can be chosen. The quantile function thus defined generalises
the concept of the reciprocal function of the d.f.
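For a discrete d.f., the jump rule above amounts to taking the smallest possible value t with F(t) ≥ u. A minimal sketch (the values and probabilities are made up):

```python
# Generalised inverse Q(u) = min{t : F(t) >= u} for a discrete r.v.
# taking the (made-up) values 1, 2, 3 with probabilities 0.3, 0.4, 0.3.
values = [1, 2, 3]
probs = [0.3, 0.4, 0.3]

def F(t):
    """Distribution function: a right-continuous step function."""
    return sum(p for v, p in zip(values, probs) if v <= t)

def Q(u):
    """Quantile: smallest possible value at which the d.f. reaches u."""
    acc = 0.0
    for v, p in zip(values, probs):
        acc += p
        if acc >= u - 1e-12:
            return v
    return values[-1]

assert Q(0.3) == 1    # F(1) = 0.3 already reaches u = 0.3
assert Q(0.5) == 2    # the jump at t = 2 covers the ordinate 0.5
assert Q(0.95) == 3
assert F(Q(0.5)) >= 0.5
```

The plateau-interpolation variant Q(u) = um + (1 − u)M quoted in the text would only differ when u falls exactly on a plateau of F.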

2.1.1.3 Discrete random variable
A discrete random variable corresponds to a situation in which the set of possible values
for the variable is finite or countably infinite. In this case, if the various possible values
Probabilistic Concepts 341

Figure A2.4 Quantile in plateau scenario

and corresponding probabilities are known,

$$\begin{array}{ccccc} x_1 & x_2 & \cdots & x_n & \cdots \\ p_1 & p_2 & \cdots & p_n & \cdots \end{array} \qquad \Pr[X = x_i] = p_i \quad (i = 1, 2, \ldots), \qquad \sum_i p_i = 1$$

the law of probability of X can be easily determined:

$$\Pr[X \in A] = \sum_{\{i:\,x_i \in A\}} p_i$$

The d.f. of a discrete r.v. is a stepped function: the abscissas of the jumps correspond to
the various possible values of X and the heights of the jumps are equal to the associated
probabilities (see Figure A2.5).
In particular, a r.v. is defined as degenerate if it can only take on one value x (also
referred to as a certain variable): Pr[X = x] = 1.
The d.f. of a degenerate variable will be 0 to the left of x and 1 from x onwards.

2.1.1.4 Continuous random variable
In contrast to the discrete case, the set of possible values of a r.v. can be continuous
(an interval, for example), with no individual value having a strictly positive probability:

Pr[X = x] = 0  for all x

Figure A2.5 Distribution function for a discrete random variable
Figure A2.6 Probability density

In this case, the distribution of probabilities over the set of possible values is expressed
using a density function f : for a sufficiently small h, we will have Pr[x < X ≤ x + h] ≈ h·f(x).
This definition is shown in Figure A2.6.
The law of probability is obtained from the density through the following relation:

$$\Pr[X \in A] = \int_A f(x)\,dx$$

And as a particular case:

$$F(t) = \int_{-\infty}^{t} f(x)\,dx$$

2.1.1.5 Multivariate random variables
Often there is a need to consider several r.v.s X1, X2, …, Xm simultaneously, associated
with the same fortuitous phenomenon.2 Here, we will simply show the theory for a bivariate
random variable, that is, a pair of r.v.s (X, Y); the general process for a multivariate
random variable can easily be deduced from this.
The law of probability of a bivariate random variable is defined as the set of the
following probabilities: Pr[(X, Y) ∈ A], for every subset A of R². The joint distribution
function is defined by F(s, t) = Pr([X ≤ s] ∩ [Y ≤ t]), and the discrete and continuous
bivariate random variables are characterised respectively by:

$$p_{ij} = \Pr([X = x_i] \cap [Y = y_j])$$

$$\Pr[(X, Y) \in A] = \iint_A f(x, y)\,dx\,dy$$

Two r.v.s are defined as independent when neither influences the other, either in terms
of possible values or through the probabilities of the events that they define. More
formally, X and Y are independent when:

Pr([X ∈ A] ∩ [Y ∈ B]) = Pr[X ∈ A] · Pr[Y ∈ B]

for every A and B in R.

2 For example, the return on various financial assets.

It can be shown that two r.v.s are independent if, and only if, their joint d.f. is equal to the
product of the d.f.s of each of the r.v.s: F(s, t) = F_X(s) · F_Y(t). For discrete and continuous
random variables respectively, this condition shows as:

p_ij = Pr[X = xi] · Pr[Y = yj]
f(x, y) = f_X(x) · f_Y(y)

2.1.2 Typical values of random variables
The aim of the typical values of a r.v. is to summarise the information contained in
its probability law in a number of representative parameters: parameters of location,
dispersion, skewness and kurtosis. We will be looking at one from each group.

2.1.2.1 Mean
The mean is a central value that locates a r.v. by dividing the d.f. into two parts with the
same area (see Figure A2.7).
The mean μ of the r.v. X is therefore such that:

$$\int_{-\infty}^{\mu} F(t)\,dt = \int_{\mu}^{+\infty} [1 - F(t)]\,dt$$

The mean of a r.v. can be calculated on the basis of the d.f.:

$$\mu = \int_{0}^{+\infty} [1 - F(t)]\,dt - \int_{-\infty}^{0} F(t)\,dt$$

the formula reducing, for a positive r.v., to:

$$\mu = \int_{0}^{+\infty} [1 - F(t)]\,dt$$

It is possible to demonstrate that for a discrete r.v. and a continuous r.v. respectively, we have
the formulae:

$$\mu = \sum_i x_ip_i \qquad\qquad \mu = \int_{-\infty}^{+\infty} xf(x)\,dx$$

Figure A2.7 Mean of a random variable

The structure of these two formulae shows that μ integrates the various possible values
of the r.v. X, weighting them by the probabilities associated with these values.
It can be shown3 that these formulae generalise into an abstract integral of X(ω) with
respect to the probability measure Pr over the set Ω of possible outcomes ω of the
fortuitous phenomenon. This integral is known as the expectation of the r.v. X:

$$E(X) = \int_{\Omega} X(\omega)\,d\Pr(\omega)$$

According to the foregoing, there is equivalence between the concepts of expectation
and mean (E(X) = μ), and we will use the two terms interchangeably from now on.
The properties of the integral show that the expectation is a linear operator:

E(aX + bY + c) = aE(X) + bE(Y ) + c

And if X and Y are independent, then E(XY) = E(X) · E(Y).
In addition, for a discrete r.v. and a continuous r.v. respectively, the expectation of a function of a
r.v. is given by:

$$E(g(X)) = \sum_i g(x_i)p_i \qquad\qquad E(g(X)) = \int_{-\infty}^{+\infty} g(x)f(x)\,dx$$

Let us recall finally the law of large numbers,4 which, for a sequence of independent
r.v.s X1, X2, …, Xn with identical distribution and mean μ, expresses that, regardless
of what ε > 0 may be:

$$\lim_{n\to\infty} \Pr\left[\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| \le \varepsilon\right] = 1$$

This law justifies taking the average of a sample to estimate the mean of the population,
and in particular estimating the probability of an event through the frequency
of that event's occurrence when a large number of realisations of the fortuitous phenomenon occur.
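Both uses of the law can be illustrated by a seeded simulation (the distribution and event below are made up for the sketch):

```python
import random

# Law of large numbers by simulation: the sample average of i.i.d.
# uniform(0, 1) draws approaches mu = 0.5, and the frequency of the
# event [U < 0.25] approaches its probability 0.25.
random.seed(42)

n = 100_000
draws = [random.random() for _ in range(n)]

sample_avg = sum(draws) / n
freq = sum(1 for u in draws if u < 0.25) / n

print(sample_avg, freq)
assert abs(sample_avg - 0.5) < 0.01
assert abs(freq - 0.25) < 0.01
```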

2.1.2.2 Variance and standard deviation
One of the most commonly used dispersion indices (that is, measurements of the spread
of the r.v.'s values around its mean) is the variance σ², defined as:

$$\sigma^2 = \mathrm{var}(X) = E[(X - \mu)^2]$$

3 This development is part of measure theory, which is outside the scope of this work. Readers are referred to Loeve M.,
Probability Theory (2 volumes), Springer-Verlag, 1977.
4 We are showing this law in its weak form here.

Figure A2.8 Variance of a random variable

By expanding the right-hand member, we can also write the variance as:

$$\sigma^2 = E(X^2) - \mu^2$$

For a discrete r.v. and a continuous r.v. respectively, this gives:

$$\sigma^2 = \sum_i (x_i - \mu)^2p_i = \sum_i x_i^2p_i - \mu^2$$

$$\sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2f(x)\,dx = \int_{-\infty}^{+\infty} x^2f(x)\,dx - \mu^2$$

An example of the interpretation of this parameter is found in Figure A2.8.
It can be demonstrated that var(aX + b) = a² var(X), and that if X and Y are independent,
then var(X + Y) = var(X) + var(Y).
Alongside the variance, the dimension of which is the square of the dimension of X,
we can also use the standard deviation, which is simply its square root:

$$\sigma = \sqrt{\mathrm{var}(X)}$$

2.1.2.3 Fisher's skewness and kurtosis coefficients
Fisher's skewness coefficient is defined by:

$$\gamma_1 = \frac{E[(X - \mu)^3]}{\sigma^3}$$

It is interpreted essentially on the basis of its sign: if γ1 > 0 (resp. < 0), the distribution
of X will be concentrated to the left (resp. the right) and spread out to the right (resp. the
left). For a symmetrical distribution, γ1 = 0. This interpretation is shown in Figure A2.9.
Fisher's kurtosis coefficient is given by:

$$\gamma_2 = \frac{E[(X - \mu)^4]}{\sigma^4} - 3$$

It is interpreted by comparison with the normal distribution (see Section A.2.2.1). This
distribution has a kurtosis coefficient of 0. Distributions with higher kurtosis than the
normal law (also termed leptokurtic) are more pointed in the neighbourhood of their

Figure A2.9 Skewness coefficient of a random variable (curves for γ1 = 3.5, 0 and −3.5)

Figure A2.10 Kurtosis coefficient of a random variable (curves for γ2 = 3 and −0.6)

mean and present fatter tails (the density being correspondingly lower for intermediate
values) than the normal distribution; they are characterised by a positive γ2 parameter. Of course,
distributions with lower kurtosis have a negative kurtosis coefficient (see Figure A2.10).
For discrete or continuous r.v.s, the formulae that allow E(g(X)) to be calculated are
used as usual.

2.1.2.4 Covariance and correlation
We now come to the parameters relating to bivariate random variables. The covariance
between two r.v.s X and Y is defined by σ_XY = cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
and can also be calculated as cov(X, Y) = E(XY) − μ_Xμ_Y.
For discrete r.v.s and continuous r.v.s respectively, the covariance is calculated by:

$$\mathrm{cov}(X, Y) = \sum_i\sum_j (x_i - \mu_X)(y_j - \mu_Y)p_{ij} = \sum_i\sum_j x_iy_jp_{ij} - \mu_X\mu_Y$$

$$\mathrm{cov}(X, Y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x - \mu_X)(y - \mu_Y)f(x, y)\,dx\,dy = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} xyf(x, y)\,dx\,dy - \mu_X\mu_Y$$

The covariance is interpreted as a measure of the degree of linear connection between
the two r.v.s. A positive covariance corresponds to values of the product
(X − μ_X)(Y − μ_Y) that are mostly positive: the two factors are then mostly of the same
sign, so high values of X (greater than μ_X) tend to correspond to high values of
Y (greater than μ_Y), and low values of X to low values of Y. The same type of
reasoning applies to a negative covariance.
It can be demonstrated that:

cov(aX + bY + c, Z) = a cov(X, Z) + b cov(Y, Z)
cov(X, X) = var(X)
E(XY) = E(X) · E(Y) + cov(X, Y)
var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)

and that if X and Y are independent, their covariance is zero. In this case, in fact:

cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
= E(X − μ_X) · E(Y − μ_Y)
= (E(X) − μ_X)(E(Y) − μ_Y)
= 0

Another parameter that measures the degree of linear connection between the two
r.v.s is the correlation coefficient:

$$\rho_{XY} = \mathrm{corr}(X, Y) = \frac{\sigma_{XY}}{\sigma_X\sigma_Y}$$

The advantage of the correlation coefficient over the covariance is that it is a
dimensionless number, while the covariance's unit of measurement is the product of the
units of the two r.v.s. Also, the correlation coefficient can only take values
between −1 and 1, and these two extreme values correspond to the existence of a
perfect linear relation (increasing or decreasing depending on whether ρ = 1 or ρ = −1)
between the two r.v.s.
Two r.v.s whose correlation coefficient (or covariance) is zero are termed non-correlated.
It was said earlier that independent r.v.s are non-correlated, but the converse is not true!
The independence of two r.v.s in fact excludes the existence of any relation between the
variables, while non-correlation simply excludes the existence of a linear relation.
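The classic counter-example (an illustration, not taken from the text) is Y = X² for a symmetric discrete X: the two variables are non-correlated yet clearly dependent. Exact arithmetic makes the point unambiguous:

```python
from fractions import Fraction as F

# X takes the values -1, 0, 1 with probability 1/3 each, and Y = X^2.
support = [(F(-1), F(1, 3)), (F(0), F(1, 3)), (F(1), F(1, 3))]

E_X = sum(x * p for x, p in support)              # 0 by symmetry
E_Y = sum(x * x * p for x, p in support)          # E(X^2) = 2/3
E_XY = sum(x * (x * x) * p for x, p in support)   # E(X^3) = 0

cov = E_XY - E_X * E_Y
assert cov == 0        # non-correlated...

# ...but not independent: Pr([X = 0] and [Y = 0]) = 1/3, whereas
# Pr[X = 0] * Pr[Y = 0] = (1/3) * (1/3) = 1/9.
p_joint = F(1, 3)
p_prod = F(1, 3) * F(1, 3)
assert p_joint != p_prod
```

Knowing Y pins down |X| exactly, so the relation between the variables is total, just not linear.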

2.2 THEORETICAL DISTRIBUTIONS
2.2.1 Normal distribution and associated ones
2.2.1.1 Normal distribution
Remember that a normal random variable with parameters (μ; σ) is defined by its density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$$

which is shown graphically in Figure A2.11.
Figure A2.11 Normal density

The normal density graph is symmetrical with respect to the vertical straight line of
abscissa μ and shows two points of inflexion, at (μ − σ) and (μ + σ).
The typical values for this distribution are given by:

E(X) = μ
var(X) = σ²
γ1(X) = 0
γ2(X) = 0

If the r.v. X is distributed following a normal law with parameters (μ; σ), it can be
demonstrated that the r.v. (aX + b) is also distributed according to a normal law. In
particular, the r.v. (X − μ)/σ follows a normal law with parameters (0; 1). This is known as
the standard normal law.
The preceding result can be generalised: if the r.v.s X1, X2, …, Xn are independent
and normally distributed with E(Xk) = μk and var(Xk) = σk² (k = 1, …, n), then the r.v.
$\sum_{k=1}^{n} a_kX_k + b$ will follow a normal law with parameters

$$\left(\sum_{k=1}^{n} a_k\mu_k + b;\ \sqrt{\sum_{k=1}^{n} a_k^2\sigma_k^2}\right)$$

2.2.1.2 Central limit theorem
The importance of the normal law in probability theory and statistics stems from the
well-known central limit theorem, which states that if the r.v.s X1, X2, …, Xn, …:

• are independent;
• have finite means μk and standard deviations σk (k = 1, 2, …);
• and have no dominant variance with respect to the whole set:

$$\lim_{n\to\infty}\frac{\sigma_k^2}{\sigma_1^2 + \cdots + \sigma_n^2} = 0 \qquad \forall k,$$

then the distribution of the standardised r.v.

$$\frac{(X_1 + \cdots + X_n) - (\mu_1 + \cdots + \mu_n)}{\sqrt{\sigma_1^2 + \cdots + \sigma_n^2}}$$

tends towards a standard normal law when n tends towards infinity.

Much more intuitively, the central limit theorem states that the sum of a large number
of independent effects, none of which has a significant variability with respect to the whole
set, is approximately distributed according to the normal law, without any hypothesis on
the distribution of the various terms in the sum.
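A seeded simulation illustrates this (the choice of uniform terms is made up for the sketch): standardised sums of 12 uniforms already behave like a standard normal, for which Pr[|Z| ≤ 1] ≈ 0.6827:

```python
import random

# Central limit theorem by simulation: standardised sums of 12 i.i.d.
# uniform(0, 1) variables behave approximately like a standard normal.
random.seed(7)

n_terms, n_samples = 12, 20_000
mu, var = 0.5, 1.0 / 12.0          # mean and variance of uniform(0, 1)

def standardised_sum():
    s = sum(random.random() for _ in range(n_terms))
    return (s - n_terms * mu) / (n_terms * var) ** 0.5

samples = [standardised_sum() for _ in range(n_samples)]

# For a standard normal, Pr[|Z| <= 1] is about 0.6827.
within_one = sum(1 for z in samples if abs(z) <= 1) / n_samples
print(within_one)
assert abs(within_one - 0.6827) < 0.02
```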

2.2.1.3 Multi-normal distribution
An m-variate random variable (X1, X2, …, Xm) is said to be distributed according
to a multi-normal law with parameters (μ; V) if it admits the multivariate density

$$f(x_1, \ldots, x_m) = \frac{1}{\sqrt{(2\pi)^m\det(V)}}\exp\left[-\frac{1}{2}(x - \mu)^tV^{-1}(x - \mu)\right]$$

in which μ and V represent respectively the vector of means and the variance–covariance matrix of the r.v.s
Xk (k = 1, …, m).
The property of linear combinations of independent normal r.v.s can be generalised
as follows: for a multi-normal random variable X with parameters (μ; V), and an invertible matrix
A, the m-variate random variable AX + b is itself distributed
according to a multi-normal law with parameters (Aμ + b; AVAᵗ).
For the specific case m = 2, the multi-normal density is termed binormal and written
as

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\left\{\frac{-1}{2(1 - \rho^2)}\left[\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2\right]\right\}$$

2.2.1.4 Log-normal distribution
Let us now return to a one-dimensional distribution linked to the normal law. A r.v. X is
said to be distributed according to a log-normal law with parameters (μ; σ) when ln X is
normally distributed with the parameters (μ; σ). It can easily be demonstrated that this
r.v. takes only positive values and that it is defined by the density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left[-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2\right] \qquad (x > 0)$$

The graph of this density is shown in Figure A2.12 and its typical values are given by:

$$E(X) = e^{\mu + \sigma^2/2}$$
$$\mathrm{var}(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$$
$$\gamma_1(X) = (e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$$
$$\gamma_2(X) = (e^{3\sigma^2} + 3e^{2\sigma^2} + 6e^{\sigma^2} + 6)(e^{\sigma^2} - 1)$$

This confirms the skewness, with concentration to the left and spreading to the right,
observed on the graph.

Figure A2.12 Log-normal distribution

We would point out finally that a result of the same type as the central limit theorem
also leads to the log-normal law: this is the case in which the effects represented by the
various r.v.s accumulate through a multiplicative model rather than an additive
model, because of the fundamental property of logarithms: ln(x1 · x2) = ln x1 + ln x2.
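The multiplicative mechanism can be sketched by a seeded simulation (the factors below are made up): the logarithm of a product of many positive factors is a sum, so it clusters around the sum of the factors' log-means:

```python
import math
import random

# Multiplicative analogue of the CLT: the log of a product of many positive
# i.i.d. factors is a sum  ln(x1 * x2 * ...) = ln x1 + ln x2 + ...
random.seed(3)

def product_of_factors(n):
    p = 1.0
    for _ in range(n):
        p *= random.uniform(0.9, 1.1)   # made-up positive factors
    return p

samples = [math.log(product_of_factors(50)) for _ in range(5_000)]
mean_log = sum(samples) / len(samples)

# Theoretical mean of ln(product): 50 * E[ln U(0.9, 1.1)], where
# E[ln U(a, b)] = (b ln b - a ln a - (b - a)) / (b - a).
theory = 50 * (1.1 * math.log(1.1) - 0.9 * math.log(0.9) - 0.2) / 0.2
assert abs(mean_log - theory) < 0.03
```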

2.2.2 Other theoretical distributions
2.2.2.1 Poisson distribution
The Poisson r.v. with parameter μ is a discrete r.v. X that takes all the non-negative
integer values 0, 1, 2, etc. with the associated probabilities:

$$\Pr[X = k] = e^{-\mu}\frac{\mu^k}{k!} \qquad k \in \mathbb{N}$$

The typical values for this distribution are given by:

E(X) = μ
var(X) = μ

2.2.2.2 Binomial distribution
The Bernoulli scheme is a probability model applied to a very wide range of situations.
It is characterised by

вЂў a п¬Ѓnite number of independent trials;
вЂў during each trial, two results only вЂ“ success and failure вЂ“ are possible;
вЂў also during each trial, the probability of a success occurring is the same.

If n is the number of trials and p the probability of success in each trial, the term used is Bernoulli scheme with parameters (n; p), and the number of successes out of the n trials is a binomial r.v., termed B(n; p).

Probabilistic Concepts 351

This discrete random variable takes the values 0, 1, 2, . . . , n with the following associated probabilities:5
Pr[B(n; p) = k] = \binom{n}{k} p^k (1 - p)^{n-k}    (k ∈ {0, 1, . . . , n})
The sum of these probabilities equals 1, in accordance with Newton's binomial formula.
In addition, the typical values for this distribution are given by:
E(B(n; p)) = np
var(B(n; p)) = np(1 − p)
The binomial distribution allows two interesting approximations when the parameter n is large. Thus, for a very small p, we have the approximation through Poisson's law with parameter np:
Pr[B(n; p) = k] ≈ e^{-np} \frac{(np)^k}{k!}
For a p that is not too close to 0 or 1, the binomial r.v. tends towards a normal law with parameters (np; √(np(1 − p))), and more specifically:

Pr[B(n; p) = k] ≈ \Phi\left( \frac{k - \mu + \frac{1}{2}}{\sigma} \right) - \Phi\left( \frac{k - \mu - \frac{1}{2}}{\sigma} \right)

where µ = np, σ = √(np(1 − p)) and Φ denotes the standard normal distribution function.
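Both approximations can be tested numerically. The sketch below (plain Python; the parameter values are arbitrary illustrations) compares the exact binomial probability with the Poisson approximation for a small p, and with the continuity-corrected normal approximation for a moderate p, using Φ(x) = ½(1 + erf(x/√2)):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def phi(x):
    """Standard normal c.d.f. expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Poisson approximation: n large, p very small (np = 3)
n, p, k = 1000, 0.003, 2
pois = math.exp(-n * p) * (n * p) ** k / math.factorial(k)
exact = binom_pmf(k, n, p)

# normal approximation with continuity correction: p away from 0 and 1
n2, p2, k2 = 400, 0.4, 160
mu, sigma = n2 * p2, math.sqrt(n2 * p2 * (1 - p2))
normal = phi((k2 - mu + 0.5) / sigma) - phi((k2 - mu - 0.5) / sigma)
exact2 = binom_pmf(k2, n2, p2)
```

Both differences turn out to be small (below 0.01 and 0.001 respectively for these parameter choices).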

2.2.2.3 Student distribution
The Student distribution with ν degrees of freedom is defined by the density
f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \sqrt{\nu\pi}} \left( 1 + \frac{x^2}{\nu} \right)^{-(\nu+1)/2}
In this expression, the gamma function is defined by Γ(n) = \int_0^{+\infty} e^{-x} x^{n-1} \, dx.
This generalises the factorial function, as Γ(n) = (n − 1) · Γ(n − 1) and, for integer n, we have Γ(n) = (n − 1)!.
It is, however, also defined for non-integer arguments: all the positive real values of n and, for example, Γ(1/2) = √π.

We do not show the graph for this density here, as it is symmetrical with respect to the vertical axis and bears a strong resemblance to the standard normal density graph, although for ν > 4 the kurtosis coefficient is strictly positive:
E(X) = 0
var(X) = \frac{\nu}{\nu - 2}
γ1(X) = 0
γ2(X) = \frac{6}{\nu - 4}
5
Remember that \binom{n}{k} = \frac{n!}{k!(n - k)!}.

Finally, it can be stated that when the number of degrees of freedom tends towards infinity, the Student distribution tends towards the standard normal distribution, this asymptotic property being verified in practice as soon as ν reaches the value of 30.
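This convergence is easy to observe by direct comparison of the two densities. The sketch below (plain Python, illustrative) evaluates the Student density for ν = 30 through math.gamma and measures its largest deviation from the standard normal density on a grid over [−4; 4]:

```python
import math

def student_pdf(x, nu):
    """Density of the Student distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# largest pointwise gap on [-4; 4] for nu = 30
gap = max(abs(student_pdf(i / 10, 30) - normal_pdf(i / 10)) for i in range(-40, 41))
```

For ν = 30 the gap is already below 0.01, consistent with the rule of thumb quoted above; at the other extreme, ν = 1 gives the Cauchy density, with f(0) = 1/π.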

2.2.2.4 Uniform distribution
A r.v. is said to be uniform in the interval [a; b] when the probability of its taking a
value between t and t + h6 depends only on these two boundaries through h. It is easy
to establish, on that basis, that we are looking at a r.v. that only takes a value within the
interval [a; b] and that its density is necessarily constant:

f(x) = 1/(b − a)    (a < x < b)

Its graph is shown in Figure A2.13.
The principal typical values for the uniform r.v. are given by:
E(X) = \frac{a + b}{2}
var(X) = \frac{(b - a)^2}{12}
γ1(X) = 0
γ2(X) = -\frac{6}{5}
This uniform distribution is the origin of some simulation methods, in which the generation
of random numbers distributed uniformly in the interval [0; 1] allows distributed random
numbers to be obtained according to a given law of probability (Figure A2.14). The way
in which this transformation occurs is explained in Section 7.3.1. Let us examine here
how the (pseudo-) random numbers uniformly distributed in [0; 1] can be obtained.
The sequence x1, x2, . . . , xn is constructed according to residue classes. On the basis of an initial value ρ0 (equal to 1, for example), we construct, for i = 1, 2, . . . , n etc.:

xi = decimal part of (c1 ρi−1)
ρi = c2 xi

Here, the constants c1 and c2 are suitably chosen. Thus, for c1 = 13.3669 and c2 = 94.3795, we find successively the values shown in Table A2.1.

f(x)

1/(b вЂ“ a)

a b x

Figure A2.13 Uniform distribution

6
These two values are assumed to belong to the interval [a; b].

Figure A2.14 Random numbers uniformly distributed in [0; 1]

Table A2.1 xi and ρi

 i        xi            ρi
 0                      1
 1     0.366900     34.627839
 2     0.866885     81.813352
 3     0.580898     55.768652
 4     0.453995     42.847849
 5     0.742910     70.115509
 6     0.226992     21.423384
 7     0.364227     34.375527
 8     0.494233     46.645452
 9     0.505097     47.670759
10     0.210265     19.844676
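The recurrence is straightforward to implement; a minimal sketch follows (plain Python, where the 'decimal part' is the modulo-1 reduction). Note that the later values can drift from the printed table in the last decimals, since the table evidently rounds the intermediate ρi before reusing them; the first value x1 = 0.3669 is reproduced exactly, and every xi lies in [0; 1[:

```python
# the book's "residue class" generator for pseudo-random numbers in [0; 1[
c1, c2 = 13.3669, 94.3795
rho = 1.0                       # initial value rho_0
xs = []
for _ in range(10):
    x = (c1 * rho) % 1.0        # decimal part of c1 * rho_{i-1}
    rho = c2 * x                # rho_i = c2 * x_i
    xs.append(x)
```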

2.2.2.5 Generalised error distribution
The generalised error distribution with parameter ν is defined by the density

f(x) = \frac{\nu \sqrt{\Gamma(3/\nu)}}{2\, \Gamma(1/\nu)^{3/2}} \exp\left[ - \left( |x| \sqrt{\frac{\Gamma(3/\nu)}{\Gamma(1/\nu)}} \right)^{\nu} \right]

The graph for this density is shown in Figure A2.15.
This is a distribution symmetrical with respect to 0, which corresponds to the normal distribution for ν = 2 and gives rise to a leptokurtic distribution (resp. a negative-kurtosis distribution) for ν < 2 (resp. ν > 2).

2.3 STOCHASTIC PROCESSES
2.3.1 General considerations
The term stochastic process is applied to a random variable that is a function of the time
variable: {Xt : t в€€ T }.

Figure A2.15 Generalised error distribution (densities for ν = 1, 2, 3)

If the set T of times is discrete, the stochastic process is simply a sequence of random variables. However, in a number of financial applications, such as the Black and Scholes model, it will be necessary to consider stochastic processes in continuous time.
For each possible result ω ∈ Ω, the function Xt(ω) of the variable t is known as the path of the stochastic process.
A stochastic process is said to have independent increments when, regardless of the times t1 < t2 < . . . < tn, the r.v.s

Xt1, Xt2 − Xt1, Xt3 − Xt2, . . . , Xtn − Xtn−1

are independent. In the same way, a stochastic process is said to have stationary increments when, for every t and h, the r.v.s Xt+h − Xt and Xh are identically distributed.

2.3.2 Particular stochastic processes
2.3.2.1 The Poisson process
We consider a process of random occurrences of an event in time, corresponding to the set [0; +∞[. Here, the principal interest relates not to the occurrence times directly, but to the number of occurrences within given intervals. The r.v. that represents the number of occurrences within the interval [t1; t2] is termed n(t1, t2).
This process is called a Poisson process if it obeys the following hypotheses:

• the numbers of occurrences in separate intervals of time are independent;
• the distribution of the number of occurrences within an interval of time depends on that interval only through its duration: Pr[n(t1, t2) = k] is a function of (t2 − t1), which is henceforth termed pk(t2 − t1);
• there is no multiple occurrence: if h is small, Pr[n(0; h) ≥ 2] = o(h);
• there is a rate of occurrence α so that Pr[n(0; h) = 1] = αh + o(h).

It can be demonstrated that, under these hypotheses, the r.v. 'number of occurrences within an interval of duration t' is distributed according to a Poisson law with parameter αt:

p_k(t) = e^{-\alpha t} \frac{(\alpha t)^k}{k!}    (k = 0, 1, 2, . . .)

To simplify, we write Xt = n(0; t). This is a stochastic process that counts the number of occurrences over time. The path of such a process is therefore a stepped function, with the abscissas of the jumps corresponding to the occurrence times and the heights of the jumps equal to 1. It can be demonstrated that the process has independent and stationary increments and that E(Xt) = var(Xt) = αt.
This process can be generalised as follows. We consider:

• a Poisson process Xt as defined above; with the time of the kth occurrence expressed as Tk, we have Xt = #{k : Tk ≤ t};
• a sequence Y1, Y2, . . . of independent and identically distributed r.v.s, independent of the Poisson process.

The process Zt = Σ_{k:Tk≤t} Yk is known as a compound Poisson process.
The paths of such a process are therefore stepped functions, with the abscissas for the
jumps corresponding to the occurrence times for the subjacent Poisson process and the
heights of the jumps being the realised values of the r.v.s Yk . In addition, we have:

E(Zt) = αt · µY
var(Zt) = αt · (σY² + µY²)
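These two formulas can be checked by simulation. The sketch below (plain Python; the choices α = 1.5, t = 2 and exponential(1) jump sizes, for which µY = σY² = 1, are arbitrary illustrations) draws the Poisson count from exponential inter-arrival times and compares the sample moments of Zt with αt · µY = 3 and αt · (σY² + µY²) = 6:

```python
import random

random.seed(2024)
alpha, t = 1.5, 2.0                        # so alpha * t = 3
samples = []
for _ in range(20000):
    # Poisson count: exponential inter-arrival times until the horizon t
    n, clock = 0, random.expovariate(alpha)
    while clock <= t:
        n += 1
        clock += random.expovariate(alpha)
    # jump sizes Y_k ~ exponential(1): mu_Y = 1, sigma_Y^2 = 1
    samples.append(sum(random.expovariate(1.0) for _ in range(n)))

mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)
# theory: E(Z_t) = 3 and var(Z_t) = 6
```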

2.3.2.2 Standard Brownian motion
Consider a sequence of r.v.s Xk, independent and identically distributed, taking the values (−Δx) and Δx with respective probabilities 1/2 and 1/2, and define the sequence of r.v.s Yn through Yn = X1 + X2 + · · · + Xn. This is known as a symmetrical random walk. As E(Xk) = 0 and var(Xk) = (Δx)², we have E(Yn) = 0 and var(Yn) = n(Δx)².
For our modelling requirements, we separate the interval of time [0; t] into n subintervals of the same duration Δt = t/n and define Zt = Zt(n) = Yn. We have:

E(Zt) = 0        var(Zt) = n(Δx)² = \frac{(Δx)^2}{Δt} · t
t
This variable Zt allows the discrete development of a magnitude to be modelled. If we then wish to move to continuous modelling while retaining the same variability per unit of time, that is, with (Δx)²/Δt = 1, for example, we obtain the stochastic process

wt = lim_{n→∞} Zt(n)
This is a standard Brownian motion (also known as a Wiener process). It is clear that this stochastic process wt, defined on R⁺, is such that w0 = 0, that wt has independent and stationary increments, and that, in view of the central limit theorem, wt is distributed according to a normal law with parameters (0; √t). It can be shown that the paths of a Wiener process are continuous everywhere but cannot generally be differentiated. In fact,

\frac{Δw_t}{Δt} = \frac{ε\sqrt{Δt}}{Δt} = \frac{ε}{\sqrt{Δt}}

where ε is a standard normal r.v., so that this ratio diverges as Δt → 0.

2.3.2.3 Itô process
If a more developed model is required, wt can be multiplied by a constant in order to produce a variability per time unit (Δx)²/Δt different from 1, or a constant drift can be added in order to obtain a non-zero mean:

Xt = X0 + a · t + b · wt

This type of model is not greatly effective because of the great variability of the development in the short term, the standard deviation of Xt being equal7 to b√t.
For this reason, this type of construction is applied more to variations relating to a short interval of time:

dXt = a · dt + b · dwt

It is possible to generalise by replacing the constants a and b by functions of t and Xt :

dXt = at(Xt) · dt + bt(Xt) · dwt

This type of process is known as the Itô process. In financial modelling, several specific cases of the Itô process are used; a geometric Brownian motion is obtained when:

at(Xt) = a · Xt        bt(Xt) = b · Xt

An Ornstein–Uhlenbeck process corresponds to:

at(Xt) = a · (c − Xt)        bt(Xt) = b

and the square root process is such that:

at(Xt) = a · (c − Xt)        bt(Xt) = b √Xt
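All of these specific cases can be simulated with the elementary Euler scheme, which advances dXt = at(Xt) · dt + bt(Xt) · dwt over small steps using normal increments of variance Δt. The sketch below is an illustrative implementation (not taken from the text); as a sanity check it runs a geometric Brownian motion with b = 0, for which the scheme must converge to the deterministic solution X0 · e^{at}:

```python
import math
import random

def euler_maruyama(a_fn, b_fn, x0, t_end, n, rng):
    """One path of dX = a_t(X) dt + b_t(X) dw by the Euler scheme."""
    dt = t_end / n
    x = x0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment, N(0; sqrt(dt))
        x += a_fn(x) * dt + b_fn(x) * dw
    return x

# geometric Brownian motion: a_t(X) = a X, b_t(X) = b X
a, b = 0.05, 0.0                 # b = 0 switches the noise off for the check
rng = random.Random(7)
x_T = euler_maruyama(lambda x: a * x, lambda x: b * x, 1.0, 1.0, 10_000, rng)
# with b = 0 the path tends to the deterministic solution e^{a t}
```

An Ornstein–Uhlenbeck path is obtained the same way with a_fn = lambda x: a * (c - x) and b_fn = lambda x: b.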

2.3.3 Stochastic differential equations
Expressions of the type dXt = at(Xt) · dt + bt(Xt) · dwt cannot simply be handled in the same way as the corresponding deterministic expressions, because wt cannot be differentiated. It is, however, possible to extend the definition to a concept of stochastic differential, through the theory of stochastic integral calculus.8
As the stochastic process zt is defined within the interval [a; b], the stochastic integral of zt within [a; b] with respect to the standard Brownian motion wt is defined by:

\int_a^b z_t \, dw_t = \lim_{\substack{n \to \infty \\ \delta \to 0}} \sum_{k=0}^{n-1} z_{t_k} (w_{t_{k+1}} - w_{t_k})

where δ denotes the largest of the subinterval lengths tk+1 − tk.
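This defining limit can be observed numerically. The sketch below (plain Python, illustrative) builds a discretised Wiener path, forms the left-endpoint sum for the integrand zt = wt, and compares it with the closed-form Itô result ∫₀ᵀ wt dwt = (wT² − T)/2, a standard identity not derived in the text:

```python
import math
import random

random.seed(42)
T, n = 1.0, 20000
dt = T / n

# discretised Wiener path: independent N(0; sqrt(dt)) increments
w = [0.0]
for _ in range(n):
    w.append(w[-1] + random.gauss(0.0, math.sqrt(dt)))

# left-endpoint sum defining the stochastic integral of w_t itself
integral = sum(w[k] * (w[k + 1] - w[k]) for k in range(n))
target = (w[-1] ** 2 - T) / 2     # Ito: int_0^T w dw = (w_T^2 - T) / 2
```

The left-endpoint evaluation is essential: taking right endpoints instead would converge to (wT² + T)/2, which is the hallmark of the Itô integral compared with ordinary calculus.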

7
The root function presents a vertical tangent at the origin.
8