As with functions of a single variable, the extrema of a differentiable function can be determined using two conditions.

• The first-order (necessary) condition states that if x(0) is an extremum of f, then all the partial derivatives of f will be zero at x(0):

    fxi(x(0)) = 0    (i = 1, . . ., n)

Referring to the geometric interpretation of the partial derivatives of a function of two variables, at this type of point (x0, y0), called a stationary point, the tangents to the curves Cx and Cy are therefore horizontal.

• The second-order (sufficient) condition allows the stationary points to be 'sorted' according to their nature, but first and foremost requires the definition of the Hessian matrix of the function f at point x, made up of the second partial derivatives of f:

    H(f(x1, . . ., xn)) = ( fx1x1(x)  fx1x2(x)  · · ·  fx1xn(x) )
                          ( fx2x1(x)  fx2x2(x)  · · ·  fx2xn(x) )
                          (    .         .                .    )
                          ( fxnx1(x)  fxnx2(x)  · · ·  fxnxn(x) )

If x(0) is a stationary point of f and H(f(x)) is p.d. at x(0) or s.p. in a neighbourhood of x(0), we have a minimum. In the opposite situation, if H(f(x)) is n.d. at x(0) or s.n. in a neighbourhood of x(0), we have a maximum.3
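As an illustration (not part of the original text), the second-order condition for a two-variable function can be checked numerically: the sketch below estimates the Hessian by central finite differences and classifies the stationary point with Sylvester's criterion for a 2×2 symmetric matrix (positive determinant and fxx > 0 means p.d., hence a minimum). The function f(x, y) = x² + xy + y² is a hypothetical example with a stationary point at the origin.

```python
def hessian(f, x, y, h=1e-4):
    # Second partial derivatives by central finite differences.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def classify(fxx, fxy, fyy, tol=1e-6):
    # Sylvester's criterion for the 2x2 Hessian [[fxx, fxy], [fxy, fyy]].
    det = fxx * fyy - fxy**2
    if det > tol:
        return "minimum" if fxx > 0 else "maximum"
    if det < -tol:
        return "saddle"
    return "undetermined"

f = lambda x, y: x**2 + x * y + y**2   # stationary point at (0, 0)
kind = classify(*hessian(f, 0.0, 0.0))
```

Here the Hessian is approximately [[2, 1], [1, 2]], which is positive definite, so the origin is a minimum.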

1.2.1.3 Extrema under constraint(s)

This is a similar concept, but one in which the analysis of the problem of extrema is

restricted to those x values that obey one or more constraints.

3 These notions are explained in Section 1.3.2.1 of this Appendix.

Mathematical Concepts 331

The point (x1(0), . . ., xn(0)) is a local maximum (resp. minimum) of the function f under the constraints

    g1(x) = 0
    . . .
    gr(x) = 0

if x(0) itself satisfies the constraints and

    f(x1(0), . . ., xn(0)) ≥ f(x1, . . ., xn)    [resp. f(x1(0), . . ., xn(0)) ≤ f(x1, . . ., xn)]

for any (x1, . . ., xn) in a neighbourhood of (x1(0), . . ., xn(0)) satisfying the r constraints.

Solving this problem involves considering the Lagrangian function of the problem. This is a function of the (n + r) variables (x1, . . ., xn; m1, . . ., mr), the last r of which – known as Lagrangian multipliers – each correspond to a constraint:

    L(x1, . . ., xn; m1, . . ., mr) = f(x) + m1 · g1(x) + · · · + mr · gr(x)

We will not go into the technical details of solving this problem. We will, however, point out an essential result: if the point (x(0); m(0)) is such that x(0) satisfies the constraints and (x(0); m(0)) is an extremum (without constraint) of the Lagrangian function, then x(0) is an extremum for the problem of extrema under constraints.
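As a hypothetical illustration (not from the original text), consider maximising f(x, y) = xy under the single constraint x + y − 1 = 0. The Lagrangian L(x, y; m) = xy + m(x + y − 1) has an unconstrained stationary point where all three partial derivatives vanish, which here is a linear system; the sketch solves it with Gaussian elimination.

```python
def solve3(M, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system.
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

# Stationarity of L(x, y; m) = xy + m(x + y - 1):
#   dL/dx = y + m = 0,  dL/dy = x + m = 0,  dL/dm = x + y - 1 = 0
M = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
b = [0, 0, 1]
x, y, m = solve3(M, b)
```

The stationary point is x = y = 1/2 with multiplier m = −1/2, and indeed xy is maximal on the constraint at that point.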

1.2.2 Taylor's formula

Taylor's formula also generalises to functions of n variables, but the degree-1 term, which involves the first derivative, is replaced by n terms involving the n partial derivatives:

    fxi(x1(0), x2(0), . . ., xn(0))    i = 1, 2, . . ., n

In the same way, the degree-2 term, whose coefficient involves the second derivative, here becomes a set of n² terms in which the various second partial derivatives are involved:

    fxixj(x1(0), x2(0), . . ., xn(0))    i, j = 1, 2, . . ., n

Thus, limiting the expansion to the degree-2 terms, Taylor's formula is written as follows:

    f(x1(0) + h1, x2(0) + h2, . . ., xn(0) + hn) ≈ f(x(0)) + (1/1!) Σ(i=1..n) fxi(x(0)) hi
                                                           + (1/2!) Σ(i=1..n) Σ(j=1..n) fxixj(x(0)) hi hj + · · ·
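As a small numerical check (an illustration, not from the original text), take f(x, y) = e^x cos y around (0, 0), where the partial derivatives are known analytically: f = 1, fx = 1, fy = 0, fxx = 1, fxy = 0, fyy = −1. The degree-2 Taylor approximation should then match the exact value up to third-order terms in h.

```python
import math

# f(x, y) = exp(x) * cos(y); derivatives at (0, 0):
#   f = 1, fx = 1, fy = 0, fxx = 1, fxy = 0, fyy = -1
def taylor2(h1, h2):
    return 1.0 + (1.0 * h1 + 0.0 * h2) \
               + 0.5 * (1.0 * h1**2 + 2 * 0.0 * h1 * h2 - 1.0 * h2**2)

h1, h2 = 0.01, 0.02
exact = math.exp(h1) * math.cos(h2)
approx = taylor2(h1, h2)
error = abs(exact - approx)
```

For steps of order 10⁻², the error is of order 10⁻⁶, consistent with the neglected degree-3 terms.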

332 Asset and Risk Management

1.3 MATRIX CALCULUS

1.3.1 Definitions

1.3.1.1 Matrices and vectors

The term n-order matrix is given to a set of n² real numbers making up a square table consisting of n rows and n columns.4 A matrix is generally represented by a capital letter (such as A), and its elements by the corresponding lower-case letter (a) with two allocated indices representing the row and column to which the element belongs: aij is the element of matrix A located at the intersection of row i and column j within A. Matrix A can therefore be written generally as follows:

    A = ( a11  a12  · · ·  a1j  · · ·  a1n )
        ( a21  a22  · · ·  a2j  · · ·  a2n )
        (  .    .           .           .  )
        ( ai1  ai2  · · ·  aij  · · ·  ain )
        (  .    .           .           .  )
        ( an1  an2  · · ·  anj  · · ·  ann )

In the same way, an n-dimensional vector is a set of n real numbers forming a columnar table. The elements of a vector are its components, and are referred to by a single index:

    X = ( x1 )
        ( x2 )
        (  . )
        ( xi )
        (  . )
        ( xn )

1.3.1.2 Specific matrices

The diagonal elements of a matrix are the elements a11, a22, . . ., ann. They are located on the diagonal of the table that starts from the upper left-hand corner; this is known as the principal diagonal.

A matrix is defined as symmetrical if the elements symmetrical with respect to the principal diagonal are equal: aij = aji. Here is an example:

    A = (  2  −3   0 )
        ( −3   1  √2 )
        (  0  √2   0 )

4 More generally, a matrix is a rectangular table with the format (m, n): m rows and n columns. We will, however, only be looking at square matrices here.


An upper triangular matrix is a matrix in which the elements located underneath the principal diagonal are zero: aij = 0 when i > j. For example:

    A = ( 0  2  −1 )
        ( 0  3   0 )
        ( 0  0   5 )

The concept of a lower triangular matrix is of course defined in a similar way.

Finally, a diagonal matrix is one that is both upper triangular and lower triangular. Its only non-zero elements are the diagonal elements: aij = 0 when i ≠ j. Generally, this type of matrix will be represented by:

    A = ( a1   0  · · ·   0 )
        (  0  a2  · · ·   0 )   = diag(a1, a2, . . ., an)
        (  .   .    .     . )
        (  0   0  · · ·  an )

1.3.1.3 Operations

The sum of two matrices, as well as the multiplication of a matrix by a scalar, are completely natural operations: the operation in question is carried out for each of the elements. Thus:

    (A + B)ij = aij + bij
    (λA)ij = λ aij

These definitions are also valid for vectors:

    (X + Y)i = xi + yi
    (λX)i = λ xi

The product of two matrices A and B is a matrix of the same order as A and B, in which the element (i, j) is obtained by calculating the sum of the products of the elements in row i of A with the corresponding elements in column j of B:

    (AB)ij = ai1 b1j + ai2 b2j + · · · + ain bnj = Σ(k=1..n) aik bkj

We will have, for example:

    ( 0  −1  5 )   ( −2   2   0 )   ( −18  1  −5 )
    ( 2   0  1 ) · (  3  −1   0 ) = (  −7  4  −1 )
    ( 3  −2  2 )   ( −3   0  −1 )   ( −18  8  −2 )

Despite the apparently complex de¬nition, the matrix product has a number of classical

properties; it is associative and distributive with respect to addition. However, it needs to

be handled with care as it lacks one of the classical properties: it is not commutative. AB

does not equal BA!
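The 'rows by columns' rule and the failure of commutativity are easy to check numerically; the matrices below are illustrative examples, not data from the text.

```python
def matmul(A, B):
    # (AB)ij = sum over k of A[i][k] * B[k][j]  ("rows by columns")
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, -1, 5], [2, 0, 1], [3, -2, 2]]
B = [[-2, 2, 0], [3, -1, 0], [-3, 0, -1]]
AB = matmul(A, B)
BA = matmul(B, A)
```

Computing both orders shows AB ≠ BA, as the text warns.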


The product of a matrix by a vector is defined using the same 'rows by columns' procedure:

    (AX)i = Σ(k=1..n) aik xk

The transposition of a matrix A is the matrix Aᵗ, obtained by permuting the elements symmetrical with respect to the principal diagonal or, which amounts to the same thing, by permuting the roles of the rows and columns of matrix A:

    (Aᵗ)ij = aji

A matrix is thus symmetrical if, and only if, it is equal to its transpose. In addition, this operation, applied to a (column) vector, gives the corresponding row vector as its result.

The inverse of matrix A is the matrix A⁻¹, if it exists, such that: AA⁻¹ = A⁻¹A = diag(1, . . ., 1) = I.

For example, it is easy to verify that:

    (  1  0   1 )⁻¹   (  3   1  −1 )
    ( −2  1  −3 )   = (  0   0   1 )
    (  0  1   0 )     ( −2  −1   1 )
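Such a claim can be verified mechanically by multiplying the matrix by its candidate inverse in both orders and checking that the identity matrix results (the pair below is an illustrative example):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 0, 1], [-2, 1, -3], [0, 1, 0]]
A_inv = [[3, 1, -1], [0, 0, 1], [-2, -1, 1]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # diag(1, 1, 1)
```

Both AA⁻¹ and A⁻¹A equal I, confirming that A_inv is indeed the inverse of A.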

Finally, let us define the trace of a matrix. The trace is the sum of the matrix's diagonal elements:

    tr(A) = a11 + a22 + · · · + ann = Σ(i=1..n) aii

1.3.2 Quadratic forms

1.3.2.1 Quadratic form and class of symmetrical matrix

A quadratic form is a polynomial function of n variables containing only second-degree terms:

    Q(x1, x2, . . ., xn) = Σ(i=1..n) Σ(j=1..n) aij xi xj

If we construct a matrix A from the coefficients aij (i, j = 1, . . ., n) and the vector X from the variables xi (i = 1, . . ., n), we can give the quadratic form a matrix expression:

    Q(X) = Xᵗ A X

In fact, by developing the right-hand member, we obtain:

    Xᵗ A X = Σ(i=1..n) xi (AX)i
           = Σ(i=1..n) xi Σ(j=1..n) aij xj
           = Σ(i=1..n) Σ(j=1..n) aij xi xj


A quadratic form can always be associated with a matrix A, and vice versa. The matrix, however, is not unique. In fact, the quadratic form Q(x1, x2) = 3x1² − 4x1x2 can be associated with the matrices

    A = (  3  −2 )    B = (  3  0 )    C = ( 3  −6 )
        ( −2   0 )        ( −4  0 )        ( 2   0 )

as well as an infinite number of others. Amongst all these matrices, only one is symmetrical (A in the example given). There is therefore a bijection between the set of quadratic forms and the set of symmetrical matrices.

The class of a symmetrical matrix is defined on the basis of the sign of the associated quadratic form. Thus, the non-zero matrix A is said to be positive definite (p.d.) if XᵗAX > 0 for any X ≠ 0, and semi-positive (s.p.) when:

    XᵗAX ≥ 0 for any X ≠ 0
    there is at least one Y ≠ 0 such that YᵗAY = 0

A matrix is negative definite (n.d.) or semi-negative (s.n.) under the reverse inequalities, and the term non-definite is given to a symmetrical matrix for which there exist X ≠ 0 and Y ≠ 0 such that XᵗAX > 0 and YᵗAY < 0.

The symmetrical matrix

    A = (  5  −3  −4 )
        ( −3  10   2 )
        ( −4   2   8 )

is thus p.d., as the associated quadratic form can be written as:

    Q(x, y, z) = 5x² + 10y² + 8z² − 6xy − 8xz + 4yz
               = (x − 3y)² + (2x − 2z)² + (y + 2z)²

This form is never negative, and only cancels out when:

    x − 3y = 0
    2x − 2z = 0
    y + 2z = 0

that is, when x = y = z = 0.
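The sum-of-squares identity and the positivity it implies can be verified numerically (a sketch for this particular matrix; the grid of test points is illustrative):

```python
def quad_form(A, x):
    # Q(X) = Xt A X = sum over i, j of A[i][j] * x[i] * x[j]
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[5, -3, -4], [-3, 10, 2], [-4, 2, 8]]

def sum_of_squares(x, y, z):
    return (x - 3 * y) ** 2 + (2 * x - 2 * z) ** 2 + (y + 2 * z) ** 2

vals = [-2, -1, 0, 1, 2]
ok = all(quad_form(A, [x, y, z]) == sum_of_squares(x, y, z)
         for x in vals for y in vals for z in vals)
positive = all(quad_form(A, [x, y, z]) > 0
               for x in vals for y in vals for z in vals
               if (x, y, z) != (0, 0, 0))
```

On the grid, the quadratic form agrees with the sum of squares everywhere and is strictly positive away from the origin, as expected for a p.d. matrix.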

1.3.2.2 Linear equation system

A system of n linear equations with n unknowns is a set of relations of the following type:

    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
    · · ·
    an1 x1 + an2 x2 + · · · + ann xn = bn

In it, the aij, xj and bi are respectively the coefficients, the unknowns and the right-hand members. They are naturally written in matrix and vector form: A, X and B. Using this notation, the system is written in an equivalent but more condensed way:

    AX = B


For example, the system of equations

    2x + 3y = 4
    4x − y = −2

can also be written as:

    ( 2   3 ) ( x )   (  4 )
    ( 4  −1 ) ( y ) = ( −2 )

If the inverse of matrix A exists, it can easily be seen that the system admits one and only one solution, given by X = A⁻¹B.
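For a 2×2 system, the solution X = A⁻¹B can be computed directly via Cramer's rule (an illustrative sketch, using the system just shown):

```python
def solve2(a11, a12, a21, a22, b1, b2):
    # Cramer's rule; requires det != 0, i.e. A must be invertible.
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# 2x + 3y = 4,  4x - y = -2
x, y = solve2(2, 3, 4, -1, 4, -2)
```

The solution is x = −1/7, y = 10/7, which can be checked by substitution into both equations.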

1.3.2.3 Case of variance–covariance matrix5

The matrix

    V = ( σ1²  σ12  · · ·  σ1n )
        ( σ21  σ2²  · · ·  σ2n )
        (  .    .    .      .  )
        ( σn1  σn2  · · ·  σn² )

of the variances and covariances of a number of random variables X1, X2, . . ., Xn is a matrix that is either p.d. or s.p.

In effect, regardless of what the numbers λ1, λ2, . . ., λn (not all zero) making up the vector Λ may be, we have:

    Λᵗ V Λ = Σ(i=1..n) Σ(j=1..n) λi λj σij = var( Σ(i=1..n) λi Xi ) ≥ 0

It can even be said, according to this result, that the variance–covariance matrix V is p.d. except when there are coefficients λ1, λ2, . . ., λn, not all zero, such that the random variable λ1X1 + · · · + λnXn = Σ(i=1..n) λi Xi is degenerate, in which case V will be s.p. This degeneration may occur, for example, when:

• one of the variables is degenerate;

• some variables are perfectly correlated;

• the matrix V is obtained on the basis of a number of observations strictly lower than the number of variables.

It is then evident that the variance–covariance matrix can be expressed in matrix form, through the relation:

    V = E[(X − µ)(X − µ)ᵗ]

1.3.2.4 Choleski factorisation

Consider a positive definite symmetrical matrix A. It can be demonstrated that there exists a lower triangular matrix L with strictly positive diagonal elements such that A = LLᵗ.

5 The concepts necessary for an understanding of this example are shown in Appendix 2.


This factorisation process is known as a Choleski factorisation. We will not demonstrate this property, but will show, using the previous example, how the matrix L is found:

    LLᵗ = ( a  0  0 ) ( a  b  d )   ( a²   ab      ad         )
          ( b  c  0 ) ( 0  c  f ) = ( ab   b²+c²   bd+cf      )
          ( d  f  g ) ( 0  0  g )   ( ad   bd+cf   d²+f²+g²   )

        = A = (  5  −3  −4 )
              ( −3  10   2 )
              ( −4   2   8 )

It is then sufficient to work through the last equality in order to find a, b, c, d, f and g in succession, which gives the following for matrix L:

    L = (  √5           0            0        )
        ( −3√5/5        √205/5       0        )
        ( −4√5/5       −2√205/205    14√41/41 )
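The element-by-element procedure described above ('work through the last equality in succession') is exactly the standard Choleski algorithm, sketched here in pure Python for a symmetric p.d. matrix:

```python
import math

def cholesky(A):
    # Returns lower triangular L with A = L Lt; assumes A symmetric p.d.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # Diagonal entry: a^2 (then b^2 + c^2, etc.) solved for.
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                # Off-diagonal entry: ab, ad, bd + cf, ... solved for.
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

A = [[5, -3, -4], [-3, 10, 2], [-4, 2, 8]]
L = cholesky(A)
recon = [[sum(L[i][k] * L[j][k] for k in range(3)) for j in range(3)]
         for i in range(3)]
```

Reconstructing LLᵗ recovers A, and the first column of L matches the closed-form values √5, −3√5/5, −4√5/5.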

Appendix 2

Probabilistic Concepts1

2.1 RANDOM VARIABLES

2.1.1 Random variables and probability law

2.1.1.1 Definitions

Let us consider a fortuitous phenomenon, that is, a phenomenon that under given initial

conditions corresponds to several possible outcomes. A numerical magnitude that depends

on the observed result is known as a random variable or r.v.

In addition, probabilities are associated with the various possible results or events defined in the context of the fortuitous phenomenon. It is therefore interesting to find out the probabilities of the various events defined on the basis of the r.v. What we are looking at here is the concept of the law of probability of the r.v. Thus, if the r.v. is termed X, the law of probability of X is defined by the range of the following probabilities: Pr[X ∈ A], for every subset A of R.

The aim of the concept of probability law is a bold one: the subsets A of R are too numerous for all the probabilities to be known. For this reason, we are content to work with just the sets ]−∞; t]. This defines a function of the variable t, the cumulative distribution function, or more simply the distribution function (d.f.), of the random variable: F(t) = Pr[X ≤ t].

It can be demonstrated that this function, defined on R, is increasing, that it lies between 0 and 1, that it admits the ordinates 0 and 1 as horizontal asymptotes (lim t→−∞ F(t) = 0 and lim t→+∞ F(t) = 1), and that it is right-continuous: lim s→t+ F(s) = F(t).

These properties are summarised in Figure A2.1.

In addition, despite its simplicity, the d.f. allows almost the whole of the probability law for X to be found, thus:

    Pr[s < X ≤ t] = F(t) − F(s)
    Pr[X = t] = F(t) − F(t−)

2.1.1.2 Quantile

Sometimes there is a need to solve the opposite problem: given a probability level u, determine the value of t such that F(t) = Pr[X ≤ t] = u.

This value is known as the quantile of the r.v. X at point u, and its definition is shown in Figure A2.2.

1 Readers wishing to find out more about these concepts should read: Baxter M. and Rennie A., Financial Calculus, Cambridge University Press, 1996. Feller W., An Introduction to Probability Theory and its Applications (2 volumes), John Wiley and Sons, Inc., 1968. Grimmett G. and Stirzaker D., Probability and Random Processes, Oxford University Press, 1992. Roger P., Les outils de la modélisation financière, Presses Universitaires de France, 1991. Ross S. M., Initiation aux probabilités, Presses Polytechniques et Universitaires Romandes, 1994.

Figure A2.1 Distribution function

Figure A2.2 Quantile

Figure A2.3 Quantile in jump scenario

In two cases, however, the definition that we have just given is unsuitable and needs to be adapted. First of all, if the d.f. of X shows a jump that covers the ordinate u, no abscissa corresponds to it, and the abscissa of the jump is naturally chosen (see Figure A2.3).

Next, if the ordinate u corresponds to a plateau [m; M] on the d.f. graph, there is an infinite number of abscissas to choose from (see Figure A2.4).

In this case, the abscissa defined by the relation Q(u) = um + (1 − u)M can be chosen. The quantile function thus defined generalises the concept of the reciprocal function of the d.f.
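For a discrete distribution, the generalised inverse reading of the quantile (the smallest value whose d.f. reaches the level u) can be sketched as follows; the values and probabilities are a hypothetical example:

```python
import bisect

# Discrete r.v.: values xs with probabilities ps.
xs = [1, 2, 5]
ps = [0.2, 0.5, 0.3]

# Cumulative distribution evaluated at each possible value.
cum = []
total = 0.0
for p in ps:
    total += p
    cum.append(total)

def quantile(u):
    # Generalised inverse: smallest x with F(x) >= u.
    return xs[bisect.bisect_left(cum, u)]
```

For instance, any level u in ]0.2; 0.7] maps to the value 2, since F(2) = 0.7 is the first point where the d.f. reaches such levels.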

2.1.1.3 Discrete random variable

A discrete random variable corresponds to a situation in which the set of possible values for the variable is finite or countably infinite. In this case, if the various possible values and the corresponding probabilities are known,

    x1  x2  · · ·  xn  · · ·
    p1  p2  · · ·  pn  · · ·

    Pr[X = xi] = pi    (i = 1, 2, . . ., n, . . .)    with Σi pi = 1

the law of probability of X can easily be determined:

    Pr[X ∈ A] = Σ(i: xi ∈ A) pi

The d.f. of a discrete r.v. is a stepped function, as the abscissas of jumps correspond to

the various possible values of X and the heights of the jumps are equal to the associated

probabilities (see Figure A2.5).

In particular, a r.v. is defined as degenerate if it can only take on one value x (it is also referred to as a certain variable): Pr[X = x] = 1. The d.f. of a degenerate variable is 0 to the left of x and 1 from x onwards.

2.1.1.4 Continuous random variable

In contrast to the discrete case, the set of possible values of a r.v. may be continuous (an interval, for example), with no individual value having a strictly positive probability:

    Pr[X = x] = 0 for all x

Figure A2.5 Distribution function for a discrete random variable

Figure A2.6 Probability density

In this case, the distribution of probabilities over the set of possible values is expressed using a density function f: for a sufficiently small h, we will have Pr[x < X ≤ x + h] ≈ h f(x). This definition is shown in Figure A2.6.

The law of probability is obtained from the density through the following relation:

    Pr[X ∈ A] = ∫A f(x) dx

and, as a particular case:

    F(t) = ∫(−∞..t) f(x) dx

2.1.1.5 Multivariate random variables

Often there is a need to consider several r.v.s X1, X2, . . ., Xm simultaneously, associated with the same fortuitous phenomenon.2 Here, we will simply show the theory for a bivariate random variable, that is, a pair of r.v.s (X, Y); the general process for a multivariate random variable can easily be deduced from this.

The law of probability for a bivariate random variable is defined as the set of the following probabilities: Pr[(X, Y) ∈ A], for every subset A of R². The joint distribution function is defined by F(s, t) = Pr([X ≤ s] ∩ [Y ≤ t]), and the discrete and continuous bivariate random variables are characterised respectively by:

    pij = Pr([X = xi] ∩ [Y = yj])
    Pr[(X, Y) ∈ A] = ∫∫A f(x, y) dx dy

Two r.v.s are defined as independent when they do not influence one another, either in terms of possible values or through the probabilities of the events they define. More formally, X and Y are independent when:

    Pr([X ∈ A] ∩ [Y ∈ B]) = Pr[X ∈ A] · Pr[Y ∈ B]

for every A and B in R.

2 For example, the returns on various financial assets.


It can be shown that two r.v.s are independent if, and only if, their joint d.f. is equal to the product of the d.f.s of each of the r.v.s: F(s, t) = FX(s) · FY(t), and that this condition, for discrete or continuous random variables, reads:

    pij = Pr[X = xi] · Pr[Y = yj]
    f(x, y) = fX(x) · fY(y)

2.1.2 Typical values of random variables

The aim of the typical values of a r.v. is to summarise the information contained in

its probability law in a number of representative parameters: parameters of location,

dispersion, skewness and kurtosis. We will be looking at one from each group.

2.1.2.1 Mean

The mean is a central value that locates a r.v. by dividing the d.f. into two parts with the same area (see Figure A2.7). The mean µ of the r.v. X is therefore such that:

    ∫(−∞..µ) F(t) dt = ∫(µ..+∞) [1 − F(t)] dt

The mean of a r.v. can be calculated on the basis of the d.f.:

    µ = ∫(0..+∞) [1 − F(t)] dt − ∫(−∞..0) F(t) dt

the formula reducing, for a positive r.v., to:

    µ = ∫(0..+∞) [1 − F(t)] dt

It is possible to demonstrate that for a discrete r.v. and a continuous r.v. respectively, we have the formulae:

    µ = Σi xi pi
    µ = ∫(−∞..+∞) x f(x) dx

Figure A2.7 Mean of a random variable


The structure of these two formulae shows that µ integrates the various possible values of the r.v. X, weighting them by the probabilities associated with these values. It can be shown3 that these formulae generalise into an abstract integral of X(ω) with respect to the probability measure Pr over the set Ω of the possible outcomes ω of the fortuitous phenomenon. This integral is known as the expectation of the r.v. X:

    E(X) = ∫Ω X(ω) dPr(ω)

According to the foregoing, there is equivalence between the concepts of expectation and mean (E(X) = µ), and we will use both these terms interchangeably from now on.

The properties of the integral show that the expectation is a linear operator:

    E(aX + bY + c) = aE(X) + bE(Y) + c

and that if X and Y are independent, then E(XY) = E(X) · E(Y).

In addition, for a discrete r.v. or a continuous r.v. respectively, the expectation of a function of a r.v. is given by:

    E(g(X)) = Σi g(xi) pi
    E(g(X)) = ∫(−∞..+∞) g(x) f(x) dx

Let us recall finally the law of large numbers,4 which, for a sequence of independent r.v.s X1, X2, . . ., Xn with identical distribution and mean µ, expresses that for any ε > 0:

    lim(n→∞) Pr[ |(X1 + X2 + · · · + Xn)/n − µ| ≤ ε ] = 1

This law justifies taking the average of a sample to estimate the mean of the population, and in particular estimating the probability of an event through the frequency of that event's occurrence over a large number of realisations of the fortuitous phenomenon.
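The law of large numbers is easy to observe by simulation (an illustrative sketch: uniform draws on [0; 1], whose true mean is 0.5; the seed and sample size are arbitrary choices):

```python
import random

random.seed(12345)

# Sample mean of n i.i.d. uniform(0, 1) draws; the true mean is 0.5.
n = 100_000
sample_mean = sum(random.random() for _ in range(n)) / n
deviation = abs(sample_mean - 0.5)
```

With n this large, the sample mean lies within a small fraction of a percent of the true mean.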

2.1.2.2 Variance and standard deviation

One of the most commonly used dispersion indices (that is, a measurement of the spread of the r.v.'s values around its mean) is the variance σ², defined as:

    σ² = var(X) = E[(X − µ)²]

3 This development is part of measure theory, which is outside the scope of this work. Readers are referred to Loeve M., Probability Theory (2 volumes), Springer-Verlag, 1977.

4 We are showing this law in its weak form here.

Figure A2.8 Variance of a random variable

By developing the right-hand member, we arrive at:

    σ² = E(X²) − µ²

For a discrete r.v. and a continuous r.v. respectively, this gives:

    σ² = Σi (xi − µ)² pi = Σi xi² pi − µ²
    σ² = ∫(−∞..+∞) (x − µ)² f(x) dx = ∫(−∞..+∞) x² f(x) dx − µ²

An example of the interpretation of this parameter is found in Figure A2.8.

It can be demonstrated that var(aX + b) = a² var(X), and that if X and Y are independent, then var(X + Y) = var(X) + var(Y).

Alongside the variance, whose dimension is the square of the dimension of X, we can also use the standard deviation, which is simply its square root:

    σ = √var(X)
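The two expressions for the variance (centred moments and E(X²) − µ²) can be checked against each other on a small discrete example; the fair die below is a hypothetical illustration:

```python
# Fair die: values 1..6, each with probability 1/6.
xs = [1, 2, 3, 4, 5, 6]
p = 1 / 6

mu = sum(x * p for x in xs)                         # mean: 3.5
var_centered = sum((x - mu) ** 2 * p for x in xs)   # E[(X - mu)^2]
var_moments = sum(x * x * p for x in xs) - mu ** 2  # E(X^2) - mu^2
sigma = var_centered ** 0.5                         # standard deviation
```

Both formulae give the same variance, 35/12 ≈ 2.917.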

2.1.2.3 Fisher's skewness and kurtosis coefficients

Fisher's skewness coefficient is defined by:

    γ1 = E[(X − µ)³] / σ³

It is interpreted essentially on the basis of its sign: if γ1 > 0 (resp. < 0), the distribution of X will be concentrated to the left (resp. the right) and spread out to the right (resp. the left). For a symmetrical distribution, γ1 = 0. This interpretation is shown in Figure A2.9.

Fisher's kurtosis coefficient is given by:

    γ2 = E[(X − µ)⁴] / σ⁴ − 3

It is interpreted by comparison with the normal distribution (see Section 2.2.1.1). That distribution has a kurtosis coefficient of 0. Distributions with higher kurtosis than the normal law (also termed leptokurtic) are more peaked in the neighbourhood of their

Figure A2.9 Skewness coefficient of a random variable

Figure A2.10 Kurtosis coefficient of a random variable

mean and present fatter tails (the density being accordingly lower for intermediate values) than the normal distribution; they are characterised by a positive γ2 parameter. Of course, distributions with lower kurtosis have a negative kurtosis coefficient (see Figure A2.10).

For discrete or continuous r.v.s, the formulae that allow E(g(X)) to be calculated are used as usual.

2.1.2.4 Covariance and correlation

We now come to the parameters relating to bivariate random variables. The covariance between two r.v.s X and Y is defined by σXY = cov(X, Y) = E[(X − µX)(Y − µY)], and can also be calculated as cov(X, Y) = E(XY) − µXµY.

For discrete r.v.s and continuous r.v.s respectively, the covariance is calculated by:

    cov(X, Y) = Σi Σj (xi − µX)(yj − µY) pij = Σi Σj xi yj pij − µXµY

    cov(X, Y) = ∫(−∞..+∞) ∫(−∞..+∞) (x − µX)(y − µY) f(x, y) dx dy
              = ∫(−∞..+∞) ∫(−∞..+∞) xy f(x, y) dx dy − µXµY


The covariance is interpreted as follows: it measures the degree of linear connection that exists between the two r.v.s. A positive covariance corresponds to values of the product (X − µX)(Y − µY) that are mostly positive, the two factors then being mostly of the same sign: high values of X (greater than µX) will mostly correspond to high values of Y (greater than µY), and low values of X to low values of Y. The same type of reasoning applies to negative covariance.

It can be demonstrated that:

    cov(aX + bY + c, Z) = a cov(X, Z) + b cov(Y, Z)
    cov(X, X) = var(X)
    E(XY) = E(X) · E(Y) + cov(X, Y)
    var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)

and that if X and Y are independent, their covariance is zero. In this case, in fact:

    cov(X, Y) = E[(X − µX)(Y − µY)]
              = E(X − µX) E(Y − µY)
              = (E(X) − µX)(E(Y) − µY)
              = 0

Another parameter, which also measures the degree of linear connection between the two r.v.s, is the correlation coefficient:

    ρXY = corr(X, Y) = σXY / (σX · σY)

The interest of the correlation coefficient in comparison to the covariance is that it is a dimensionless number, while the covariance measurement unit is the product of the units of the two r.v.s. In addition, the correlation coefficient can only assume values between −1 and 1, and these two extreme values correspond to the existence of a perfect linear relation (increasing or decreasing depending on whether ρ = 1 or ρ = −1) between the two r.v.s.

Two r.v.s whose correlation coefficient (or covariance) is zero are termed non-correlated. It has been said earlier that independent r.v.s are non-correlated, but the converse is not true! The independence of two r.v.s in fact excludes the existence of any relation between the variables, while non-correlation simply excludes the existence of a linear relation.
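The discrete formulae above can be applied directly to a small joint law; the pair of 0/1-valued variables below is a hypothetical example, used to verify the identity var(X + Y) = var(X) + var(Y) + 2 cov(X, Y):

```python
# Hypothetical joint law of a discrete pair (X, Y): p[(x, y)] = Pr([X=x] and [Y=y]).
p = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

def E(g):
    # Expectation of g(X, Y) under the joint law.
    return sum(g(x, y) * q for (x, y), q in p.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))
rho = cov / (var_x ** 0.5 * var_y ** 0.5)
var_sum = E(lambda x, y: (x + y - mu_x - mu_y) ** 2)  # var(X + Y)
```

Here cov = 0.1 > 0 (high values of X tend to go with high values of Y), ρ ≈ 0.41, and var(X + Y) equals var(X) + var(Y) + 2 cov(X, Y) exactly.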

2.2 THEORETICAL DISTRIBUTIONS

2.2.1 Normal distribution and associated ones

2.2.1.1 Normal distribution

Remember that a normal random variable with parameters (µ; σ) is defined by its density:

    f(x) = (1 / (√(2π) σ)) exp( −(1/2) ((x − µ)/σ)² )

which is shown graphically in Figure A2.11.

Figure A2.11 Normal density

The normal density graph is symmetrical with respect to the vertical straight line of abscissa µ and shows two points of inflexion, at (µ − σ) and (µ + σ).

The typical values for this distribution are given by:

    E(X) = µ
    var(X) = σ²
    γ1(X) = 0
    γ2(X) = 0

If the r.v. X is distributed following a normal law with parameters (µ; σ), it can be demonstrated that the r.v. (aX + b) is also normally distributed. In particular, the r.v. (X − µ)/σ follows a normal law with parameters (0; 1). This is known as the standard normal law.

The preceding result can be generalised: if the r.v.s X1, X2, . . ., Xm are independent and normally distributed with E(Xk) = µk and var(Xk) = σk² (k = 1, . . ., m), then the r.v. Σ(k=1..m) ak Xk + b will follow a normal law with parameters

    ( Σ(k=1..m) ak µk + b ;  √( Σ(k=1..m) ak² σk² ) )

2.2.1.2 Central limit theorem

The importance of the normal law in probability theory and statistics stems from the well-known central limit theorem, which states that if the r.v.s X1, X2, . . ., Xn, . . .:

• are independent;
• have finite means µk and standard deviations σk (k = 1, . . ., n, . . .);
• and none has a dominating variance with respect to the whole set: lim(n→∞) σk² / (σ1² + · · · + σn²) = 0 for all k,

then the distribution of the r.v.

    ((X1 + · · · + Xn) − (µ1 + · · · + µn)) / √(σ1² + · · · + σn²)

tends towards a standard normal law as n tends towards infinity.


Much more intuitively, the central limit theorem states that the sum of a large number of independent effects, none of which has a significant variability with respect to the whole set, is distributed according to the normal law, without any hypothesis on the distribution of the various terms in the sum.

2.2.1.3 Multi-normal distribution

An m-variate random variable (X1, X2, . . ., Xm) is said to be distributed according to a multi-normal law with parameters (µ; V) if it admits the multivariate density given by

    f(x1, . . ., xm) = (1 / √((2π)^m det(V))) exp( −(1/2) (x − µ)ᵗ V⁻¹ (x − µ) )

in which µ and V represent respectively the vector of means and the variance–covariance matrix of the r.v.s Xk (k = 1, . . ., m).

The property of linear combinations of independent normal r.v.s can be generalised as follows: for a multi-normal random variable X with parameters (µ; V), and a matrix A that admits an inverse, the m-variate random variable AX + b is itself distributed according to a multi-normal law with parameters (Aµ + b; AVAᵗ).

For the specific case m = 2, the multi-normal density is termed binormal and is written as

    f(x1, x2) = (1 / (2π σ1 σ2 √(1 − ρ²)))
                · exp{ −(1 / (2(1 − ρ²))) [ ((x1 − µ1)/σ1)² − 2ρ ((x1 − µ1)/σ1)((x2 − µ2)/σ2) + ((x2 − µ2)/σ2)² ] }

2.2.1.4 Log-normal distribution

Let us now return to a one-dimensional distribution linked to the normal law. A r.v. X is said to be distributed according to a log-normal law with parameters (µ; σ) when ln X is normally distributed with the parameters (µ; σ). It can easily be demonstrated that this r.v. takes only positive values and that it is defined by the density

    f(x) = (1 / (√(2π) σ x)) exp( −(1/2) ((ln x − µ)/σ)² )    (x > 0)

The graph of this density is shown in Figure A2.12 and its typical values are given by:

    E(X) = e^(µ + σ²/2)
    var(X) = e^(2µ + σ²) (e^(σ²) − 1)
    γ1(X) = (e^(σ²) + 2) √(e^(σ²) − 1)
    γ2(X) = (e^(3σ²) + 3e^(2σ²) + 6e^(σ²) + 6)(e^(σ²) − 1)

This confirms the skewness, with concentration to the left and spreading to the right, observed on the graph.

Figure A2.12 Log-normal distribution

We would point out finally that a result of the same type as the central limit theorem also leads to the log-normal law: this is the case in which the effects represented by the various r.v.s accumulate through a multiplicative model rather than an additive model, because of the fundamental property of logarithms: ln(x1 · x2) = ln x1 + ln x2.

2.2.2 Other theoretical distributions

2.2.2.1 Poisson distribution

The Poisson r.v. with parameter µ is a discrete r.v. X that takes all the non-negative integer values 0, 1, 2, etc. with the associated probabilities:

    Pr[X = k] = e^(−µ) µ^k / k!    (k ∈ N)

The typical values for this distribution are given by:

    E(X) = µ
    var(X) = µ

2.2.2.2 Binomial distribution

The Bernoulli scheme is a probability model applied to a very wide range of situations. It is characterised by:

• a finite number of independent trials;
• during each trial, only two results – success and failure – are possible;
• on each trial, the probability of a success occurring is the same.

If n is the number of trials and p the probability of success on each trial, we speak of a Bernoulli scheme with parameters (n; p), and the number of successes out of the n trials is a binomial r.v. with those parameters, termed B(n; p). This discrete random variable takes the values 0, 1, 2, . . ., n with the following associated probabilities:5

    Pr[B(n; p) = k] = C(n, k) p^k (1 − p)^(n−k)    k ∈ {0, 1, . . ., n}

The sum of these probabilities equals 1, in accordance with Newton's binomial formula. In addition, the typical values for this distribution are given by:

    E(B(n; p)) = np
    var(B(n; p)) = np(1 − p)

The binomial distribution allows two interesting approximations when the parameter n is large. Thus, for a very small p, we have the approximation through Poisson's law with parameter np:

Pr[B(n; p) = k] ≈ e^(−np) (np)^k / k!

For a p that is not too close to 0 or 1, the binomial r.v. tends towards a normal law with parameters (np; √(np(1 − p))); more specifically, with µ = np and σ = √(np(1 − p)):

Pr[B(n; p) = k] ≈ Φ((k − µ + 1/2)/σ) − Φ((k − µ − 1/2)/σ)

where Φ is the standard normal distribution function.
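Both approximations are easy to verify numerically. In the sketch below the parameter choices are arbitrary illustrations, and the standard normal distribution function Φ is computed from the error function:

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Poisson approximation: large n, very small p
exact_p = binom_pmf(4, 1000, 0.003)
approx_p = poisson_pmf(4, 1000 * 0.003)

# Normal approximation with continuity correction: p not too close to 0 or 1
n, p, k = 400, 0.4, 160
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
exact_n = binom_pmf(k, n, p)
approx_n = normal_cdf((k - mu + 0.5) / sigma) - normal_cdf((k - mu - 0.5) / sigma)
```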

2.2.2.3 Student distribution

The Student distribution, with ν degrees of freedom, is defined by the density

f(x) = [Γ((ν + 1)/2) / (Γ(ν/2) √(νπ))] · (1 + x²/ν)^(−(ν+1)/2)

+∞

In this expression, the gamma function is de¬ned by (n) = 0 e’x x n’1 dx.

This generalises the factorial function as (n) = (n ’ 1) · (n ’ 1) and for integer n,

we have: (n) = (n ’ 1)!

This is, however, de¬ned for n values that are not integer: all the positive real values

of n and, for example: √

(1) = π

2

We are not representing the graph for this density here, as it is symmetrical with respect to the vertical axis and bears a strong resemblance to the standard normal density graph, although for ν > 4 the kurtosis coefficient value is strictly positive:

E(X) = 0

var(X) = ν/(ν − 2)

γ1(X) = 0

γ2(X) = 6/(ν − 4)

5 Remember that C(n, k) = n!/(k!(n − k)!).


Finally, it can be stated that when the number of degrees of freedom tends towards infinity, the Student distribution tends towards the standard normal distribution, this asymptotic property being verified in practice as soon as ν reaches the value of 30.
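This convergence can be seen by comparing the two densities directly; in the sketch below (the evaluation grid is an arbitrary choice) the largest gap between the Student density with ν = 30 and the standard normal density is already very small:

```python
import math

def student_pdf(x, nu):
    """Student density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# largest pointwise gap on a grid covering [-5, 5]
max_gap = max(abs(student_pdf(x / 10, 30) - normal_pdf(x / 10))
              for x in range(-50, 51))
```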

2.2.2.4 Uniform distribution

A r.v. is said to be uniform on the interval [a; b] when the probability of its taking a value between t and t + h6 depends on these two boundaries only through h. It is easy to establish, on that basis, that such a r.v. only takes values within the interval [a; b] and that its density is necessarily constant:

f(x) = 1/(b − a)    (a < x < b)

Its graph is shown in Figure A2.13.

The principal typical values for the uniform r.v. are given by:

E(X) = (a + b)/2

var(X) = (a − b)²/12

γ1(X) = 0

γ2(X) = −6/5

This uniform distribution is the origin of some simulation methods, in which the generation of random numbers distributed uniformly in the interval [0; 1] allows random numbers distributed according to a given law of probability to be obtained (Figure A2.14). The way in which this transformation occurs is explained in Section 7.3.1. Let us examine here how (pseudo-)random numbers uniformly distributed in [0; 1] can be obtained.

The sequence x1, x2, . . . , xn is constructed according to residue classes. On the basis of an initial value ρ0 (equal to 1, for example), we construct, for i = 1, 2, . . . , n, etc.:

xi = decimal part of (c1 · ρi−1)

ρi = c2 · xi

Here, the constants c1 and c2 are suitably chosen. Thus, for c1 = 13.3669 and c2 = 94.3795, we find successively the values shown in Table A2.1.

Figure A2.13 Uniform distribution

6 These two values are assumed to belong to the interval [a; b].


Figure A2.14 Random numbers uniformly distributed in [0; 1]

Table A2.1 xi and ρi

i        xi          ρi
0                     1
1    0.366900   34.627839
2    0.866885   81.813352
3    0.580898   55.768652
4    0.453995   42.847849
5    0.742910   70.115509
6    0.226992   21.423384
7    0.364227   34.375527
8    0.494233   46.645452
9    0.505097   47.670759
10   0.210265   19.844676
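The recurrence behind Table A2.1 takes only a few lines; a minimal sketch (full floating-point precision is used throughout, so the last decimals of later values may differ slightly from the printed table):

```python
def residue_generator(n, c1=13.3669, c2=94.3795, rho0=1.0):
    """Pseudo-random numbers in [0; 1] from the residue-class recurrence
    x_i = decimal part of (c1 * rho_{i-1}),  rho_i = c2 * x_i."""
    xs, rho = [], rho0
    for _ in range(n):
        x = (c1 * rho) % 1.0      # decimal (fractional) part
        rho = c2 * x
        xs.append(x)
    return xs

xs = residue_generator(10)
# first value: decimal part of 13.3669 * 1 = 0.3669, as in Table A2.1
```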

2.2.2.5 Generalised error distribution

The generalised error distribution with parameter ν is defined by the density

f(x) = [ν √(Γ(3/ν)) / (2 Γ(1/ν)^(3/2))] · exp{−[|x| √(Γ(3/ν)/Γ(1/ν))]^ν}
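As a consistency check on this density, the case ν = 2 should give back the standard normal density exactly; a sketch using Python's gamma function (the evaluation grid is arbitrary):

```python
import math

def ged_pdf(x, nu):
    """Generalised error density in its unit-variance form."""
    g1 = math.gamma(1 / nu)
    g3 = math.gamma(3 / nu)
    coef = nu * math.sqrt(g3) / (2 * g1 ** 1.5)
    return coef * math.exp(-(abs(x) * math.sqrt(g3 / g1)) ** nu)

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# for nu = 2, Gamma(3/2) = Gamma(1/2)/2, and the density collapses to
# exp(-x^2/2)/sqrt(2*pi)
gap = max(abs(ged_pdf(x / 10, 2.0) - normal_pdf(x / 10)) for x in range(-40, 41))
```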

The graph for this density is shown in Figure A2.15. This is a distribution symmetrical with respect to 0, which corresponds to a normal distribution for ν = 2 and gives rise to a leptokurtic distribution (resp. a negative-kurtosis distribution) for ν < 2 (resp. ν > 2).

2.3 STOCHASTIC PROCESSES

2.3.1 General considerations

The term stochastic process is applied to a random variable that is a function of the time

variable: {Xt : t ∈ T }.


Figure A2.15 Generalised error distribution (ν = 1, 2, 3)

If the set T of times is discrete, the stochastic process is simply a sequence of random variables. However, in a number of financial applications, such as the Black and Scholes model, it will be necessary to consider stochastic processes in continuous time.

For each possible result ω ∈ Ω, the function Xt(ω) of the variable t is known as the path of the stochastic process.

A stochastic process is said to have independent increments when, regardless of the times t1 < t2 < . . . < tn, the r.v.s

Xt1, Xt2 − Xt1, Xt3 − Xt2, . . .

are independent. In the same way, a stochastic process is said to have stationary increments when, for every t and h, the r.v.s Xt+h − Xt and Xh are identically distributed.

2.3.2 Particular stochastic processes

2.3.2.1 The Poisson process

We consider a process of random occurrences of an event in time, corresponding to the set [0; +∞[. Here, the principal interest does not relate directly to the occurrence times, but to the number of occurrences within given intervals. The r.v. that represents the number of occurrences within the interval [t1, t2] is termed n(t1, t2).

This process is called a Poisson process if it obeys the following hypotheses:

• the numbers of occurrences in separate intervals of time are independent;

• the distribution of the number of occurrences within an interval of time depends on that interval only through its duration: Pr[n(t1, t2) = k] is a function of (t2 − t1), which is henceforth termed pk(t2 − t1);

• there is no multiple occurrence: if h is small, Pr[n(0; h) ≥ 2] = o(h);

• there is a rate of occurrence α so that Pr[n(0; h) = 1] = αh + o(h).

It can be demonstrated that under these hypotheses, the r.v. 'number of occurrences within an interval of duration t' is distributed according to a Poisson law with parameter αt:

pk(t) = e^(−αt) (αt)^k / k!    k = 0, 1, 2, . . .


To simplify, we write Xt = n(0; t). This is a stochastic process that counts the number of occurrences over time. The path of such a process is therefore a stepped function, with the abscissas of the jumps corresponding to the occurrence times and the heights of the jumps being equal to 1. It can be demonstrated that the process has independent and stationary increments and that E(Xt) = var(Xt) = αt.

This process can be generalised as follows. We consider:

• A Poisson process Xt as defined above; with the time of the kth occurrence expressed as Tk, we have Xt = #{k : Tk ≤ t}.

• A sequence Y1, Y2, . . . of independent and identically distributed r.v.s, independent of the Poisson process.

The process Zt = Σ_{k:Tk≤t} Yk is known as a compound Poisson process.

The paths of such a process are therefore stepped functions, with the abscissas of the jumps corresponding to the occurrence times of the underlying Poisson process and the heights of the jumps being the realised values of the r.v.s Yk. In addition, we have:

E(Zt) = αt · µY

var(Zt) = αt · (σY² + µY²)
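These moments can be verified by simulation. In the sketch below the rate α = 2, the horizon t = 5 and normal jump sizes with µY = 1 and σY = 0.5 are all arbitrary choices:

```python
import random

def compound_poisson(alpha, t, y_sampler, rng):
    """One realisation of Z_t: draw the Poisson number of occurrences on
    [0, t] from exponential inter-arrival times, then sum the jumps Y_k."""
    n, clock = 0, 0.0
    while True:
        clock += rng.expovariate(alpha)
        if clock > t:
            break
        n += 1
    return sum(y_sampler(rng) for _ in range(n))

rng = random.Random(7)
alpha, t = 2.0, 5.0
y = lambda r: r.gauss(1.0, 0.5)          # mu_Y = 1, sigma_Y = 0.5
zs = [compound_poisson(alpha, t, y, rng) for _ in range(20000)]
mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / len(zs)
# theory: E(Z_t) = alpha*t*mu_Y = 10, var(Z_t) = alpha*t*(sigma_Y^2 + mu_Y^2) = 12.5
```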

2.3.2.2 Standard Brownian motion

Consider a sequence of r.v.s Xk, independent and identically distributed, taking the values −ΔX and ΔX with respective probabilities 1/2 and 1/2, and define the sequence of r.v.s Yn through Yn = X1 + X2 + · · · + Xn. This is known as a symmetrical random walk. As E(Xk) = 0 and var(Xk) = (ΔX)², we have E(Yn) = 0 and var(Yn) = n(ΔX)².

For our modelling requirements, we divide the interval of time [0; t] into n subintervals of the same duration Δt = t/n and define Zt = Zt(n) = Yn. We have:

E(Zt) = 0        var(Zt) = n(ΔX)² = [(ΔX)²/Δt] · t

This variable Zt allows the discrete development of a magnitude to be modelled. If we then wish to move to continuous modelling while retaining the same variability per unit of time, that is, with (ΔX)²/Δt = 1, for example, we obtain the stochastic process

wt = lim_{n→∞} Zt(n)

This is a standard Brownian motion (also known as a Wiener process). It is clear that this stochastic process wt, defined on R+, is such that w0 = 0, that wt has independent and stationary increments, and that in view of the central limit theorem wt is distributed according to a normal law with parameters (0; √t). It can be shown that the paths of a Wiener process are continuous everywhere, but cannot generally be differentiated. In fact,

Δwt/Δt = ε√Δt/Δt = ε/√Δt

where ε is a standard normal r.v., and this ratio diverges as Δt tends to 0.
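The limiting construction can be reproduced numerically. The sketch below (step and sample counts are arbitrary) builds wt at a single date t from a symmetric random walk with ΔX = √Δt, so that (ΔX)²/Δt = 1, and checks the limiting moments:

```python
import random

def brownian_endpoint(t, n, rng):
    """Approximate w_t by a symmetric random walk with n steps of
    size dx = sqrt(dt), so that dx**2 / dt = 1."""
    dt = t / n
    dx = dt ** 0.5
    return sum(dx if rng.random() < 0.5 else -dx for _ in range(n))

rng = random.Random(0)
t, n = 4.0, 400
ws = [brownian_endpoint(t, n, rng) for _ in range(5000)]
mean_w = sum(ws) / len(ws)
var_w = sum((w - mean_w) ** 2 for w in ws) / len(ws)
# theory: E(w_t) = 0 and var(w_t) = t = 4
```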


2.3.2.3 Itô process

If a more developed model is required, wt can be multiplied by a constant in order to produce a variability per time unit (ΔX)²/Δt different from 1, or a constant can be added to it in order to obtain a non-zero mean:

Xt = X0 + a · t + b · wt

This type of model is not greatly effective because of the great variability of the development in the short term, the standard deviation of Xt being equal7 to b√t.

For this reason, this type of construction is applied more to variations relating to a

short interval of time:

dXt = a · dt + b · dwt

It is possible to generalise by replacing the constants a and b by functions of t and Xt :

dXt = at (Xt ) · dt + bt (Xt ) · dwt

This type of process is known as the Itô process. In financial modelling, several specific cases of the Itô process are used; a geometric Brownian motion is obtained when:

at(Xt) = a · Xt        bt(Xt) = b · Xt

An Ornstein-Uhlenbeck process corresponds to:

at(Xt) = a · (c − Xt)        bt(Xt) = b

and the square root process is such that:

at(Xt) = a · (c − Xt)        bt(Xt) = b√Xt
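All of these special cases can be simulated with the same Euler-Maruyama discretisation of dXt = at(Xt) · dt + bt(Xt) · dwt (a standard scheme, sketched here with arbitrary parameter choices); as an illustration, the Ornstein-Uhlenbeck process is mean-reverting towards c:

```python
import random

def euler_maruyama(a, b, x0, t, n, rng):
    """Simulate dX = a(X) dt + b(X) dw on [0, t] with n Euler steps."""
    dt = t / n
    x, path = x0, [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, dt ** 0.5)   # increment of the Wiener process
        x = x + a(x) * dt + b(x) * dw
        path.append(x)
    return path

rng = random.Random(1)
a_, b_, c_ = 2.0, 0.3, 1.0               # arbitrary test parameters
ends = [euler_maruyama(lambda x: a_ * (c_ - x), lambda x: b_,
                       0.0, 5.0, 500, rng)[-1] for _ in range(4000)]
mean_end = sum(ends) / len(ends)
# mean reversion: for large t, E(X_t) tends to c = 1
```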

2.3.3 Stochastic differential equations

Expressions of the type dXt = at(Xt) · dt + bt(Xt) · dwt cannot simply be handled in the same way as the corresponding deterministic expressions, because wt cannot be differentiated. It is, however, possible to extend the definition to a concept of stochastic differential, through the theory of stochastic integral calculus.8

For a stochastic process zt defined within the interval [a; b], the stochastic integral of zt over [a; b] with respect to the standard Brownian motion wt is defined by:

∫ₐᵇ zt dwt = lim_{n→∞, δ→0} Σ_{k=0}^{n−1} z(tk) · (w(tk+1) − w(tk))

where a = t0 < t1 < . . . < tn = b is a partition of [a; b] and δ is the largest of its subintervals.
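The definition can be illustrated by discretising the integral of wt against its own increments; a standard result of stochastic integral calculus states that ∫₀ᵗ ws dws = (wt² − t)/2, and the left-endpoint sums of the definition reproduce it (discretisation parameters are arbitrary):

```python
import random

def ito_integral_w_dw(t, n, rng):
    """Discretised Ito sum  sum_k w_{t_k} (w_{t_k+1} - w_{t_k})  with the
    integrand evaluated at the LEFT endpoint, as in the definition."""
    dt = t / n
    w, acc = 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, dt ** 0.5)
        acc += w * dw
        w += dw
    return acc, w

rng = random.Random(3)
t, n = 1.0, 20000
acc, w_t = ito_integral_w_dw(t, n, rng)
# Ito calculus: the integral of w dw over [0, t] equals (w_t^2 - t)/2
```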

7 The root function presents a vertical tangent at the origin.

8