

Figure A1.3 Geometric interpretation of partial derivatives

As for functions of a single variable, the extrema of a differentiable function can be
determined using two conditions.

• The first-order (necessary) condition states that if x(0) is an extremum of f, then all
the partial derivatives of f are zero at x(0):

    fxi(x(0)) = 0    (i = 1, . . . , n)

Referring to the geometric interpretation of the partial derivatives of a function
of two variables: at this type of point (x0, y0), called a stationary point, the tangents
to the curves Cx and Cy are therefore horizontal.
• The second-order (sufficient) condition allows the stationary points to be 'sorted'
according to their nature, but first and foremost requires the definition of the Hessian
matrix of the function f at point x, made up of the second partial derivatives of f:

    H(f(x1, . . . , xn)) = ( fx1x1(x)  fx1x2(x)  . . .  fx1xn(x) )
                           ( fx2x1(x)  fx2x2(x)  . . .  fx2xn(x) )
                           (    ...       ...             ...    )
                           ( fxnx1(x)  fxnx2(x)  . . .  fxnxn(x) )

If x(0) is a stationary point of f and H(f(x)) is p.d. at x(0), or s.p. in a neighbourhood
of x(0), we have a minimum. In the opposite situation, if H(f(x)) is n.d. at x(0), or s.n.
in a neighbourhood of x(0), we have a maximum.3

Extrema under constraint(s)
This is a similar concept, but one in which the analysis of the problem of extrema is
restricted to those x values that obey one or more constraints.

These notions are explained later in this Appendix.
Mathematical Concepts 331
The point (x1(0), . . . , xn(0)) is a local maximum (resp. minimum) of the function f under
the constraints

    g1(x) = 0
    ...
    gr(x) = 0

if x(0) satisfies the constraints itself and

    f(x1(0), . . . , xn(0)) ≥ f(x1, . . . , xn)    [resp. f(x1(0), . . . , xn(0)) ≤ f(x1, . . . , xn)]

for any (x1, . . . , xn) in a neighbourhood of (x1(0), . . . , xn(0)) satisfying the r constraints.

Solving this problem involves considering the Lagrangian function of the problem. This
is a function of the (n + r) variables (x1, . . . , xn; m1, . . . , mr), the last r of which,
known as Lagrangian multipliers, each correspond to a constraint:

    L(x1, . . . , xn; m1, . . . , mr) = f(x) + m1 · g1(x) + . . . + mr · gr(x)

We will not go into the technical details of solving this problem. We will, however, point
out an essential result: if the point (x(0); m(0)) is such that x(0) satisfies the constraints and
(x(0); m(0)) is an extremum (without constraint) of the Lagrangian function, then x(0) is an
extremum for the problem of extrema under constraints.
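This result can be illustrated numerically. The following is a minimal sketch (the function f(x, y) = x² + y² and the constraint x + y = 1 are hypothetical examples, not from the text): the stationary point of the Lagrangian is found by hand and then checked against points on the constraint.

```python
# Minimise f(x, y) = x^2 + y^2 under the constraint g(x, y) = x + y - 1 = 0.
# The Lagrangian is L(x, y; m) = f(x, y) + m * g(x, y); setting its partial
# derivatives to zero gives 2x + m = 0, 2y + m = 0 and x + y = 1,
# whose solution is x = y = 1/2 with multiplier m = -1.

def f(x, y):
    return x * x + y * y

x0, y0, m0 = 0.5, 0.5, -1.0

# Stationarity of the Lagrangian at (x0, y0; m0):
assert abs(2 * x0 + m0) < 1e-12 and abs(2 * y0 + m0) < 1e-12

# (x0, y0) minimises f among points satisfying the constraint x + y = 1:
candidates = [(t, 1 - t) for t in [i / 100 for i in range(-100, 201)]]
assert all(f(x0, y0) <= f(x, y) + 1e-12 for x, y in candidates)
print(f(x0, y0))  # 0.5
```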

1.2.2 Taylor's formula
Taylor's formula also generalises to n-variable functions, but the degree 1 term,
which involves the first derivative, is replaced by n terms involving the n partial
derivatives:

    fxi(x1(0), x2(0), . . . , xn(0))    i = 1, 2, . . . , n

In the same way, the degree 2 term, whose coefficient involves the second
derivative, here becomes a set of n² terms in which the various second partial derivatives
are involved:

    fxixj(x1(0), x2(0), . . . , xn(0))    i, j = 1, 2, . . . , n

Thus, by limiting the expansion to the degree 2 terms, Taylor's formula is written as follows:

    f(x1(0) + h1, x2(0) + h2, . . . , xn(0) + hn)
        ≈ f(x(0)) + (1/1!) Σi=1..n fxi(x(0)) hi
                  + (1/2!) Σi=1..n Σj=1..n fxixj(x(0)) hi hj + . . .
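The degree-2 approximation can be checked numerically. A small sketch (the function f(x, y) = exp(x + 2y) is an illustrative choice, not from the text), using the fact that all its partial derivatives at (0, 0) are easy to write down:

```python
import math

# f(x, y) = exp(x + 2y) around (0, 0):
# f_x = f, f_y = 2f, f_xx = f, f_xy = 2f, f_yy = 4f, and f(0, 0) = 1.
h1, h2 = 0.01, 0.02
true_value = math.exp(h1 + 2 * h2)

order1 = 1 + (h1 + 2 * h2)                               # degree 1 terms
order2 = order1 + 0.5 * (h1**2 + 2 * (2 * h1 * h2) + 4 * h2**2)  # degree 2 terms

print(true_value, order2)
assert abs(order2 - true_value) < abs(order1 - true_value)  # degree 2 is better
assert abs(order2 - true_value) < 1e-4
```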
332 Asset and Risk Management

1.3.1 Definitions

Matrices and vectors
The term n-order matrix is given to a set of n² real numbers making up a square table
consisting of n rows and n columns.4 A matrix is generally represented by a capital letter
(such as A), and its elements by the corresponding lower-case letter (a) with two allocated
indices representing the row and column to which the element belongs: aij is the element
of matrix A located at the intersection of row i and column j within A. Matrix A can
therefore be written generally as follows:

    ( a11  a12  . . .  a1j  . . .  a1n )
    ( a21  a22  . . .  a2j  . . .  a2n )
    (  ...   ...        ...         ... )
    ( ai1  ai2  . . .  aij  . . .  ain )
    (  ...   ...        ...         ... )
    ( an1  an2  . . .  anj  . . .  ann )

In the same way, a vector of dimension n is a set of n real numbers forming a column
table. The elements of a vector are its components and are referred to by a single index:

    X = ( x1 )
        ( x2 )
        ( ... )
        ( xn )

Specific matrices
The diagonal elements in a matrix are the elements a11 , a22 , . . . , ann . They are located
on the diagonal of the table that starts from the upper left-hand corner; this is known as
the principal diagonal.
A matrix is defined as symmetrical if the elements symmetrical with respect to the
principal diagonal are equal: aij = aji. Here is an example:

    A = (  2  −3   0 )
        ( −3   2   2 )
        (  0   2   0 )

More generally, a matrix is a rectangular table with format (m, n): m rows and n columns. We will, however, only
be looking at square matrices here.

An upper triangular matrix is a matrix in which the elements located underneath the
principal diagonal are zero: aij = 0 when i > j. For example:

    A = ( 0  2  −1 )
        ( 0  3   0 )
        ( 0  0   5 )

The concept of a lower triangular matrix is of course defined in a similar way.
Finally, a diagonal matrix is one that is both upper triangular and lower triangular. Its
only non-zero elements are the diagonal elements: aij = 0 when i ≠ j.
Generally, this type of matrix will be represented by:

    A = ( a1   0  . . .   0 )
        (  0  a2  . . .   0 )  = diag(a1, a2, . . . , an)
        ( ...  ...         ... )
        (  0   0  . . .  an )

Operations
The sum of two matrices, as well as the multiplication of a matrix by a scalar, are
completely natural operations: the operation in question is carried out for each of the
elements. Thus:

    (A + B)ij = aij + bij
    (λA)ij = λ · aij

These definitions are also valid for vectors:

    (X + Y)i = xi + yi
    (λX)i = λ · xi

The product of two matrices A and B is a matrix of the same order as A and B, in which
the element (i, j) is obtained by calculating the sum of the products of the elements in
row i of A with the corresponding elements in column j of B:

    (AB)ij = ai1 · b1j + ai2 · b2j + . . . + ain · bnj = Σk aik · bkj

We will have, for example:

    (  2  0  −1 )   (  5  −2   0 )   (  13  −4   1 )
    (  1  3  −2 ) · (  3  −1   0 ) = (  20  −5   2 )
    ( −3  0   2 )   ( −3   0  −1 )   ( −21   6  −2 )

Despite its apparently complex definition, the matrix product has a number of classical
properties: it is associative and distributive with respect to addition. However, it needs to
be handled with care, as it lacks one of the classical properties: it is not commutative.
In general, AB does not equal BA!
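Non-commutativity is easy to observe numerically. A quick sketch (the two 2×2 matrices below are illustrative, not from the text):

```python
# "Rows by columns" product of two square matrices, and a pair for which
# AB and BA differ.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)  # [[2, 1], [4, 3]]
print(BA)  # [[3, 4], [1, 2]]
assert AB != BA
```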

The product of a matrix by a vector is defined using the same "rows by columns" rule:

    (AX)i = Σk aik · xk

The transposition of a matrix A is the matrix Aᵗ, obtained by permuting the symmetrical
elements with respect to the principal diagonal or, which amounts to the same thing, by
permuting the roles of the rows and columns in matrix A:

    (Aᵗ)ij = aji

A matrix is thus symmetrical if, and only if, it is equal to its transpose. In addition,
this operation, applied to a column vector, gives the corresponding row vector as its result.
The inverse of matrix A is the matrix A⁻¹, if it exists, such that:

    AA⁻¹ = A⁻¹A = diag(1, . . . , 1) = I
For example, it is easy to verify that:

    (  1  0   1 )⁻¹   (  3   1  −1 )
    ( −2  1  −3 )   = (  0   0   1 )
    (  0  1   0 )     ( −2  −1   1 )
Finally, let us define the trace of a matrix. The trace is the sum of the matrix's diagonal
elements:

    tr(A) = a11 + a22 + . . . + ann = Σi=1..n aii
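The inverse example can be verified numerically. A minimal pure-Python sketch, reusing the "rows by columns" product:

```python
# Check that M and Minv are mutually inverse, and compute the trace of M.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

M    = [[1, 0, 1], [-2, 1, -3], [0, 1, 0]]
Minv = [[3, 1, -1], [0, 0, 1], [-2, -1, 1]]
I    = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

assert matmul(M, Minv) == I and matmul(Minv, M) == I
print(trace(M))  # 2
```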

1.3.2 Quadratic forms

Quadratic form and class of symmetrical matrix
A quadratic form is a polynomial function of n variables containing only second-degree
terms:

    Q(x1, x2, . . . , xn) = Σi=1..n Σj=1..n aij · xi · xj

If we construct a matrix A from the coefficients aij (i, j = 1, . . . , n) and the vector X of the
variables xi (i = 1, . . . , n), we can give the quadratic form a matrix expression:
Q(X) = XᵗAX.
In fact, by developing the right-hand side, we obtain:

    XᵗAX = Σi xi (AX)i
         = Σi xi Σj aij · xj
         = Σi Σj aij · xi · xj

A quadratic form can always be associated with a matrix A, and vice versa. The matrix,
however, is not unique. In fact, the quadratic form Q(x1, x2) = 3x1² − 4x1x2 can be associated
with the matrices

    A = (  3  −2 )    B = ( 3  −4 )    C = ( 3  −6 )
        ( −2   0 )        ( 0   0 )        ( 2   0 )

as well as an infinite number of others. Amongst all these matrices, only one is symmetrical
(A in the example given). There is therefore a bijection between all the quadratic forms and all
the symmetrical matrices.
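A small numerical sketch makes the non-uniqueness concrete: the three illustrative matrices below (a symmetric one and two non-symmetric ones) all represent the same quadratic form Q(x1, x2) = 3x1² − 4x1x2.

```python
# Evaluate Q(X) = X^t A X directly from the coefficients of a matrix.

def quad(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[3, -2], [-2, 0]]   # the symmetric representative
B = [[3, -4], [0, 0]]
C = [[3, -6], [2, 0]]

for x in [(1.0, 0.0), (0.5, -1.5), (2.0, 3.0)]:
    q = 3 * x[0] ** 2 - 4 * x[0] * x[1]
    assert abs(quad(A, x) - q) < 1e-12
    assert abs(quad(B, x) - q) < 1e-12
    assert abs(quad(C, x) - q) < 1e-12
print(quad(A, (1.0, 2.0)))  # -5.0
```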
The class of a symmetrical matrix is defined on the basis of the sign of the associated
quadratic form. Thus, the non-zero matrix A is said to be positive definite (p.d.) if
XᵗAX > 0 for any X not equal to 0, and semi-positive (s.p.) when:

    XᵗAX ≥ 0 for any X ≠ 0, and
    there is at least one Y ≠ 0 such that YᵗAY = 0

A matrix is negative definite (n.d.) or semi-negative (s.n.) under the reverse inequalities,
and the term non-definite is given to a symmetrical matrix for which there exist some X
and Y ≠ 0 such that XᵗAX > 0 and YᵗAY < 0.
The symmetrical matrix

    A = (  5  −3  −4 )
        ( −3  10   2 )
        ( −4   2   8 )

is thus p.d., as the associated quadratic form can be written as:

    Q(x, y, z) = 5x² + 10y² + 8z² − 6xy − 8xz + 4yz
               = (x − 3y)² + (2x − 2z)² + (y + 2z)²

This form will never be negative, and cancels out only when:

    x − 3y = 0
    2x − 2z = 0
    y + 2z = 0

that is, when x = y = z = 0.
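The sum-of-squares decomposition, and the resulting positivity, can be checked numerically; a minimal sketch using random test points:

```python
import random

# Q(x, y, z) = 5x^2 + 10y^2 + 8z^2 - 6xy - 8xz + 4yz
#            = (x - 3y)^2 + (2x - 2z)^2 + (y + 2z)^2

def Q(x, y, z):
    return 5*x*x + 10*y*y + 8*z*z - 6*x*y - 8*x*z + 4*y*z

def sum_of_squares(x, y, z):
    return (x - 3*y)**2 + (2*x - 2*z)**2 + (y + 2*z)**2

random.seed(1)
for _ in range(1000):
    x, y, z = (random.uniform(-10, 10) for _ in range(3))
    assert abs(Q(x, y, z) - sum_of_squares(x, y, z)) < 1e-9
    assert Q(x, y, z) > 0  # strictly positive away from the origin
print(Q(1.0, 0.0, 0.0))  # 5.0
```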
Linear equation system
A system of n linear equations with n unknowns is a set of relations of the following type:

    a11 x1 + a12 x2 + . . . + a1n xn = b1
    a21 x1 + a22 x2 + . . . + a2n xn = b2
    ...
    an1 x1 + an2 x2 + . . . + ann xn = bn

In it, the aij, xj and bi are respectively the coefficients, the unknowns and the right-hand
sides. They are naturally written in matrix and vector form: A, X and B.
Using this notation, the system is written in an equivalent but more condensed way:

    AX = B

For example, the system of equations

    2x + 3y = 4
    4x − y = −2

can also be written as:

    ( 2   3 ) ( x )   (  4 )
    ( 4  −1 ) ( y ) = ( −2 )
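This 2×2 example can be solved in a few lines; a quick sketch using Cramer's rule:

```python
# Solve  2x + 3y = 4
#        4x -  y = -2

a11, a12, b1 = 2.0, 3.0, 4.0
a21, a22, b2 = 4.0, -1.0, -2.0

det = a11 * a22 - a12 * a21          # -14
x = (b1 * a22 - a12 * b2) / det      # -1/7
y = (a11 * b2 - b1 * a21) / det      # 10/7

assert abs(2 * x + 3 * y - 4) < 1e-12
assert abs(4 * x - y + 2) < 1e-12
print(x, y)
```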

If the inverse of matrix A exists, it can easily be seen that the system admits one and just
one solution, given by X = A⁻¹B.

Case of the variance-covariance matrix5
The matrix

    V = ( σ1²  σ12  . . .  σ1n )
        ( σ21  σ2²  . . .  σ2n )
        (  ...   ...          ... )
        ( σn1  σn2  . . .  σn² )

of the variances and covariances of a number of random variables X1, X2, . . . , Xn is a
matrix that is either p.d. or s.p.
In effect, regardless of what the numbers λ1, λ2, . . . , λn (not all zero) making up the
vector Λ may be, we have:

    ΛᵗVΛ = Σi=1..n Σj=1..n λi λj σij = var( Σi=1..n λi Xi ) ≥ 0

It can even be said, according to this result, that the variance-covariance matrix V is
p.d. except when there are coefficients λ1, λ2, . . . , λn, not all zero, such that the
random variable λ1 X1 + . . . + λn Xn = Σ λi Xi is degenerate, in which case V will be
s.p. This degeneration may occur, for example, when:

• one of the variables is degenerate;
• some variables are perfectly correlated;
• the matrix V is obtained on the basis of a number of observations strictly lower than
the number of variables.

Note finally that the variance-covariance matrix can be expressed in matrix form,
through the relation:

    V = E[(X − µ)(X − µ)ᵗ]

Choleski factorisation
Consider a symmetrical positive definite matrix A. It can be demonstrated that there exists
a lower triangular matrix L with strictly positive diagonal elements such that A = LLᵗ.

The concepts necessary for an understanding of this example are shown in Appendix 2.

This factorisation process is known as a Choleski factorisation. We will not be demonstrating
this property, but will show, using the previous example, how the matrix L
is found:

    LLᵗ = ( a  0  0 ) ( a  b  d )   ( a²     ab        ad           )
          ( b  c  0 ) ( 0  c  f ) = ( ab     b² + c²   bd + cf      )
          ( d  f  g ) ( 0  0  g )   ( ad     bd + cf   d² + f² + g² )

                  (  5  −3  −4 )
            = A = ( −3  10   2 )
                  ( −4   2   8 )

It is then sufficient to work through the last equality in order to find a, b, c, d, f and g in
succession (a = √5, b = −3/√5, c = √(41/5), d = −4/√5, f = −2/√205, g = 14/√41),
which gives the following for matrix L:

    L = (  √5          0            0        )
        ( −3√5/5       √205/5       0        )
        ( −4√5/5      −2√205/205    14√41/41 )
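The successive elimination described above is exactly the classical Choleski algorithm; a short sketch applied to the same matrix A, whose output can be compared with L:

```python
import math

# Choleski factorisation of a symmetric positive definite matrix.
def choleski(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)   # diagonal element
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]  # below the diagonal
    return L

A = [[5, -3, -4], [-3, 10, 2], [-4, 2, 8]]
L = choleski(A)

# Reconstruct A = L L^t and compare element by element:
for i in range(3):
    for j in range(3):
        aij = sum(L[i][k] * L[j][k] for k in range(3))
        assert abs(aij - A[i][j]) < 1e-12

assert abs(L[0][0] - math.sqrt(5)) < 1e-12
assert abs(L[2][2] - 14 / math.sqrt(41)) < 1e-12
```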
Appendix 2
Probabilistic Concepts1

2.1.1 Random variables and probability law

Definitions
Let us consider a fortuitous phenomenon, that is, a phenomenon that under given initial
conditions corresponds to several possible outcomes. A numerical magnitude that depends
on the observed result is known as a random variable or r.v.
In addition, probabilities are associated with the various possible results or events defined
in the context of the fortuitous phenomenon. It is therefore interesting to find out the
probabilities of the various events defined on the basis of the r.v. What we are looking at
here is the concept of the law of probability of the r.v. Thus, if the r.v. is termed X, the law
of probability of X is defined by the range of the following probabilities: Pr[X ∈ A], for
every subset A of R.
The aim of the concept of probability law is a bold one: the subsets A of R are
too numerous for all the probabilities to be known. For this reason, we are content to
work with just the ]−∞; t] sets. This therefore defines a function of the variable t,
the cumulative distribution function or, more simply, distribution function (d.f.) of the random
variable:

    F(t) = Pr[X ≤ t]
It can be demonstrated that this function, defined on R, is increasing, that it lies between 0
and 1, that it admits the ordinates 0 and 1 as horizontal asymptotes
(limt→−∞ F(t) = 0, limt→+∞ F(t) = 1), and that it is right-continuous: lims→t+ F(s) = F(t).
These properties are summarised in Figure A2.1.
In addition, despite its simplicity, the d.f. allows almost the whole of the probability
law for X to be found, thus:

    Pr[s < X ≤ t] = F(t) − F(s)
    Pr[X = t] = F(t) − F(t−)

Quantile
Sometimes there is a need to solve the opposite problem: given a probability
level u, determine the value of t so that F(t) = Pr[X ≤ t] = u.
This value is known as the quantile of the r.v. X at point u, and its definition is shown
in Figure A2.2.

Readers wishing to find out more about these concepts should read: Baxter M. and Rennie A., Financial Calculus,
Cambridge University Press, 1996; Feller W., An Introduction to Probability Theory and its Applications (2 volumes), John
Wiley & Sons, Inc., 1968; Grimmett G. and Stirzaker D., Probability and Random Processes, Oxford University Press,
1992; Roger P., Les outils de la modélisation financière, Presses Universitaires de France, 1991; Ross S. M., Initiation aux
probabilités, Presses Polytechniques et Universitaires Romandes, 1994.


Figure A2.1 Distribution function



Figure A2.2 Quantile



Figure A2.3 Quantile in jump scenario

In two cases, however, the definition that we have just given is unsuitable and needs to
be adapted. First of all, if the d.f. of X shows a jump that covers the ordinate u, no
abscissa corresponds to it, and the abscissa of the jump is naturally chosen
(see Figure A2.3).
Next, if the ordinate u corresponds to a plateau [m; M] on the d.f. graph, there is an infinite
number of abscissas to choose from (see Figure A2.4).
In this case, the abscissa defined by the relation Q(u) = um + (1 − u)M can be chosen.
The quantile function thus defined generalises the concept of the reciprocal function of
the d.f.

Discrete random variable
A discrete random variable corresponds to a situation in which the set of possible values
for the variable is finite or countably infinite. In this case, if the various possible values



Figure A2.4 Quantile in plateau scenario

and the corresponding probabilities are known,

    values:         x1  x2  . . .  xn  . . .
    probabilities:  p1  p2  . . .  pn  . . .

that is, Pr[X = xi] = pi (i = 1, 2, . . . , n, . . .) with Σi pi = 1,

the law of probability of X can be easily determined:

    Pr[X ∈ A] = Σ{i: xi ∈ A} pi

The d.f. of a discrete r.v. is a step function: the jump abscissas are the various possible
values of X and the jump heights are equal to the associated probabilities
(see Figure A2.5).
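A step d.f. is easy to build from the values and probabilities; a minimal sketch with a hypothetical three-point distribution:

```python
# Discrete r.v. with values 1, 2, 3 and probabilities 0.2, 0.5, 0.3.
values = [1, 2, 3]
probs  = [0.2, 0.5, 0.3]

def F(t):
    """Distribution function F(t) = Pr[X <= t], a step function."""
    return sum(p for x, p in zip(values, probs) if x <= t)

assert F(0.9) == 0.0
assert abs(F(1) - 0.2) < 1e-12     # jump of height p1 = 0.2 at x1 = 1
assert abs(F(2.5) - 0.7) < 1e-12
assert abs(F(10) - 1.0) < 1e-12
# Pr[s < X <= t] = F(t) - F(s): here Pr[1.5 < X <= 2.5] = Pr[X = 2] = 0.5
assert abs((F(2.5) - F(1.5)) - 0.5) < 1e-12
```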
In particular, a r.v. is defined as degenerate if it can take on only one value x (it is also
referred to as a certain variable): Pr[X = x] = 1.
The d.f. of a degenerate variable is 0 to the left of x and 1 from x onwards.

Continuous random variable
In contrast to the discrete case, the set of possible values of a r.v. may be continuous
(an interval, for example), with no individual value having a strictly positive probability:

    Pr[X = x] = 0 for every x




Figure A2.5 Distribution function for a discrete random variable

Figure A2.6 Probability density

In this case, the distribution of probabilities over the set of possible values is expressed
using a density function f: for a sufficiently small h, we will have

    Pr[x < X ≤ x + h] ≈ h · f(x)
This de¬nition is shown in Figure A2.6.
This definition is shown in Figure A2.6.
The law of probability is obtained from the density through the following relation:

    Pr[X ∈ A] = ∫A f(x) dx

And as a particular case:

    F(t) = ∫−∞..t f(x) dx

Multivariate random variables
Often there is a need to consider several r.v.s X1, X2, . . . , Xm simultaneously, associated
with the same fortuitous phenomenon.2 Here, we will simply show the theory for a bivariate
random variable, that is, a pair of r.v.s (X, Y); the general process for a multivariate
random variable can easily be deduced from this.
The law of probability of a bivariate random variable is defined as the set of the
following probabilities: Pr[(X, Y) ∈ A], for every subset A of R². The joint distribution
function is defined by F(s, t) = Pr([X ≤ s] ∩ [Y ≤ t]), and the discrete and continuous
bivariate random variables are defined respectively by:

    pij = Pr([X = xi] ∩ [Y = yj])
    Pr[(X, Y) ∈ A] = ∫∫A f(x, y) dx dy

Two r.v.s are defined as independent when they do not influence each other, either
through their possible values or through the probabilities of the events that they define. More
formally, X and Y are independent when:

    Pr([X ∈ A] ∩ [Y ∈ B]) = Pr[X ∈ A] · Pr[Y ∈ B]

for every A and B in R.

For example, the return on various ¬nancial assets.

It can be shown that two r.v.s are independent if, and only if, their joint d.f. is equal to the
product of the d.f.s of each of the r.v.s: F(s, t) = FX(s) · FY(t). For discrete or continuous
random variables respectively, this condition shows as:

    pij = Pr[X = xi] · Pr[Y = yj]
    f(x, y) = fX(x) · fY(y)

2.1.2 Typical values of random variables
The aim of the typical values of a r.v. is to summarise the information contained in
its probability law in a small number of representative parameters: parameters of location,
dispersion, skewness and kurtosis. We will be looking at one from each group.

Mean
The mean is a central value that locates a r.v. by dividing the d.f. into two parts with the
same area (see Figure A2.7). The mean µ of the r.v. X is therefore such that:

    ∫−∞..µ F(t) dt = ∫µ..+∞ [1 − F(t)] dt

The mean of a r.v. can be calculated on the basis of the d.f.:

    µ = ∫0..+∞ [1 − F(t)] dt − ∫−∞..0 F(t) dt

the formula reducing, for a positive r.v., to:

    µ = ∫0..+∞ [1 − F(t)] dt

It is possible to demonstrate that for a discrete r.v. and a continuous r.v. respectively, we have
the formulae:

    µ = Σi xi · pi
    µ = ∫ x · f(x) dx

Figure A2.7 Mean of a random variable

The structure of these two formulae shows that µ combines the various possible values
of the r.v. X by weighting them with the probabilities associated with these values.
It can be shown3 that these formulae generalise into an abstract integral of X(ω), with
respect to the probability measure Pr, over the set Ω of possible outcomes ω of the
fortuitous phenomenon. This integral is known as the expectation of the r.v. X:

    E(X) = ∫Ω X(ω) dPr(ω)

According to the foregoing, there is equivalence between the concepts of expectation
and mean (E(X) = µ) and we will use both terms interchangeably from now on.
The properties of the integral show that the expectation is a linear operator:

    E(aX + bY + c) = aE(X) + bE(Y) + c

and that if X and Y are independent, then E(XY) = E(X) · E(Y).
In addition, for a discrete r.v. or a continuous r.v. respectively, the expectation of a function
of a r.v. is given by:

    E(g(X)) = Σi g(xi) · pi
    E(g(X)) = ∫ g(x) · f(x) dx

Let us remember finally the law of large numbers,4 which, for a sequence of independent
r.v.s X1, X2, . . . , Xn with identical distribution and mean µ, states that for every ε > 0:

    limn→∞ Pr[ |(X1 + X2 + . . . + Xn)/n − µ| ≤ ε ] = 1
This law justifies taking the average of a sample to estimate the mean of the
population, and in particular estimating the probability of an event through the frequency
of that event's occurrence when a large number of realisations of the fortuitous
phenomenon occur.

Variance and standard deviation
One of the most commonly used dispersion indices (that is, a measurement of the spread
of the r.v.s values around its mean) is the variance σ 2 , de¬ned as:

σ 2 = var(X) = E[(X ’ µ)2 ]

This development is part of measure theory, which is outside the scope of this work. Readers are referred to Loeve M.,
Probability Theory (2 volumes), Springer-Verlag, 1977.
We are showing this law in its weak form here.



Figure A2.8 Variance of a random variable

By developing the right-hand side, we can therefore arrive at:

    σ² = E(X²) − µ²

For a discrete r.v. and a continuous r.v. respectively, this will give:

    σ² = Σi (xi − µ)² · pi = Σi xi² · pi − µ²
    σ² = ∫−∞..+∞ (x − µ)² · f(x) dx = ∫−∞..+∞ x² · f(x) dx − µ²

An example of the interpretation of this parameter is found in Figure A2.8.
It can be demonstrated that var(aX + b) = a² var(X), and that if X and Y are
independent, then var(X + Y) = var(X) + var(Y).
Alongside the variance, whose dimension is the square of the dimension of X,
we can also use the standard deviation, which is simply its square root:

    σ = √var(X)

Fisher's skewness and kurtosis coefficients
Fisher's skewness coefficient is defined by:

    γ1 = E[(X − µ)³] / σ³

It is interpreted essentially on the basis of its sign: if γ1 > 0 (resp. < 0), the distribution
of X will be concentrated to the left (resp. the right) and spread out to the right (resp. the
left). For a symmetrical distribution, γ1 = 0. This interpretation is shown in Figure A2.9.
Fisher's kurtosis coefficient is given by:

    γ2 = E[(X − µ)⁴] / σ⁴ − 3
It is interpreted by comparison with the normal distribution (see Section A.2.2.1). This
distribution has a kurtosis coef¬cient of 0. Distributions with higher kurtosis than the
normal law (also termed leptokurtic) are more pointed in the neighbourhood of their

Figure A2.9 Skewness coefficient of a random variable (γ1 = 3.5, 0 and −3.5)

Figure A2.10 Kurtosis coefficient of a random variable (γ2 = 3 and −0.6)

mean and present fatter tails (and are therefore lower for intermediate values)
than the normal distribution; they are characterised by a positive γ2 parameter. Of course,
distributions with lower kurtosis have a negative kurtosis coefficient (see Figure A2.10).
For discrete or continuous r.v.s, the formulae that allow E(g(X)) to be calculated are
used as usual.

Covariance and correlation
We now come to the parameters relative to bivariate random variables. The covariance
between two r.v.s X and Y is defined by:

    σXY = cov(X, Y) = E[(X − µX)(Y − µY)]

and can also be calculated as cov(X, Y) = E(XY) − µX µY.
For discrete r.v.s and continuous r.v.s respectively, the covariance is calculated by:

    cov(X, Y) = Σi Σj (xi − µX)(yj − µY) · pij = Σi Σj xi yj · pij − µX µY
    cov(X, Y) = ∫−∞..+∞ ∫−∞..+∞ (x − µX)(y − µY) · f(x, y) dx dy
              = ∫−∞..+∞ ∫−∞..+∞ x y · f(x, y) dx dy − µX µY

The covariance is interpreted as the degree of linear connection that exists between the
two r.v.s. A positive covariance corresponds to values of the product (X − µX)(Y − µY)
that are mostly positive: the two factors are mostly of the same sign, so high values of X
(greater than µX) tend to correspond to high values of Y (greater than µY), and low values
of X to low values of Y. The same type of reasoning applies to a negative covariance.
It can be demonstrated that:

    cov(aX + bY + c, Z) = a cov(X, Z) + b cov(Y, Z)
    cov(X, X) = var(X)
    E(XY) = E(X) · E(Y) + cov(X, Y)
    var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)

and that if X and Y are independent, their covariance is zero. In this case, in fact:

    cov(X, Y) = E[(X − µX)(Y − µY)]
              = E(X − µX) · E(Y − µY)
              = (E(X) − µX)(E(Y) − µY) = 0

Another parameter, which measures the degree of linear connection between the two
r.v.s, is the correlation coefficient:

    ρXY = corr(X, Y) = cov(X, Y) / (σX · σY)

The interest of the correlation coefficient in comparison to the covariance is that it is a
number without dimension, while the measurement unit of the covariance is equal
to the product of the units of the two r.v.s. Also, the correlation coefficient can only assume
values between −1 and 1, and these two extreme values correspond to the existence of a
perfect linear relation (increasing or decreasing depending on whether ρ = 1 or ρ = −1)
between the two r.v.s.
Two r.v.s whose correlation coefficient (or covariance) is zero are termed non-correlated.
It was said earlier that independent r.v.s are non-correlated, but the converse is not true!
The independence of two r.v.s in fact excludes the existence of any relation between the
variables, while non-correlation simply excludes the existence of a linear relation.
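A classical simulation sketch of this distinction (the pair X uniform on [−1; 1] and Y = X² is an illustrative textbook example): Y is entirely determined by X, yet cov(X, Y) = E[X³] = 0.

```python
import random

# Non-correlated but dependent: X uniform on [-1, 1], Y = X^2.
random.seed(0)
n = 200_000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [x * x for x in xs]

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
corr = cov / (sx * sy)

print(corr)
assert abs(corr) < 0.05                                     # essentially zero
assert max(abs(y - x * x) for x, y in zip(xs, ys)) == 0.0   # fully dependent
```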

2.2.1 Normal distribution and associated distributions

Normal distribution
Remember that a normal random variable with parameters (µ; σ) is defined by its density:

    f(x) = (1 / (σ√(2π))) · exp(−(1/2)((x − µ)/σ)²)

which is shown graphically in Figure A2.11.

Figure A2.11 Normal density

The normal density graph is symmetrical with respect to the vertical straight line of
abscissa µ and shows two points of inflexion, at abscissas (µ − σ) and (µ + σ).
The typical values for this distribution are given by:

E(X) = µ
var (X) = σ 2
γ1 (X) = 0
γ2 (X) = 0

If the r.v. X is distributed following a normal law with parameters (µ; σ), it can be
demonstrated that the r.v. (aX + b) is also distributed according to a normal law. In
particular, the r.v. (X − µ)/σ follows a normal law with parameters (0; 1). This is known as
the standard normal law.
The preceding result can be generalised: if the r.v.s X1, X2, . . . , Xn are independent
and normally distributed with E(Xk) = µk, var(Xk) = σk², k = 1, . . . , n, then the r.v.
Σk=1..n ak Xk + b will follow a normal law with parameters

    ( Σk=1..n ak µk + b ;  √( Σk=1..n ak² σk² ) )

Central limit theorem
The importance of this normal law in probability theory and statistics stems from the
well-known central limit theorem, which states that if the r.v.s X1, X2, . . . , Xn, . . .:

• are independent;
• have finite means µk and standard deviations σk (k = 1, . . . , n, . . .);
• and are such that no single variance dominates the total variance:

    limn→∞ σk² / (σ1² + . . . + σn²) = 0 for every k,

then the distribution of the r.v.

    [(X1 + . . . + Xn) − (µ1 + . . . + µn)] / √(σ1² + . . . + σn²)

tends towards a standard normal law when n tends towards infinity.
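A simulation sketch of the theorem (standardised sums of uniform draws, each with mean 1/2 and variance 1/12, are an illustrative choice): about 68.3 % of the standardised sums should fall within one unit of zero, as for a standard normal r.v.

```python
import random

# Standardised sum of n independent uniform draws on [0, 1].
random.seed(0)
n, trials = 1_000, 2_000
mu, var = 0.5, 1.0 / 12.0

inside = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (n * var) ** 0.5
    if -1.0 <= z <= 1.0:
        inside += 1

frequency = inside / trials
print(frequency)            # close to 0.683
assert 0.6 < frequency < 0.76
```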

Much more intuitively, the central limit theorem states that the sum of a large number
of independent effects, none of which has a significant variability with respect to the whole,
is distributed according to the normal law, without any hypothesis on the distribution of
the various terms in the sum.

Multi-normal distribution
An m-variate random variable (X1, X2, . . . , Xm) is said to be distributed according
to a multi-normal law with parameters (µ; V) if it admits the multivariate density

    f(x1, . . . , xm) = (1 / √((2π)^m det(V))) · exp(−(1/2)(x − µ)ᵗ V⁻¹ (x − µ))

in which µ and V represent respectively the vector of means and the variance-covariance
matrix of the r.v.s Xk (k = 1, . . . , m).
The property of linear combinations of independent normal r.v.s can be generalised
as follows: for a multi-normal random variable X with parameters (µ; V), an invertible
matrix A and a vector b, the m-variate random variable AX + b is itself distributed
according to a multi-normal law with parameters (Aµ + b; AVAᵗ).
For the specific case m = 2, the multi-normal density is termed binormal and written:

    f(x1, x2) = (1 / (2πσ1σ2√(1 − ρ²)))
                · exp{ −1/(2(1 − ρ²)) · [ ((x1 − µ1)/σ1)²
                       − 2ρ((x1 − µ1)/σ1)((x2 − µ2)/σ2) + ((x2 − µ2)/σ2)² ] }

Log-normal distribution
Let us now return to a one-dimensional distribution linked to the normal law. A r.v. X is
said to be distributed according to a log-normal law with parameters (µ; σ) when ln X is
normally distributed with the parameters (µ; σ). It can easily be demonstrated that this
r.v. takes only positive values and that it is defined by the density

    f(x) = (1 / (σx√(2π))) · exp(−(1/2)((ln x − µ)/σ)²)    (x > 0)

The graph of this density is shown in Figure A2.12 and its typical values are given by:

    E(X) = e^(µ + σ²/2)
    var(X) = e^(2µ + σ²) (e^(σ²) − 1)
    γ1(X) = (e^(σ²) + 2) √(e^(σ²) − 1)
    γ2(X) = (e^(3σ²) + 3e^(2σ²) + 6e^(σ²) + 6)(e^(σ²) − 1)

This confirms the skewness, with concentration to the left and spreading to the right,
observed on the graph.



Figure A2.12 Log-normal distribution

We would point out finally that a result of the same type as the central limit theorem
also leads to the log-normal law: this is the case in which the effects represented by the
various r.v.s accumulate through a multiplicative model rather than an additive
model, because of the fundamental property of logarithms: ln(x1 · x2) = ln x1 + ln x2.

2.2.2 Other theoretical distributions

Poisson distribution
The Poisson r.v. with parameter µ is a discrete r.v. X that takes all the non-negative
integer values 0, 1, 2, etc. with the associated probabilities:

    Pr[X = k] = e^(−µ) µ^k / k!    (k ∈ N)

The typical values for this distribution are given by:

    E(X) = µ
    var(X) = µ

Binomial distribution
The Bernoulli scheme is a probability model applied to a very wide range of situations.
It is characterised by:

• a finite number of independent trials;
• during each trial, two results only (success and failure) are possible;
• also during each trial, the probability of a success occurring is the same.

If n is the number of trials and p the probability of success on each trial, the term
used is Bernoulli scheme with parameters (n; p), and the number of successes out of the

n trials is a binomial r.v. with parameters (n; p), termed B(n; p). This discrete random variable takes
the values 0, 1, 2, . . . , n with the following associated probabilities:5

    Pr[B(n; p) = k] = C(n, k) p^k (1 − p)^(n−k)    k ∈ {0, 1, . . . , n}

where C(n, k) is the binomial coefficient. The sum of these probabilities equals 1, in
accordance with Newton's binomial formula.
In addition, the typical values for this distribution are given by:

    E(B(n; p)) = np
    var(B(n; p)) = np(1 − p)
The binomial distribution allows two interesting approximations when the parameter n is
large. Thus, for a very small p, we have the approximation through Poisson's law with
parameter np:

    Pr[B(n; p) = k] ≈ e^(−np) (np)^k / k!

For a p that is not too close to 0 or 1, the binomial r.v. tends towards a normal law with
parameters (np; √(np(1 − p))), and more specifically, with µ = np, σ = √(np(1 − p))
and Φ the standard normal d.f.:

    Pr[B(n; p) = k] ≈ Φ((k − µ + 1/2)/σ) − Φ((k − µ − 1/2)/σ)
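The quality of the normal approximation can be checked directly; a sketch for the illustrative case n = 100, p = 0.4, k = 40:

```python
import math

# Exact binomial probability Pr[B(100; 0.4) = 40] ...
n, p, k = 100, 0.4, 40
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

# ... versus the normal approximation with continuity correction.
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

def Phi(z):  # standard normal d.f.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

approx = Phi((k - mu + 0.5) / sigma) - Phi((k - mu - 0.5) / sigma)
print(exact, approx)
assert abs(exact - approx) < 1e-3
```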
Student distribution
The Student distribution with ν degrees of freedom is defined by the density

    f(x) = [Γ((ν + 1)/2) / (Γ(ν/2) √(νπ))] · (1 + x²/ν)^(−(ν+1)/2)
In this expression, the gamma function is defined by Γ(ν) = ∫0..+∞ e^(−x) x^(ν−1) dx.
This generalises the factorial function, as Γ(ν) = (ν − 1) · Γ(ν − 1) and, for integer n,
we have Γ(n) = (n − 1)!.
It is, however, defined for values of ν that are not integers (all the positive real values
of ν) and, for example:

    Γ(1/2) = √π

We are not representing the graph of this density here, as it is symmetrical with respect
to the vertical axis and bears a strong resemblance to the standard normal density graph,
although for ν > 4 the kurtosis coefficient value is strictly positive:

    E(X) = 0
    var(X) = ν/(ν − 2)    (ν > 2)
    γ1(X) = 0
    γ2(X) = 6/(ν − 4)    (ν > 4)
Remember that the binomial coefficient is given by C(n, k) = n!/(k!(n − k)!).

Finally, it can be stated that when the number of degrees of freedom tends towards infinity,
the Student distribution tends towards the standard normal distribution, this asymptotic
property being verified in practice as soon as ν reaches the value of 30.

Uniform distribution
An r.v. is said to be uniform on the interval [a; b] when the probability of its taking a
value between t and t + h⁶ depends on these two boundaries only through h. It is easy
to establish, on that basis, that such an r.v. only takes values within the
interval [a; b] and that its density is necessarily constant:

f(x) = 1/(b − a)    (a < x < b)

Its graph is shown in Figure A2.13.
The principal typical values for the uniform r.v. are given by:
E(X) = (a + b)/2
var(X) = (b − a)²/12
γ1(X) = 0
γ2(X) = −6/5
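The mean and variance can be confirmed by direct numerical integration of the constant density; a sketch with the arbitrary boundaries a = 2 and b = 5:

```python
# Midpoint-rule integration of the uniform density f(x) = 1/(b - a) on [a; b]
a, b = 2.0, 5.0
n = 100_000
h = (b - a) / n
f = 1.0 / (b - a)
xs = [a + (i + 0.5) * h for i in range(n)]

mean = sum(x * f * h for x in xs)               # (a + b)/2 = 3.5
var = sum((x - mean)**2 * f * h for x in xs)    # (b - a)^2 / 12 = 0.75
```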
This uniform distribution is the origin of some simulation methods, in which the generation
of random numbers distributed uniformly in the interval [0; 1] allows distributed random
numbers to be obtained according to a given law of probability (Figure A2.14). The way
in which this transformation occurs is explained in Section 7.3.1. Let us examine here
how the (pseudo-) random numbers uniformly distributed in [0; 1] can be obtained.
The sequence x1, x2, . . . , xn is constructed using residue classes. On the basis
of an initial value ρ0 (equal to 1, for example), we construct, for i = 1, 2, . . . , n:

xi = decimal part of (c1 · ρi−1)
ρi = c2 · xi

Here, the constants c1 and c2 are suitably chosen. Thus, for c1 = 13.3669 and c2 =
94.3795, we find successively the values shown in Table A2.1.
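This recurrence is straightforward to implement; a minimal Python sketch, using the constants quoted above (the first values agree with Table A2.1, up to the rounding of the original computation):

```python
# Pseudo-random numbers in [0; 1] by the residue-class recurrence
# x_i = decimal part of (c1 * rho_{i-1}),  rho_i = c2 * x_i.
c1, c2 = 13.3669, 94.3795

def uniform_sequence(n, rho0=1.0):
    xs, rho = [], rho0
    for _ in range(n):
        x = (c1 * rho) % 1.0   # decimal part of c1 * rho_{i-1}
        rho = c2 * x
        xs.append(x)
    return xs

xs = uniform_sequence(10)
# xs[0] = 0.3669; every value lies in [0; 1)
```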


Figure A2.13 Uniform distribution (constant density 1/(b − a) between a and b)

⁶ These two values are assumed to belong to the interval [a; b].



Figure A2.14 Random numbers uniformly distributed in [0; 1]

Table A2.1 The values xi and ρi

i xi ρi

1 0.366900 34.627839
2 0.866885 81.813352
3 0.580898 55.768652
4 0.453995 42.847849
5 0.742910 70.115509
6 0.226992 21.423384
7 0.364227 34.375527
8 0.494233 46.645452
9 0.505097 47.670759
10 0.210265 19.844676

Generalised error distribution
The generalised error distribution with parameter ν is defined by the density

f(x) = ν · exp(−(1/2)|x/λ|^ν) / (λ · 2^(1 + 1/ν) · Γ(1/ν))

where the scale constant is λ = [2^(−2/ν) · Γ(1/ν)/Γ(3/ν)]^(1/2).
The graph for this density is shown in Figure A2.15.
This is a distribution symmetrical with respect to 0, which corresponds to the normal
distribution for ν = 2 and gives rise to a leptokurtic distribution (resp. a negative-kurtosis
distribution) for ν < 2 (resp. ν > 2).

2.3.1 General considerations
The term stochastic process is applied to a random variable that is a function of the time
variable: {Xt : t ∈ T }.



Figure A2.15 Generalised error distribution

If the set T of times is discrete, the stochastic process is simply a sequence of random
variables. However, in a number of financial applications such as Black and Scholes'
model, it will be necessary to consider stochastic processes in continuous time.
For each possible result ω ∈ Ω, the function Xt(ω) of the variable t is known as the
path of the stochastic process.
A stochastic process is said to have independent increments when, regardless of the
times t1 < t2 < . . . < tn , the r.v.s

Xt1 , Xt2 − Xt1 , Xt3 − Xt2 , . . .

are independent. In the same way, a stochastic process is said to have stationary increments
when for every t and h the r.v.s Xt+h ’ Xt and Xh are identically distributed.

2.3.2 Particular stochastic processes

The Poisson process
We consider a process of random occurrences of an event in time, corresponding to the
set [0; +∞[. Here, the principal interest does not correspond directly to the occurrence
times, but to the number of occurrences within given intervals. The r.v. that represents
the number of occurrences within the interval [t1 , t2 ] is termed n(t1 , t2 ).
This process is called a Poisson process if it obeys the following hypotheses:

• the numbers of occurrences in separate intervals of time are independent;
• the distribution of the number of occurrences within an interval of time only depends
on that interval through its duration: Pr[n(t1, t2) = k] is a function of (t2 − t1), which
is henceforth termed pk(t2 − t1);
• there is no multiple occurrence: if h is small, Pr[n(0; h) ≥ 2] = o(h);
• there is a rate of occurrence α such that Pr[n(0; h) = 1] = αh + o(h).

It can be demonstrated that under these hypotheses, the r.v. 'number of occurrences
within an interval of duration t' is distributed according to a Poisson law with parameter αt:

pk(t) = e^(−αt) (αt)^k / k!    k = 0, 1, 2, . . .

To simplify, we note Xt = n(0; t). This is a stochastic process that counts the number
of occurrences over time. The path for such a process is therefore a stepped function,
with the abscissas for the jumps corresponding to the occurrence times and the heights
of the jumps being equal to 1. It can be demonstrated that the process has independent
and stationary increments and that E(Xt) = var(Xt) = αt.
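These two typical values can be checked by simulation. The sketch below builds each path from exponential inter-occurrence times with rate α, the standard construction of a Poisson process (the rate and horizon chosen are arbitrary):

```python
import random

random.seed(0)        # fixed seed so the check is reproducible
alpha, t = 2.0, 5.0   # arbitrary rate and horizon; alpha * t = 10

def count_occurrences(alpha, t):
    # Draw exponential inter-occurrence times until the horizon t is passed;
    # the number of occurrences in [0; t] is then Poisson with parameter alpha*t.
    count, clock = 0, random.expovariate(alpha)
    while clock <= t:
        count += 1
        clock += random.expovariate(alpha)
    return count

counts = [count_occurrences(alpha, t) for _ in range(20_000)]
mean = sum(counts) / len(counts)                          # close to alpha*t = 10
var = sum((c - mean)**2 for c in counts) / len(counts)    # also close to 10
```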
This process can be generalised as follows. We consider:

• A Poisson process Xt as defined above; with the time of the kth occurrence expressed
as Tk, we have: Xt = #{k : Tk ≤ t}.
• A sequence Y1 , Y2 , . . . of independent and identically distributed r.v.s, independent of
the Poisson process.

The process Zt = Σ{k: Tk ≤ t} Yk is known as a compound Poisson process.
The paths of such a process are therefore stepped functions, with the abscissas of the
jumps corresponding to the occurrence times of the underlying Poisson process and the
heights of the jumps being the realised values of the r.v.s Yk. In addition, we have:

E(Zt) = αt · µY
var(Zt) = αt · (σY² + µY²)

Standard Brownian motion
Consider a sequence of r.v.s Xk, independent and identically distributed, taking the values
−ΔX and +ΔX with respective probabilities 1/2 and 1/2, and define the sequence of
r.v.s Yn through Yn = X1 + X2 + · · · + Xn. This is known as a symmetrical random
walk. As E(Xk) = 0 and var(Xk) = (ΔX)², we have E(Yn) = 0 and var(Yn) = n(ΔX)².
For our modelling requirements, we separate the time interval [0; t] into n subintervals
of the same duration Δt = t/n and define Zt = Zt(n) = Yn. We have:

E(Zt) = 0    var(Zt) = n(ΔX)² = ((ΔX)²/Δt) · t
This variable Zt allows the discrete development of a magnitude to be modelled. If
we then wish to move to continuous modelling while retaining the same variability per
unit of time, that is, with (ΔX)²/Δt = 1, for example, we obtain the stochastic process
wt = lim(n→∞) Zt(n).
This is a standard Brownian motion (also known as a Wiener process). It is clear that
this stochastic process wt, defined on R+, is such that w0 = 0, that wt has independent
and stationary increments, and that in view of the central limit theorem wt is distributed

according to a normal law with parameters (0; t). It can be shown that the paths of a
Wiener process are continuous everywhere, but cannot generally be differentiated. In fact

Δwt/Δt = ε·√Δt/Δt = ε/√Δt

where ε is a standard normal r.v.; this ratio therefore diverges as Δt tends towards 0.
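The random-walk construction above can be illustrated numerically; a sketch with ΔX = √Δt, so that (ΔX)²/Δt = 1 (the numbers of steps and of paths are arbitrary):

```python
import random

random.seed(1)
t, n_steps, n_paths = 1.0, 200, 5_000
dt = t / n_steps
dx = dt ** 0.5        # step size chosen so that (dx)^2 / dt = 1

finals = []
for _ in range(n_paths):
    z = 0.0
    for _ in range(n_steps):
        z += dx if random.random() < 0.5 else -dx   # symmetric +/- dx step
    finals.append(z)

# Z_t approximates w_t: mean close to 0, variance close to t = 1
mean = sum(finals) / n_paths
var = sum(f * f for f in finals) / n_paths
```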
Itô process
If a more developed model is required, wt can be multiplied by a constant in order to
produce a variability per time unit (ΔX)²/Δt different from 1, or a constant can be added
to it in order to obtain a non-zero mean:

Xt = X0 + b · wt

This type of model is not greatly effective because of the great variability of the
development in the short term, the standard deviation of Xt being equal⁷ to b·√t.
For this reason, this type of construction is applied more to variations relating to a
short interval of time:
dXt = a · dt + b · dwt

It is possible to generalise by replacing the constants a and b by functions of t and Xt :

dXt = at (Xt ) · dt + bt (Xt ) · dwt

This type of process is known as an Itô process. In financial modelling, several specific
cases of the Itô process are used; a geometric Brownian motion is obtained when:

at(Xt) = a · Xt    bt(Xt) = b · Xt

An Ornstein–Uhlenbeck process corresponds to:

at(Xt) = a · (c − Xt)    bt(Xt) = b

and the square root process is such that:

at(Xt) = a · (c − Xt)    bt(Xt) = b·√Xt
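A standard way to simulate such processes, not described in the text above, is the Euler discretisation of dXt = at(Xt) · dt + bt(Xt) · dwt; a sketch for the geometric Brownian motion case, with arbitrary parameter values:

```python
import random

random.seed(2)
a, b, x0 = 0.05, 0.2, 100.0   # arbitrary drift, volatility and starting value
t, n_steps = 1.0, 250
dt = t / n_steps

def simulate_terminal(drift, diffusion, x0):
    # Euler scheme: X_{k+1} = X_k + a_t(X_k) dt + b_t(X_k) dW
    x = x0
    for _ in range(n_steps):
        dw = random.gauss(0.0, dt ** 0.5)   # Brownian increment over dt
        x += drift(x) * dt + diffusion(x) * dw
    return x

# geometric Brownian motion: a_t(X) = a*X, b_t(X) = b*X
paths = [simulate_terminal(lambda x: a * x, lambda x: b * x, x0)
         for _ in range(5_000)]
mean = sum(paths) / len(paths)   # close to x0 * e^(a*t), about 105.1
```

Swapping in the Ornstein–Uhlenbeck or square-root drift and diffusion functions from the text requires only changing the two lambdas.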

2.3.3 Stochastic differential equations
Expressions of the type dXt = at(Xt) · dt + bt(Xt) · dwt cannot simply be handled in the
same way as the corresponding deterministic expressions, because wt cannot be differentiated.
It is, however, possible to extend the definition to a concept of stochastic differential,
through the theory of stochastic integral calculus.⁸
For a stochastic process zt defined within the interval [a; b], the stochastic integral
of zt within [a; b] with respect to the standard Brownian motion wt is defined by:

∫ab zt dwt = lim(δ→0) Σk=0..n−1 ztk (wtk+1 − wtk)

where a = t0 < t1 < · · · < tn = b is a subdivision of [a; b] and δ is the length of its
longest subinterval.
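This defining limit can be explored numerically for zt = wt. For any path, the left-point sum satisfies the algebraic identity Σ wtk(wtk+1 − wtk) = (wT² − Σ(Δwk)²)/2, and the quadratic variation Σ(Δwk)² is close to T, recovering Itô's result that the integral of w is (wT² − T)/2; a sketch on [0; 1]:

```python
import random

random.seed(3)
T, n = 1.0, 100_000   # integrate z_t = w_t over [0; T]
dt = T / n

w, integral, sum_sq = 0.0, 0.0, 0.0
for _ in range(n):
    dw = random.gauss(0.0, dt ** 0.5)
    integral += w * dw     # left-point term z_{t_k} (w_{t_{k+1}} - w_{t_k})
    sum_sq += dw * dw      # quadratic variation, close to T
    w += dw

# exact identity for the left-point sum: integral == (w^2 - sum_sq) / 2
```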

⁷ The root function presents a vertical tangent at the origin.

