
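$$
\begin{aligned}
\operatorname{cov}\big(w_g'R,\ [\,1\ \ \eta\,]\,A M'\Sigma^{-1}R\big)
&= [\,1\ \ \eta\,]\,A M'\Sigma^{-1}\Sigma\, w_g \\
&= [\,1\ \ \eta\,]\,A \begin{bmatrix} \mathbf{1}' \\ \mu' \end{bmatrix} \Sigma^{-1}\mathbf{1}\,\frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}} \\
&= [\,1\ \ \eta\,]\,A \begin{bmatrix} \mathbf{1}'\Sigma^{-1}\mathbf{1} \\ \mu'\Sigma^{-1}\mathbf{1} \end{bmatrix} \frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}} \\
&= [\,1\ \ \eta\,] \begin{bmatrix} 1 \\ 0 \end{bmatrix} \frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}} \\
&= \frac{1}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}
\end{aligned}
$$

where we use the fact that, by the definition of $A$,

$$
A \begin{bmatrix} \mathbf{1}'\Sigma^{-1}\mathbf{1} & \mu'\Sigma^{-1}\mathbf{1} \\ \mu'\Sigma^{-1}\mathbf{1} & \mu'\Sigma^{-1}\mu \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
$$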

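Now consider two portfolios on the boundary in Figure 2.2. For each, the weights are of the same form, say

$$
w_p = \Sigma^{-1} M A \begin{bmatrix} 1 \\ \eta_p \end{bmatrix}
\quad\text{and}\quad
w_q = \Sigma^{-1} M A \begin{bmatrix} 1 \\ \eta_q \end{bmatrix}
\tag{2.16}
$$

where the mean returns are $\eta_p$ and $\eta_q$ respectively. Consider the covariance between these two portfolios:

$$
\begin{aligned}
\operatorname{cov}(w_p'R,\ w_q'R) &= w_p'\Sigma w_q \\
&= [\,1\ \ \eta_p\,]\,(M'\Sigma^{-1}M)^{-1} \begin{bmatrix} 1 \\ \eta_q \end{bmatrix} \\
&= A_{11} + A_{12}(\eta_p + \eta_q) + A_{22}\,\eta_p\eta_q \\
&= \operatorname{var}(w_p'R) - [\,1\ \ \eta_p\,]\,A \begin{bmatrix} 0 \\ \eta_p - \eta_q \end{bmatrix}.
\end{aligned}
$$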

46 CHAPTER 2. SOME BASIC THEORY OF FINANCE

An interesting special portfolio is the "zero-beta" portfolio, one that is perfectly uncorrelated with the portfolio return $w_p'R$. Setting the above covariance equal to 0 and solving for $\eta_q$, we obtain

$$
\eta_q = -\frac{A_{11} + A_{12}\eta_p}{A_{12} + A_{22}\eta_p}
= \frac{\mu'\Sigma^{-1}\mu - (\mu'\Sigma^{-1}\mathbf{1})\,\eta_p}{\mu'\Sigma^{-1}\mathbf{1} - (\mathbf{1}'\Sigma^{-1}\mathbf{1})\,\eta_p}.
$$

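There is a simple method for determining the point $(0, \eta_q)$ graphically, indicated in Figure ??. From the equation relating points on the boundary,

$$
\sigma^2 - A_{22}(\eta - \eta_g)^2 = \sigma_g^2,
$$

we obtain

$$
\frac{\partial \eta}{\partial \sigma} = \frac{\sigma}{A_{22}(\eta - \eta_g)}
$$

and so the tangent line at the point $(\sigma_p, \eta_p)$ strikes the $\sigma = 0$ axis at a point $\eta_q$ which satisfies

$$
\frac{\eta_p - \eta_q}{\sigma_p} = \frac{\sigma_p}{A_{22}(\eta_p - \eta_g)}
$$

or, using $\sigma_p^2 = A_{22}\eta_p^2 + 2A_{12}\eta_p + A_{11}$ and $A_{22}(\eta_p - \eta_g) = A_{22}\eta_p + A_{12}$,

$$
\begin{aligned}
\eta_q &= \eta_p - \frac{\sigma_p^2}{A_{22}(\eta_p - \eta_g)} \\
&= \eta_p - \frac{A_{22}\eta_p^2 + 2A_{12}\eta_p + A_{11}}{A_{22}\eta_p + A_{12}} \\
&= -\frac{A_{11} + A_{12}\eta_p}{A_{12} + A_{22}\eta_p}.
\end{aligned}
\tag{2.17}
$$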

Note that this is exactly the same mean return obtained earlier for the portfolio which has zero covariance with $w_p'R$. This shows that we can find the mean of this uncorrelated portfolio (and hence, from the boundary equation, its standard deviation) by constructing the tangent line at the point $(\sigma_p, \eta_p)$ and then setting $\eta_q$ to be the y-coordinate of the point where this tangent line strikes the $\sigma = 0$ axis, as in Figure 2.3.
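The agreement between the zero-covariance mean and the tangent-line intercept can be checked numerically. The following is a minimal sketch in plain Python for two risky assets; the mean vector `mu`, the covariance matrix `Sigma` and the value `eta_p` are made-up illustrative numbers, and the helper names (`inv2`, `quad`, `weights`, `cov`) are introduced here only for the example.

```python
# Two risky assets; every number below is an illustrative assumption.
mu = [0.08, 0.12]                      # mean returns
Sigma = [[0.04, 0.01], [0.01, 0.09]]   # covariance matrix of returns

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

Si = inv2(Sigma)
one = [1.0, 1.0]

def quad(x, y):
    """Return x' Sigma^{-1} y."""
    return sum(x[i] * Si[i][j] * y[j] for i in range(2) for j in range(2))

# A = (M' Sigma^{-1} M)^{-1} with M = [1 mu]
A = inv2([[quad(one, one), quad(one, mu)],
          [quad(mu, one), quad(mu, mu)]])

def weights(eta):
    """Boundary portfolio w = Sigma^{-1} M A (1, eta)' as in (2.16)."""
    u = [A[0][0] + A[0][1] * eta, A[1][0] + A[1][1] * eta]
    Mu = [u[0] + mu[i] * u[1] for i in range(2)]          # M u
    return [sum(Si[i][j] * Mu[j] for j in range(2)) for i in range(2)]

def cov(w, v):
    """Covariance w' Sigma v between two portfolio returns."""
    return sum(w[i] * Sigma[i][j] * v[j] for i in range(2) for j in range(2))

eta_p = 0.11
# Zero-beta mean return from (2.17)
eta_q = -(A[0][0] + A[0][1] * eta_p) / (A[0][1] + A[1][1] * eta_p)

wp, wq = weights(eta_p), weights(eta_q)
eta_g = -A[0][1] / A[1][1]             # mean of the minimum-variance portfolio
var_p = cov(wp, wp)
# Intercept of the tangent line at (sigma_p, eta_p) with the sigma = 0 axis
intercept = eta_p - var_p / (A[1][1] * (eta_p - eta_g))

print("eta_q =", eta_q, " tangent intercept =", intercept)
print("cov(wp'R, wq'R) =", cov(wp, wq))
```

Up to rounding error, the two printed means coincide and the covariance is zero.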

[FIGURE 2.3 ABOUT HERE]

MINIMUM VARIANCE PORTFOLIOS AND THE CAPITAL ASSET PRICING MODEL.47

Figure 2.3: The tangent line at the point $(\sigma_p, \eta_p)$

Now suppose that there is available to all investors a risk-free investment. Such an investment typically has a smaller return than those on the efficient frontier, but since there is no risk associated with the investment, its standard deviation is 0. It may be a government bond or treasury bill yielding interest rate $r$, so it corresponds to a point in Figure 2.4 at $(0, r)$. Since all investors are able to include this in their portfolio, the efficient frontier changes. In fact if an investor invests an amount $\beta$ in this risk-free investment and an amount $1 - \beta$ (this may be negative) in the risky portfolio with standard deviation and mean return $(\sigma_p, \eta_p)$, then the resulting investment has mean return

$$
E\big(\beta r + (1 - \beta)\,w_p'R\big) = \beta r + (1 - \beta)\,\eta_p
$$

and standard deviation of return

$$
\sqrt{\operatorname{Var}\big(\beta r + (1 - \beta)\,w_p'R\big)} = |1 - \beta|\,\sigma_p .
$$

This means that every point on a line joining $(0, r)$ to a point in the risky portfolio region is now attainable, and so the new set of attainable values of $(\sigma, \eta)$ consists of a cone with vertex at $(0, r)$, the region shaded in Figure 2.4. The efficient frontier is now the line $L$ in Figure 2.4. The point $m$ is the point at which this line is tangent to the efficient frontier determined from the risky investments. Under this theory, this point has great significance.

[FIGURE 2.4 ABOUT HERE]

Figure 2.4: _____

Lemma 6 The value-weighted market average corresponds to the point of tangency $m$ of the line to the risky portfolio efficient frontier.

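From (2.17), the point $m$ has mean return $\eta_m$ which solves

$$
r = -\frac{A_{11} + A_{12}\eta_m}{A_{12} + A_{22}\eta_m}
= \frac{\mu'\Sigma^{-1}\mu - (\mu'\Sigma^{-1}\mathbf{1})\,\eta_m}{\mu'\Sigma^{-1}\mathbf{1} - (\mathbf{1}'\Sigma^{-1}\mathbf{1})\,\eta_m}
$$

and this gives

$$
\eta_m = \frac{\mu'\Sigma^{-1}\mu - r\,(\mu'\Sigma^{-1}\mathbf{1})}{\mu'\Sigma^{-1}\mathbf{1} - r\,(\mathbf{1}'\Sigma^{-1}\mathbf{1})}.
$$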


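The corresponding weights on individual stocks are given by

$$
\begin{aligned}
w_m &= \Sigma^{-1} M A \begin{bmatrix} 1 \\ \eta_m \end{bmatrix} \\
&= \Sigma^{-1} [\,\mathbf{1}\ \ \mu\,] \begin{bmatrix} A_{11} + A_{12}\eta_m \\ A_{12} + A_{22}\eta_m \end{bmatrix} \\
&= c\,\Sigma^{-1} [\,\mathbf{1}\ \ \mu\,] \begin{bmatrix} -r \\ 1 \end{bmatrix}, \quad\text{where } c = A_{12} + A_{22}\eta_m, \\
&= c\,\Sigma^{-1}(\mu - r\mathbf{1}).
\end{aligned}
$$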

These market weights depend essentially on two quantities. If $R$ denotes the correlation matrix

$$
R_{ij} = \frac{\Sigma_{ij}}{\sigma_i \sigma_j}
$$

where $\sigma_i = \sqrt{\Sigma_{ii}}$ is the standard deviation of the returns from stock $i$, and

$$
\lambda_i = \frac{\mu_i - r}{\sigma_i}
$$

is the standardized excess return or the price of risk, then the weight $w_i$ on stock $i$ is such that

$$
w_i \sigma_i \propto (R^{-1}\lambda)_i \tag{2.18}
$$

with $\lambda$ the column vector of values of $\lambda_i$. For the purpose of comparison, recall that the conservative portfolio, the one minimizing the variance over all portfolios of risky stocks, has weights

$$
w_g \propto \Sigma^{-1}\mathbf{1}
$$

which means that the weight on stock $i$ satisfies a relation exactly like (2.18) except that the excess returns $\mu_i - r$ have all been replaced by the same constant.
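As a quick numerical illustration of the market weights, the sketch below computes $\Sigma^{-1}(\mu - r\mathbf{1})$ for two stocks, normalises it to sum to one, and checks both the proportionality (2.18) and the closed form for $\eta_m$. The inputs `mu`, `Sigma` and `r` are made-up values, and the helper names are assumptions of this example rather than anything defined in the text.

```python
import math

# Illustrative inputs (assumptions, not data from the text)
mu = [0.08, 0.12]                      # mean returns
Sigma = [[0.04, 0.01], [0.01, 0.09]]   # covariance matrix of returns
r = 0.03                               # risk-free rate

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

Si = inv2(Sigma)
excess = [m - r for m in mu]
raw = matvec(Si, excess)               # Sigma^{-1} (mu - r 1)
w_m = [x / sum(raw) for x in raw]      # normalise so the weights sum to 1

# Check (2.18): w_i * sigma_i is proportional to the i-th entry of R^{-1} lambda
sd = [math.sqrt(Sigma[i][i]) for i in range(2)]
R = [[Sigma[i][j] / (sd[i] * sd[j]) for j in range(2)] for i in range(2)]
lam = [excess[i] / sd[i] for i in range(2)]
rhs = matvec(inv2(R), lam)
ratios = [w_m[i] * sd[i] / rhs[i] for i in range(2)]   # a common constant

# Mean return of the tangency portfolio against the closed form for eta_m
one = [1.0, 1.0]
def quad(x, y):
    """Return x' Sigma^{-1} y."""
    return sum(x[i] * Si[i][j] * y[j] for i in range(2) for j in range(2))

eta_m = sum(w_m[i] * mu[i] for i in range(2))
closed_form = (quad(mu, mu) - r * quad(mu, one)) / (quad(mu, one) - r * quad(one, one))

print("market weights:", w_m)
print("ratios:", ratios)
print("eta_m:", eta_m, "closed form:", closed_form)
```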

Let us suppose that stocks, weighted by their total capitalization in the market, result in some weight vector $w \neq w_m$. When there is a risk-free investment, $m$ is the only point in the risky stock portfolio that lies on the efficient frontier, and so evidently if we are able to trade in a market index (a stock whose value depends on the total market), we can find an investment which is a combination of the risk-free investment with that corresponding to $m$ and which has the same standard deviation as $w'R$ but higher expected return. By selling short the market index and buying this new portfolio, an arbitrage is possible. In other words, the market will not stay in this state for long.

If the market portfolio $m$ has standard deviation $\sigma_m$ and mean $\eta_m$, then the line $L$ is described by the relation

$$
\eta = r + \frac{\eta_m - r}{\sigma_m}\,\sigma .
$$

For any investment with mean return $\eta$ and standard deviation of return $\sigma$ to be competitive, it must lie on this efficient frontier, i.e. it must satisfy the relation

$$
\eta - r = \beta(\eta_m - r), \quad\text{where } \beta = \frac{\sigma}{\sigma_m}, \tag{2.19}
$$

or equivalently

$$
\frac{\eta - r}{\sigma} = \frac{\eta_m - r}{\sigma_m}.
$$

This is the most important result in the capital asset pricing model. The excess return of a stock, $\eta - r$, divided by its standard deviation $\sigma$ is supposed constant, and is called the Sharpe ratio or the market price of risk. The constant $\beta$ is called the beta of the stock or portfolio and represents the change in the expected portfolio return for each unit change in the market. It is also the ratio of the standard deviations of return of the stock and the market. Values of $\beta > 1$ indicate a stock that is more variable than the market and tends to have higher positive and negative returns, whereas values of $\beta < 1$ indicate investments that are more conservative and less volatile than the market as a whole.
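To make relation (2.19) concrete, here is a small sketch that evaluates the line $L$ at a few standard deviations and confirms that the Sharpe ratio is the same at each of them. The values of `r`, `eta_m` and `sigma_m` are illustrative assumptions, and `cml_mean` and `sharpe` are hypothetical helper names.

```python
# Illustrative market parameters (assumptions for this example only)
r = 0.03         # risk-free rate
eta_m = 0.09     # mean return of the market portfolio
sigma_m = 0.18   # standard deviation of the market portfolio

def cml_mean(sigma):
    """Mean return on the line L for a given standard deviation, using (2.19)."""
    beta = sigma / sigma_m
    return r + beta * (eta_m - r)

def sharpe(eta, sigma):
    """Excess return per unit of standard deviation."""
    return (eta - r) / sigma

for sigma in (0.09, 0.18, 0.27):
    eta = cml_mean(sigma)
    print(f"sigma={sigma:.2f}  beta={sigma / sigma_m:.2f}  "
          f"eta={eta:.4f}  Sharpe={sharpe(eta, sigma):.4f}")
```

An investment with $\beta = 1.5$ here carries one and a half times the market's excess return and one and a half times its standard deviation, so the ratio of the two is unchanged.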

We might attempt to use this model to simplify the assumed structure of the joint distribution of stock returns. One simple model in which (2.19) holds is one in which all stocks are linearly related to the market index through a simple linear regression. In particular, suppose the return from stock $i$, $R_i$, is related to the return from the market portfolio $R_m$ by

$$
R_i - r = \beta_i (R_m - r) + \epsilon_i, \quad\text{where } \beta_i = \frac{\sigma_i}{\sigma_m} \text{ and } \sigma_i^2 = \Sigma_{ii}.
$$

The "errors" $\epsilon_i$ are assumed to be random variables with mean zero, uncorrelated with the market returns $R_m$. This model is called the single-index model relating the returns from the stock $R_i$ and from the market portfolio $R_m$. It has the merit that the relationship (2.19) follows immediately on taking expectations.

Taking variances on both sides, we obtain

$$
\operatorname{var}(R_i) = \beta_i^2 \operatorname{var}(R_m) + \operatorname{var}(\epsilon_i) = \sigma_i^2 + \operatorname{var}(\epsilon_i) > \sigma_i^2
$$

which contradicts the assumption that $\operatorname{var}(R_i) = \sigma_i^2$. What is the cause of this contradiction? The relationship (2.19) assumes that the investment lies on the efficient frontier. Is this really a necessary condition for investors to choose this investment? It is not: all that is required for rational investors to choose a particular stock is that it forms part of a portfolio which does lie on the efficient frontier.

Is every risk in an efficient market rewarded with additional expected return? We cannot expect the market to compensate us with a higher rate of return for additional risks that could be diversified away. Suppose, for example, we have two stocks with identical values of $\beta$. Suppose their returns $R_1$ and $R_2$ both satisfy a linear regression relation as above,

$$
R_i - r = \beta(R_m - r) + \epsilon_i, \quad i = 1, 2,
$$

where $\operatorname{cov}(\epsilon_1, \epsilon_2) = 0$. Consider an investment of equal amounts in both stocks so that the return satisfies

$$
\frac{R_1 + R_2}{2} - r = \beta(R_m - r) + \frac{\epsilon_1 + \epsilon_2}{2}.
$$

For simplicity assume that $\sigma_1 \leq \sigma_2$ and notice that the variance of this new investment is

$$
\beta^2 \sigma_m^2 + \frac{1}{4}\big[\operatorname{var}(\epsilon_1) + \operatorname{var}(\epsilon_2)\big] < \operatorname{var}(R_2).
$$


The diversified investment consisting of the average of the two results in the same mean return with smaller variance. Investors should not be compensated for the additional risk in stock 2 above the level that we can achieve by sensible diversification. In general, by averaging or diversifying, we are able to provide an investment with the same average return characteristics but smaller variance than the original stock. We say that the risk (i.e. $\operatorname{var}(\epsilon_i)$) associated with stock $i$ which can be diversified away is the specific risk, and this risk is not rewarded with increased expected return. Only the so-called systematic risk $\sigma_i$, which cannot be removed by diversification, is rewarded with increased expected return with a relation like (2.19).
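The effect of diversification on specific risk can be seen with a few lines of arithmetic. The sketch below uses made-up values for the common beta, the market variance and the two specific variances; the function `avg_variance` is a hypothetical helper that extends the same calculation to an equally weighted average of `n` stocks with a common specific variance.

```python
# Illustrative values (assumptions for this example only)
beta = 1.2                 # common beta of the two stocks
var_m = 0.03               # variance of the market return
var_eps = [0.02, 0.05]     # specific variances, with var_eps[0] <= var_eps[1]

# Variance of each stock: systematic part plus specific part
var_R = [beta ** 2 * var_m + v for v in var_eps]

# Variance of the equally weighted average of the two stocks
var_avg = beta ** 2 * var_m + (var_eps[0] + var_eps[1]) / 4

def avg_variance(n, delta):
    """Variance of an equal-weight average of n stocks with the same beta
    and uncorrelated specific risks, each of variance delta."""
    return beta ** 2 * var_m + delta / n

print("var(R1), var(R2):", var_R)
print("variance of the average:", var_avg)
for n in (1, 10, 100, 1000):
    print(n, avg_variance(n, 0.05))
```

As `n` grows the specific part `delta / n` vanishes, while the systematic part `beta ** 2 * var_m` is unaffected.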

The covariance matrix of stock returns is one of the most difficult parameters to estimate in practice from historical data. If there are $n$ stocks in a market (and normally $n$ is large), then there are $n(n + 1)/2$ elements of $\Sigma$ that need to be estimated. For example, if we assume all stocks in the TSE 300 index are correlated, this results in a total of $(300)(301)/2 = 45{,}150$ parameters to estimate. We might use historical data to estimate these parameters, but variances and covariances among stocks change over time and it is not clear what period of time we can safely use to estimate them. In spite of its defects, the single index model can be used to provide a simple approximate form for the covariance matrix $\Sigma$ of the vector of stock returns.

Notice that under the model, assuming uncorrelated random errors $\epsilon_i$ with $\operatorname{var}(\epsilon_i) = \delta_i$,

$$
R_i - r = \beta_i(R_m - r) + \epsilon_i,
$$

we have

$$
\operatorname{cov}(R_i, R_j) = \beta_i \beta_j \sigma_m^2, \ i \neq j, \qquad \operatorname{var}(R_i) = \beta_i^2 \sigma_m^2 + \delta_i .
$$

Whereas $n$ stocks would otherwise require a total of $n(n + 1)/2$ parameters in the covariance matrix $\Sigma$ of returns, the single index model allows us to reduce this to the $2n + 1$ parameters $\sigma_m^2$, $\beta_i$ and $\delta_i$, $i = 1, ..., n$. There is the disadvantage in this formula, however, that every pair of stocks in the same market with betas of the same sign must be positively correlated, a feature that contradicts some observations of real market returns.
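The structure of the single index covariance matrix is easy to assemble directly. The following sketch builds it from a list of betas, specific variances and the market variance, and compares the parameter counts; all numbers are illustrative assumptions, the function names are hypothetical, and the count for the single index model includes the $n$ betas along with the $\delta_i$ and $\sigma_m^2$.

```python
def single_index_cov(beta, delta, var_m):
    """Covariance matrix with entries beta_i*beta_j*var_m, plus delta_i on the diagonal."""
    n = len(beta)
    return [[beta[i] * beta[j] * var_m + (delta[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def n_params_full(n):
    """Free parameters in an unrestricted symmetric n x n covariance matrix."""
    return n * (n + 1) // 2

def n_params_single_index(n):
    """n betas, n specific variances and one market variance."""
    return 2 * n + 1

# Illustrative inputs (assumptions for this example only)
beta = [0.8, 1.0, 1.3]
delta = [0.02, 0.03, 0.05]
var_m = 0.04

Sigma = single_index_cov(beta, delta, var_m)
for row in Sigma:
    print([round(x, 4) for x in row])

print("full model, n = 300:", n_params_full(300))
print("single index model, n = 300:", n_params_single_index(300))
```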

Suppose we use this form $\Sigma = \beta\beta'\sigma_m^2 + \Delta$ to estimate weights on individual stocks, where $\Delta$ is the diagonal matrix with the $\delta_i$ along the diagonal and $\beta$ is the column vector of individual stock betas. In this case $\Sigma^{-1} = \Delta^{-1} + c\,\Delta^{-1}\beta\beta'\Delta^{-1}$ where

$$
c = \frac{-1}{\sigma_m^{-2} + \sum_i \beta_i^2/\delta_i} = \frac{-\sigma_m^2}{1 + \sum_i \beta_i^2 \sigma_m^2/\delta_i}
$$

and consequently the conservative investor by (2.14) invests in stock $i$ proportionally to the components of $\Sigma^{-1}\mathbf{1}$, or to

$$
\frac{1}{\delta_i} + c\,\frac{\beta_i}{\delta_i}\Big(\sum_j \frac{\beta_j}{\delta_j}\Big),
$$

or proportional to

$$
\frac{1}{\delta_i}\Big(\beta_i + \frac{1}{c \sum_j \beta_j/\delta_j}\Big).
$$

The conditional variance of $R_i$ given the market return $R_m$ is $\delta_i$. Let us call this the excess volatility for stock $i$. Then the weight on stock $i$ in the conservative portfolio is the product of the reciprocal of the excess volatility and a linear function of the beta for the stock.
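The closed form for $\Sigma^{-1}$ is an instance of the Sherman-Morrison formula for the inverse of a diagonal matrix plus a rank-one term, and it can be verified by multiplying it against $\Sigma$. The sketch below does this for three stocks with made-up betas and specific variances, and also compares the components of $\Sigma^{-1}\mathbf{1}$ with the expression $(1 + c\,\beta_i \sum_j \beta_j/\delta_j)/\delta_i$; the variable names are assumptions of this example.

```python
# Illustrative inputs (assumptions for this example only)
beta = [0.8, 1.0, 1.3]
delta = [0.02, 0.03, 0.05]
var_m = 0.04
n = len(beta)

# Sigma = beta beta' var_m + Delta
Sigma = [[beta[i] * beta[j] * var_m + (delta[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]

# c = -1 / (1/var_m + sum_i beta_i^2 / delta_i)
c = -1.0 / (1.0 / var_m + sum(b * b / d for b, d in zip(beta, delta)))

# Sigma^{-1} = Delta^{-1} + c Delta^{-1} beta beta' Delta^{-1}
Sigma_inv = [[(1.0 / delta[i] if i == j else 0.0)
              + c * beta[i] * beta[j] / (delta[i] * delta[j])
              for j in range(n)] for i in range(n)]

# The product Sigma * Sigma_inv should be the identity matrix
prod = [[sum(Sigma[i][k] * Sigma_inv[k][j] for k in range(n))
         for j in range(n)] for i in range(n)]

# Components of Sigma^{-1} 1, computed two ways
S = sum(b / d for b, d in zip(beta, delta))
row_sums = [sum(Sigma_inv[i]) for i in range(n)]
closed = [(1.0 + c * beta[i] * S) / delta[i] for i in range(n)]

# Conservative (minimum variance) weights, normalised to sum to 1
w_g = [x / sum(row_sums) for x in row_sums]
print("w_g =", w_g)
```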

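The weights in the market portfolio are given by

$$
w_m = \Sigma^{-1} M A \begin{bmatrix} 1 \\ \eta_m \end{bmatrix}
= (\Delta^{-1} + c\,\Delta^{-1}\beta\beta'\Delta^{-1})\,[\,\mathbf{1}\ \ \mu\,]\,(M'\Sigma^{-1}M)^{-1} \begin{bmatrix} 1 \\ \eta_m \end{bmatrix}.
$$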

Minimum Variance under Q.

Suppose we wish to find a portfolio of securities which has the smallest possible variance under the risk neutral distribution $Q$. For example, for a given set of weights $w_i(t)$ representing the number of shares held in security $i$ at time $t$, define the portfolio $\Pi(t) = \sum_i w_i(t) S_i(t)$. Recall from Section 2.1 that under a risk neutral distribution, all stocks have exactly the same expected return as the risk-free interest rate, so the portfolio $\Pi(t)$ will have exactly the same conditional expected rate of return under $Q$ as all the constituent stocks,

$$
E_Q[\Pi(t+1) \mid H_t] = \sum_i w_i(t)\, E_Q[S_i(t+1) \mid H_t] = \sum_i w_i(t)\, \frac{B(t+1)}{B(t)}\, S_i(t) = \frac{B(t+1)}{B(t)}\, \Pi(t).
$$

Since all portfolios have the same conditional expected return under $Q$, we might attempt to minimize the (conditional) variance of the portfolio return. The natural constraint is that the cost of the portfolio is determined by the amount $c(t)$ that we presently have to invest. We might assume a constant investment over time, for example $c(t) = 1$ for all $t$. Alternatively, we might wish to study a self-financing portfolio $\Pi(t)$, one for which only past gains (or, perish the thought, past losses) are available to pay for the current portfolio, so we neither withdraw from nor add money to the portfolio over its lifetime. In this case $c(t) = \Pi(t)$. We wish to minimise

$$
\operatorname{var}_Q[\Pi(t+1) \mid H_t] \quad\text{subject to the constraint}\quad \sum_i w_i(t) S_i(t) = c(t).
$$

As before, the solution is quite easy to obtain, and in fact the weights are given by the vector

$$
w(t) = \begin{pmatrix} w_1(t) \\ w_2(t) \\ \vdots \\ w_n(t) \end{pmatrix}
= \frac{c(t)}{S'(t)\,\Sigma_t^{-1} S(t)}\ \Sigma_t^{-1} S(t),
$$

where $\Sigma_t = \operatorname{var}_Q(S(t+1) \mid H_t)$ is the instantaneous conditional covariance matrix of $S(t+1)$ under the measure $Q$. If our objective were to minimize risk under the $Q$ measure, then this portfolio would be optimal for fixed cost. The conditional variance of this portfolio is given by

$$
\operatorname{var}_Q(\Pi(t+1) \mid H_t) = w'(t)\,\Sigma_t\, w(t) = \frac{c^2(t)}{S'(t)\,\Sigma_t^{-1} S(t)}.
$$

In terms of the portfolio return $R_\Pi(t+1) = \dfrac{\Pi(t+1) - \Pi(t)}{\Pi(t)}$, if the portfolio is self-financing so that $c(t) = \Pi(t)$, the above relation states that the conditional variance of the return $R_\Pi(t+1)$ given the past is simply

$$
\operatorname{var}_Q(R_\Pi(t+1) \mid H_t) = \frac{1}{S'(t)\,\Sigma_t^{-1} S(t)}
$$

which is similar to the form of the variance of the conservative portfolio (2.13). Similarly, covariances between returns for individual stocks and the return of the portfolio $\Pi$ are given by exactly the same quantity, namely

$$
\operatorname{cov}(R_i(t+1), R_\Pi(t+1) \mid H_t) = \frac{1}{S'(t)\,\Sigma_t^{-1} S(t)}.
$$

Let us summarize our findings so far. We assume that the conditional covariance matrix $\Sigma_t$ of the vector of stock prices is non-singular. Under the risk neutral measure, all stocks have exactly the same expected returns, equal to the risk-free rate. There is a unique self-financing minimum-variance portfolio $\Pi(t)$, and the returns of all stocks have exactly the same conditional covariance with the return of $\Pi$. All stocks therefore have exactly the same regression coefficient $\beta$ when we regress on the minimum variance portfolio.
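These properties can be checked on a small example. The sketch below takes two stocks with made-up current prices `S`, a made-up conditional covariance matrix `Sigma_t` of next-period prices, and a budget `c`, then computes the minimum variance holdings and verifies the cost constraint, the variance formula, and the common covariance between each stock's return and the portfolio return. All names and numbers are assumptions for illustration.

```python
# Illustrative inputs (assumptions for this example only)
S = [50.0, 20.0]                        # current prices S(t)
Sigma_t = [[25.0, 5.0], [5.0, 9.0]]     # conditional covariance of S(t+1) under Q
c = 100.0                               # amount available to invest

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (p, q) = m
    det = a * q - b * p
    return [[q / det, -b / det], [-p / det, a / det]]

Si = inv2(Sigma_t)
SiS = [sum(Si[i][j] * S[j] for j in range(2)) for i in range(2)]   # Sigma_t^{-1} S
denom = sum(S[i] * SiS[i] for i in range(2))                        # S' Sigma_t^{-1} S

# Number of shares held in each stock
w = [c * x / denom for x in SiS]

cost = sum(w[i] * S[i] for i in range(2))
var_port = sum(w[i] * Sigma_t[i][j] * w[j] for i in range(2) for j in range(2))

# cov(R_i, R_Pi) = (Sigma_t w)_i / (S_i * c) when the portfolio value equals c
Sw = [sum(Sigma_t[i][j] * w[j] for j in range(2)) for i in range(2)]
cov_returns = [Sw[i] / (S[i] * c) for i in range(2)]

print("holdings:", w, " cost:", cost)
print("portfolio variance:", var_port, " formula:", c ** 2 / denom)
print("cov(R_i, R_Pi):", cov_returns, " formula:", 1.0 / denom)
```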

Are other minimum variance portfolios conditionally uncorrelated with the portfolio we obtained above? Suppose we define $\Pi_2(t)$ similarly to minimize the variance subject to the condition that $\operatorname{Cov}_Q(\Pi_2(t+1), \Pi(t+1) \mid H_t) = 0$. Since $\Sigma_t w(t)$ is proportional to $S(t)$, this covariance is proportional to the cost of $\Pi_2$, so it is easy to see that the condition implies that the cost of such a portfolio at the beginning of each period is 0. This means that in this new portfolio there is a perfect balance between long and short stocks, that is, the values of the long and short positions are equal.

The above analysis assumes that our objective is minimizing the variance of the portfolio under the risk-neutral distribution $Q$. Two objections could be made. First, we argued earlier that the performance of an investment should be measured through the returns, not through the stock prices. Since under the risk neutral measure $Q$ the expected return from every stock is the risk-free rate of return, we are left with the problem of minimizing the variance of the portfolio return. By our earlier analysis, this is achieved when the proportion of our total investment at each time period in stock $i$ is chosen as the corresponding component of the vector

$$
\frac{\Sigma_t^{-1}\mathbf{1}}{\mathbf{1}'\Sigma_t^{-1}\mathbf{1}}
$$

where now $\Sigma_t$ is the conditional covariance matrix of the stock returns. This may appear to be a different criterion and hence a different solution, but because at each time step the stock price is a linear function of the return, $S_i(t+1) = S_i(t)(1 + R_i(t+1))$, the variance minimizing portfolios are essentially the same. There is another objection, however, to an analysis in the risk-neutral world of $Q$. This is a distribution which determines the value of options in order to avoid arbitrage in the system, not the actual distribution of stock prices. It is not clear what the relationship is between the covariance matrix of stock prices under the actual historical distribution and under the risk neutral distribution $Q$, but observations seem to indicate a very considerable difference. Moreover, if this difference is large, there is very little information available for estimating the parameters of the covariance matrix under $Q$, since historical data on the fluctuations of stock prices will be of doubtful relevance.

Entropy: choosing a Q measure

Maximum Entropy

In 1948, in a fundamental paper on the transmission of information, C. E. Shannon proposed the following idea of entropy. The entropy of a distribution attempts to measure the expected number of steps required to determine a given outcome of a random variable with a given distribution when using a simple binary poll. For example, suppose that a random variable $X$ has distribution given by

x           0      1      2
P[X = x]    .25    .25    .5

In this case, if we ask first whether the random variable is $\geq 2$ and then, provided the answer is no, whether it is $\geq 1$, the expected number of queries to ascertain the value of the random variable is $1 + 1(1/2) = 1.5$. There is no more efficient scheme for designing this binary poll in this case, so we will take 1.5 to be a measure of entropy of the distribution of $X$. For a discrete distribution such that $P[X = x] = p(x)$, the entropy may be defined to be

$$
H(p) = E\{-\ln(p(X))\} = -\sum_x p(x) \ln(p(x)).
$$

More generally we define the entropy of an arbitrary distribution through the form for a discrete distribution. If $P$ is a probability measure (see the appendix),

$$
H(P) = \sup\Big\{-\sum_i P(E_i) \ln(P(E_i))\Big\}
$$

where the supremum is taken over all finite partitions $\{E_i\}$ of the space.

In the case of the above distribution, if we were to replace the natural logarithm by the log base 2 ($\ln$ and $\log_2$ differ only by a scale factor, and therefore the corresponding measures of entropy are equivalent up to a constant multiple), notice that $-\sum_x p(x) \log_2(p(x)) = .5(1) + .5(2) = 1.5$, so this formula correctly measures the difficulty in ascertaining a random variable from a sequence of questions with yes-no or binary answers. More generally, the base-2 entropy is a lower bound on the expected number of questions needed, attained exactly when, as here, all the probabilities are powers of 1/2. The complexity of a distribution may thus be measured by the expected number of questions in a binary poll to determine the value of a random variable having that distribution, and such a measure results in the entropy $H(p)$ of the distribution.
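The arithmetic in this example is simple enough to reproduce directly. The sketch below computes the entropy of the three-point distribution in bits and in nats, and the expected number of yes-no questions under the polling scheme described in the text; the dictionary `questions`, recording how many questions each outcome requires, is an assumption that encodes that scheme.

```python
import math

# The distribution from the text: P[X=0] = P[X=1] = .25, P[X=2] = .5
p = {0: 0.25, 1: 0.25, 2: 0.5}

def entropy_nats(p):
    """H(p) = -sum p(x) ln p(x), using the natural logarithm."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def entropy_bits(p):
    """The same quantity with logarithms taken to base 2."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

# Polling scheme: ask "is X >= 2?"; if the answer is no, ask "is X >= 1?".
# X = 2 is settled after one question, X = 0 and X = 1 after two.
questions = {2: 1, 1: 2, 0: 2}
expected_questions = sum(p[x] * questions[x] for x in p)

print("entropy in bits:", entropy_bits(p))
print("entropy in nats:", entropy_nats(p))
print("expected number of questions:", expected_questions)
```

The two entropies differ only by the factor $\ln 2$, which is the scale factor mentioned in the text.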

Many statistical distributions have an interpretation in terms of maximizing entropy and it is often remarkable how well the maximum entropy principle reproduces observed distributions. For example, suppose we know that a discrete random variable takes values on a certain set of $n$ points. What distribution $p$ on this set maximizes the entropy $H(p)$? First notice that if $p$ is uniform on $n$ points, $p(x) = 1/n$ for all $x$ and so the entropy is $-\sum_x \frac{1}{n} \ln(\frac{1}{n}) = \ln(n)$. Now consider the problem of maximizing the entropy $H(p)$ for any distribution on $n$ points subject to the constraint that the probabilities add to one. As in (2.10), the Lagrangian for this problem is

$$
-\sum_x p(x) \ln(p(x)) - \lambda\Big\{\sum_x p(x) - 1\Big\}
$$

where $\lambda$ is a Lagrange multiplier. Upon differentiating with respect to $p(x)$ for each $x$, we obtain $-\ln(p(x)) - 1 - \lambda = 0$ or $p(x) = e^{-(1+\lambda)}$. The probabilities evidently do not depend on $x$ and the distribution is thus uniform. Applying the constraint that the sum of the probabilities is one results in $p(x) = 1/n$ for all $x$. The discrete distribution on $n$ points which has maximum entropy is the uniform distribution.

What if we repeat this analysis using additional constraints, for example on the moments of the distribution? Suppose for example that we require that the mean of the distribution is some fixed constant $\mu$ and the variance is fixed at $\sigma^2$. The problem is similar to that treated above but with two more terms in the Lagrangian, one for each of the additional constraints. The Lagrangian becomes

$$
-\sum_x p(x) \ln(p(x)) - \lambda_1\Big\{\sum_x p(x) - 1\Big\} - \lambda_2\Big\{\sum_x x\,p(x) - \mu\Big\} - \lambda_3\Big\{\sum_x x^2 p(x) - \mu^2 - \sigma^2\Big\}
$$

whereupon setting the derivative with respect to $p(x)$ equal to zero and applying the constraints we obtain

$$
p(x) = \exp\{-\lambda_1 - \lambda_2 x - \lambda_3 x^2\},
$$

with constants $\lambda_1, \lambda_2, \lambda_3$ chosen to satisfy the three constraints (the constant $-1$ arising from the derivative has been absorbed into $\lambda_1$). Since the exponent is a quadratic function of $x$, this is analogous to the normal distribution except that we have required that it be supported on a discrete set of points $x$. With more points, positioned more closely together, the distribution becomes closer to the normal. Let us call such a distribution the discrete normal distribution. For a simple example, suppose that we wish to use the maximum entropy principle to approximate the distribution of the sum of the values on two dice. In this case the actual distribution is known to us, as well as the mean and variance $E(X) = 7$, $\operatorname{var}(X) = 35/6$:

x           2     3     4     5     6     7     8     9     10    11    12
P(X = x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The maximum entropy distribution on these same points constrained to have the same mean and variance is very similar to this, the actual distribution. This can be seen in Figure 2.5.

[FIGURE 2.5 ABOUT HERE]

Figure 2.5: A discrete analogue of the normal distribution compared with the distribution of the sum of the values on two dice (horizontal axis: value, 2 to 12; vertical axis: probability, 0 to 0.18).
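The comparison with the two-dice distribution can be reproduced numerically. The sketch below is one possible way to do it, under an assumption worth stating: because the support 2, ..., 12 is symmetric about 7, the maximum entropy solution with mean 7 can be written as $p(x) \propto \exp\{-a(x-7)^2\}$, leaving a single constant `a` to be found, here by bisection, so that the variance equals 35/6. The function names and the bisection bracket are choices made for this example.

```python
import math

xs = list(range(2, 13))
dice = [(6 - abs(x - 7)) / 36 for x in xs]     # distribution of the sum of two dice

def discrete_normal(a):
    """p(x) proportional to exp(-a (x - 7)^2) on the points 2, ..., 12."""
    w = [math.exp(-a * (x - 7) ** 2) for x in xs]
    total = sum(w)
    return [v / total for v in w]

def mean(p):
    return sum(x * q for x, q in zip(xs, p))

def variance(p):
    m = mean(p)
    return sum((x - m) ** 2 * q for x, q in zip(xs, p))

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

# The variance decreases as a increases, so bisect on a to match 35/6.
target = 35 / 6
lo, hi = 0.0, 5.0
for _ in range(200):
    mid = (lo + hi) / 2
    if variance(discrete_normal(mid)) > target:
        lo = mid
    else:
        hi = mid
a = (lo + hi) / 2
maxent = discrete_normal(a)

for x, q, d in zip(xs, maxent, dice):
    print(f"{x:2d}  discrete normal {q:.4f}   dice {d:.4f}")
print("entropy: discrete normal", entropy(maxent), " dice", entropy(dice))
```

Since the dice distribution satisfies the same mean and variance constraints, its entropy cannot exceed that of the fitted discrete normal.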

In fact if we drop the requirement that the distribution is discrete, or equivalently take a limit with an increasing number of discrete points closer and closer together, the same kind of argument shows that the maximum entropy distribution subject to a constraint on the mean and the variance is the normal distribution. So at least two well-known distributions arise out of maximum entropy considerations. The maximum entropy distribution on a discrete set of points is the uniform distribution. The maximum entropy distribution subject to a constraint on the mean and the variance is a (discrete) normal distribution. There are many other examples as well. In fact most common distributions in statistics have an interpretation as a maximum entropy distribution subject to some constraints.

Entropy has a number of properties that one would expect of a measure of the information content in a random variable. It is non-negative, and can in usual circumstances be infinite. We expect that the information in a function of $X$, say $g(X)$, is less than or equal to the information in $X$ itself, with equality if the function is one to one (which means in effect that we can determine $X$ from the value of $g(X)$). Entropy is a property of a distribution, not of a random variable. Nevertheless it is useful to be able to abuse the notation used earlier by referring to $H(X)$ as the entropy of the distribution of $X$. Then we have the following properties.

Proposition 7 $H(X) \geq 0$.

Proposition 8 $H(g(X)) \leq H(X)$ for any function $g(x)$.

The information or uncertainty in two random variables is clearly greater than that in one. The entropy is defined in the same fashion as before: for discrete random variables $(X, Y)$,

$$
H(X, Y) = -E(\ln p(X, Y))
$$

where $p(x, y)$ is the joint probability function

$$
p(x, y) = P[X = x, Y = y].
$$

If the two random variables are independent, then we expect that the uncertainty should add. If they are dependent, then the entropy of the pair $(X, Y)$ is less than the sum of the individual entropies.

Proposition 9 $H(X, Y) \leq H(X) + H(Y)$, with equality if and only if $X$ and $Y$ are independent.

Let us now use the principle of maximum entropy to address an eminently practical problem, one of altering a distribution to accommodate a known mean value. Suppose we are interested in determining a risk-neutral distribution for pricing options at maturity $T$. Theorem 1 tells us that if there is to be no arbitrage, our distribution or measure $Q$ must satisfy a relation of the form

$$
E_Q(e^{-rT} S_T) = S_0
$$

where $r$ is the continuously compounded interest rate, $S_0$ is the initial (present) value of the underlying stock, and $S_T$ is its value at maturity. Let us also suppose that we constrain the variance of the future stock price under the measure $Q$ so that

$$
\operatorname{var}_Q(S_T) = \sigma^2 T.
$$

Then from our earlier discussion, the maximum entropy distribution under constraints on the mean and variance is the normal distribution, so that the probability density function of $S_T$ is

$$
f(s) = \frac{1}{\sigma\sqrt{2\pi T}} \exp\Big\{-\frac{(s - e^{rT} S_0)^2}{2\sigma^2 T}\Big\}.
$$

If we wished for a maximum entropy distribution which is compatible with a number of option prices, then we should impose these option prices as additional constraints. Again suppose the current time is $t = 0$ and we know the prices $P_i$, $i = 1, ..., n$, of $n$ different call options available on the market, all on the same security and with the same maturity $T$ but with different strike prices $K_i$. The distribution $Q$ we assign to $S_T$ must satisfy the constraints

$$
E(e^{-rT}(S_T - K_i)^+) = P_i, \quad i = 1, ..., n \tag{2.20}
$$

as well as the martingale constraint

$$
E(e^{-rT} S_T) = S_0. \tag{2.21}
$$

Once again introducing Lagrange multipliers, the probability density function of $S_T$ will take the form

$$
f(s) = k \exp\Big\{e^{-rT} \sum_{i=1}^{n} \lambda_i (s - K_i)^+ + \lambda_0 s\Big\}
$$

where the parameters $\lambda_0, ..., \lambda_n$ are chosen to satisfy the constraints (2.20) and (2.21), and $k$ is chosen so that the function integrates to 1. When fit to real option price data, these distributions typically resemble a normal density, usually however with some negative skewness and excess kurtosis. See for example Figure XXX. There are also "sawtooth"-like appendages with teeth corresponding to each of the $n$ options. Note too that this density is strictly positive at the value $s = 0$, a feature that we may or may not wish to have. Because of the "teeth", a smoother version of the density is often used, one which may not perfectly reproduce option prices but nevertheless appears to be more natural.

Minimum Cross-Entropy

Normally market information does not completely determine the risk-neutral measure $Q$. We will argue that while market data on derivative prices rather than historical data should determine the $Q$ measure, historical asset prices can be used to fill in the information that is not dictated by no-arbitrage considerations. In order to relate the real world to the risk-neutral world, we need either sufficient market data to completely describe a risk-neutral measure $Q$ (such a model is called a complete market) or we need to limit our candidate class of $Q$ measures somewhat. We may define either the joint distributions of the stock prices or those of their returns, since from one we can pass to the other. For convenience, suppose we describe the joint distribution of the returns process. The conditions we impose on the martingale measure are the following:

1. Under $Q$, each normalized stock price $S_j(t)/B_t$ and derivative price $V_t/B_t$ forms a martingale. Equivalently, $E_Q[S_i(t+1) \mid H_t] = S_i(t)(1 + r(t))$ where $r(t)$ is the risk free interest rate over the interval $(t, t+1)$. (Recall that this risk-free interest rate $r(t)$ is defined by the equation $B(t+1) = (1 + r(t))B(t)$.)

2. $Q$ is a probability measure.

A slight revision of notation is necessary here. We will build our joint distributions conditionally on the past, and if $P$ denotes the joint distribution of the stock prices $S(1), S(2), ..., S(T)$ over the whole period of observation $0 < t < T$, then $P_{t+1}$ denotes the conditional distribution of $S(t+1)$ given $H_t$. Let us denote the conditional moment generating function of the vector $S(t+1)$ under the measure $P_{t+1}$ by

$$
m_t(u) = E_P[\exp(u' S(t+1)) \mid H_t] = E_P\Big[\exp\Big(\sum_i u_i S_i(t+1)\Big) \,\Big|\, H_t\Big].
$$

We implicitly assume, of course, that this moment generating function exists. Suppose, for some vector of parameters $\eta$, we choose $Q_{t+1}$ to be the exponential tilt of $P_{t+1}$, i.e.

$$
dQ_{t+1}(s) = \frac{\exp(\eta' s)}{m_t(\eta)}\, dP_{t+1}(s).
$$

The division by $m_t(\eta)$ is necessary to ensure that $Q_{t+1}$ is a probability measure.
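For a single stock with a discrete conditional distribution, the exponential tilt and the choice of $\eta$ that enforces the martingale condition can be sketched in a few lines. The price grid `values`, the probabilities `probs`, the current price `S_t` and the rate `r` below are made-up numbers, and bisection is simply one convenient way to solve for $\eta$; it relies on the tilted mean being increasing in $\eta$.

```python
import math

# Illustrative one-period setting (assumptions for this example only)
values = [90.0, 100.0, 110.0, 120.0]   # possible values of S(t+1)
probs = [0.2, 0.3, 0.3, 0.2]           # their probabilities under P
S_t = 100.0                            # current price
r = 0.01                               # one-period risk-free rate

def tilt(eta):
    """Exponentially tilted probabilities exp(eta*s) p(s) / m(eta)."""
    w = [p * math.exp(eta * s) for s, p in zip(values, probs)]
    m = sum(w)                          # moment generating function m(eta)
    return [v / m for v in w]

def mean(q):
    return sum(s * qi for s, qi in zip(values, q))

# Solve E_Q[S(t+1)] = (1 + r) S(t) for eta by bisection.
target = (1 + r) * S_t
lo, hi = -1.0, 1.0
for _ in range(200):
    mid = (lo + hi) / 2
    if mean(tilt(mid)) < target:
        lo = mid
    else:
        hi = mid
eta = (lo + hi) / 2
q = tilt(eta)

print("mean under P:", mean(probs))
print("eta:", eta)
print("tilted probabilities:", q)
print("mean under Q:", mean(q), " target:", target)
```

Here the mean under $P$ exceeds the risk-free growth of the price, so the solution has $\eta < 0$ and the tilt shifts probability toward the lower prices.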

Why transform a density by multiplying by an exponential in this way? There are many reasons for such a transformation. Exponential families of distributions are built in exactly this fashion and enjoy properties of sufficiency, completeness and ease of estimation. This exponential tilt resulted from maximizing entropy subject to certain constraints on the distribution. But we also argue that the measure $Q$ is the probability measure which is closest to $P$ in a certain sense while still satisfying the required moment constraint. We first introduce cross-entropy, which underlies considerable theory in statistics and elsewhere in science.


Cross Entropy

Consider two probability measures $P$ and $Q$ on the same space. Then the cross entropy or Kullback-Leibler "distance" between the two measures is given by

$$
H(Q, P) = \sup_{\{E_i\}} \sum_i Q(E_i) \log\frac{Q(E_i)}{P(E_i)}
$$

where the supremum is over all finite partitions $\{E_i\}$ of the probability space. Various properties are immediate.

Proposition 10 $H(Q, P) \geq 0$ with equality if and only if $P$ and $Q$ are identical.

If $Q$ is absolutely continuous with respect to $P$, that is, if there is some density function $f(x)$ such that

$$
Q(E) = \int_E f(x)\, dP \quad\text{for all } E,
$$

then provided that $f$ is smooth, we can also write

$$
H(Q, P) = E_Q \log\Big(\frac{dQ}{dP}\Big).
$$

If $Q$ is not absolutely continuous with respect to $P$ then the cross entropy $H(Q, P)$ is infinite. We should also remark that the cross entropy is not really a distance in the usual sense (although we used the term "distance" in reference to it) because in general $H(Q, P) \neq H(P, Q)$. For a finite probability space, there is an easy relationship between entropy and cross entropy given by the following proposition. In effect the result tells us that maximizing entropy $H(Q)$ is equivalent to minimizing the cross-entropy $H(Q, P)$ where $P$ is the uniform distribution.

Proposition 11 If the probability space has a finite number $n$ of points, and $P$ denotes the uniform distribution on these $n$ points, then for any other probability measure $Q$,

$$
H(Q, P) = \ln(n) - H(Q).
$$
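On a finite space with $P$ uniform, the cross entropy is $\sum_x q(x)\ln(q(x)/(1/n)) = \ln(n) + \sum_x q(x)\ln q(x)$, so it equals $\ln(n) - H(Q)$. The sketch below checks this identity, together with the lack of symmetry of the cross entropy, for a made-up distribution `q` on four points; the function names are assumptions of this example.

```python
import math

def entropy(q):
    """H(q) = -sum q ln q."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def cross_entropy(q, p):
    """H(Q, P) = sum q ln(q / p), the Kullback-Leibler 'distance' from P to Q."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

n = 4
uniform = [1.0 / n] * n
q = [0.1, 0.2, 0.3, 0.4]               # an arbitrary illustrative distribution
p2 = [0.4, 0.3, 0.2, 0.1]              # a second distribution on the same points
p3 = [0.7, 0.1, 0.1, 0.1]              # a third, to illustrate asymmetry

print("H(Q, uniform)  =", cross_entropy(q, uniform))
print("ln(n) - H(Q)   =", math.log(n) - entropy(q))
print("H(Q, P3), H(P3, Q):", cross_entropy(q, p3), cross_entropy(p3, q))
```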

Now the following result asserts that the probability measure $Q$ which is closest to $P$ in the sense of cross-entropy but satisfies a constraint on its mean is generated by a so-called "exponential tilt" of the distribution $P$.

Theorem 12: Minimizing cross-entropy.

Let $f(X)$ be a vector valued function $f(X) = (f_1(X), f_2(X), ..., f_n(X))$ and $\mu = (\mu_1, ..., \mu_n)$. Consider the problem

$$\min_Q H(Q, P)$$

subject to the constraint $E_Q(f_i(X)) = \mu_i$, $i = 1, ..., n$. Then the solution, if it exists, is given by

$$dQ = \frac{\exp(\eta' f(X))}{m(\eta)}\,dP = \frac{\exp\left(\sum_{i=1}^n \eta_i f_i(X)\right)}{m(\eta)}\,dP$$

where $m(\eta) = E_P\left[\exp\left(\sum_{i=1}^n \eta_i f_i(X)\right)\right]$ and $\eta$ is chosen so that $\frac{\partial m}{\partial \eta_i} = \mu_i\, m(\eta)$ for each i.

The proof of this result, in the case of a discrete distribution P, is a straightforward use of Lagrange multipliers (see Lemma 3). We leave it as a problem at the end of the chapter.
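To make the exponential tilt concrete, here is a small sketch for a discrete P with a single constraint (n = 1 in the notation of the theorem). The setting is an assumption chosen for illustration: P is uniform on the faces of a die, $f(x) = x$, and the target mean is 4.5. The parameter $\eta$ is found by bisection, which works here because the tilted mean is increasing in $\eta$; the helper names are hypothetical.

```python
import math

def tilt(p, f, eta):
    """Tilted probabilities q_i = p_i exp(eta f_i) / m(eta)."""
    w = [pi * math.exp(eta * fi) for pi, fi in zip(p, f)]
    m = sum(w)                          # m(eta) = E_P[exp(eta f(X))]
    return [wi / m for wi in w]

def tilted_mean(p, f, eta):
    """E_Q[f(X)] under the tilted measure."""
    q = tilt(p, f, eta)
    return sum(qi * fi for qi, fi in zip(q, f))

def solve_eta(p, f, mu, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for eta such that E_Q[f(X)] = mu."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tilted_mean(p, f, mid) < mu:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

faces = [1, 2, 3, 4, 5, 6]
p = [1 / 6] * 6                         # reference measure P (illustrative)
mu = 4.5                                # required mean under Q (illustrative)

eta = solve_eta(p, faces, mu)
q = tilt(p, faces, eta)
print(eta, q, sum(qi * fi for qi, fi in zip(q, faces)))
```

With several constraints the same idea applies, but the scalar bisection would have to be replaced by a multivariate root-finder for the system $\partial m / \partial \eta_i = \mu_i m(\eta)$.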

Now let us return to the constraints on the vector of stock prices. In order that the discounted stock price forms a martingale under the Q measure, we require that $E_Q[S(t+1) | H_t] = (1 + r(t))S(t)$. This is achieved if we define Q such that for any event $A \in H_t$,

$$Q(A) = \int_A Z_t\,dP$$

where

$$Z_s = k_t \exp\left(\sum_{t=1}^{s} \eta_t'(S_{t+1} - S_t)\right) \qquad (2.22)$$

and where $k_t$ are $H_t$ measurable random variables chosen so that $Z_t$ forms a martingale,

$$E(Z_{t+1} | H_t) = Z_t.$$


Theorem 9 shows that this exponentially tilted distribution has the property

of being the closest to the original measure P while satisfying the condition

that the normalized sequence of stock prices forms a martingale.

There is a considerable literature exploring the links between entropy and risk-neutral valuation of derivatives. See for example Gerber and Shiu (1994), Avellaneda et al. (1997), Gulko (1998), Samperi (1998). In a complete or incomplete market, risk-neutral valuation may be carried out using a martingale measure which maximizes entropy or minimizes cross-entropy subject to some natural constraints including the martingale constraint. For example it is easy to show that when interest rates r are constant, Q is the risk-neutral measure for pricing derivatives on a stock with stock price process $S_t$, $t = 0, 1, ...$ if and only if it is the probability measure minimizing $H(Q, P)$ subject to the martingale constraint

$$S_t = E_Q\left[\frac{1}{1+r}\, S_{t+1} \,\Big|\, H_t\right]. \qquad (2.23)$$

There is a continuous time analogue of (2.22) as well, which we will discuss later but which we can anticipate by inspecting the form of the solution. Suppose that $S_t$ denotes the stock price at time t where we now allow t to vary continuously in time. Then an analogue of (2.22) could be written formally as

$$Z_s = \exp\left(\int_0^s \eta_t'\,dS_t - g_s\right)$$

where both processes $\eta_t$ and $g_t$ are "predictable", which loosely means that they are determined in advance of observing the increment from $S_t$ to $S_{t+\Delta t}$. Then the process $Z_s$ is the analogue of the Radon-Nikodym derivative $\frac{dQ}{dP}$ of the processes restricted to the time interval $0 \leq t \leq s$. For a more formal definition, as well as an explanation of how we should interpret the integral, see the appendix. This process $Z_s$ is, both in discrete and continuous time, a martingale.

Figure 2.6: A sample path of the Wiener process (horizontal axis t, vertical axis W(t))

Models in Continuous Time

We begin with some oversimplified rules of stochastic calculus which can be omitted by those with a background in Brownian motion and diffusion. First, we define a stochastic process $W_t$ called the standard Brownian motion or Wiener process having the following properties:

1. For each $h > 0$, the increment $W(t+h) - W(t)$ has a $N(0, h)$ distribution and is independent of all preceding increments $W(u) - W(v)$, $t > u > v > 0$.

2. $W(0) = 0$.
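The two defining properties translate directly into a simulation recipe on a discrete time grid. The sketch below is only an illustration under stated assumptions: the grid size, the number of paths, the seed and the function name `wiener_path` are arbitrary choices, and the path is only defined at grid points.

```python
import math
import random

def wiener_path(T, n_steps, rng):
    """Simulate W on the grid 0, T/n, ..., T from independent N(0, dt) increments."""
    dt = T / n_steps
    path = [0.0]                                    # property 2: W(0) = 0
    for _ in range(n_steps):
        # property 1: each increment is N(0, dt), independent of the past
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

rng = random.Random(0)                              # fixed seed for reproducibility
T, n_steps, n_paths = 10.0, 100, 2000
endpoints = [wiener_path(T, n_steps, rng)[-1] for _ in range(n_paths)]

mean_WT = sum(endpoints) / n_paths
var_WT = sum((w - mean_WT) ** 2 for w in endpoints) / (n_paths - 1)
print(mean_WT, var_WT)                              # should be near 0 and near T
```

Since the increments are independent normals, $W(T)$ is $N(0, T)$, so the sample mean and variance of the simulated endpoints should be close to 0 and T respectively.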

[FIGURE 2.6 ABOUT HERE]

The fact that such a process exists is by no means easy to see. It has been an important part of the literature in Physics, Probability and Finance at least since the papers of Bachelier and Einstein, about 100 years ago. A Brownian motion process also has some interesting and remarkable theoretical properties; it is continuous with probability one but the probability that the process has finite variation in any interval is 0. With probability one it is nowhere differentiable. Of course one might ask how a process with such apparently bizarre properties can be used to approximate real-world phenomena, where we expect functions to be built either from continuous and differentiable segments or jumps in the process. The answer is that a very wide class of functions constructed from those that are quite well-behaved (e.g. step functions) and that have independent increments converge, as the scale on which they move is refined, either to a Brownian motion process or to a process defined as an integral with respect to a Brownian motion process, and so this is a useful approximation to a broad range of continuous time processes. For example, consider a random walk process $S_n = \sum_{i=1}^n X_i$ where the random variables $X_i$ are independent identically distributed with expected value $E(X_i) = 0$ and $var(X_i) = 1$. Suppose we plot the graph of this random walk $(n, S_n)$ as below. Notice that we have linearly interpolated the graph so that the function is defined for all n, whether integer or not.

Figure 2.7: A sample path of a Random Walk (horizontal axis n, vertical axis $S_n$)

[FIGURE 2.7 ABOUT HERE]

Now if we increase the sample size and decrease the scale appropriately on both axes, the result is, in the limit, a Brownian motion process. The vertical scale is to be decreased by a factor $1/\sqrt{n}$ and the horizontal scale by a factor $n^{-1}$. The relevant functional central limit theorem concludes that the sequence of processes

$$Y_n(t) = \frac{1}{\sqrt{n}} S_{nt}$$

converges weakly to a standard Brownian motion process as $n \to \infty$. In practice this means that a process with independent stationary increments tends to look like a Brownian motion process. As we shall see, there is also a wide variety of non-stationary processes that can be constructed from the Brownian motion process by integration. Let us use the above limiting result to render some of the properties of the Brownian motion more plausible, since a serious proof is beyond our scope. Consider the question of continuity, for example. Since $|Y_n(t+h) - Y_n(t)| \approx \left|\frac{1}{\sqrt{n}} \sum_{i=nt}^{n(t+h)} X_i\right|$ and this is the absolute value of an asymptotically normal(0, h) random variable by the central limit theorem, it is plausible that the limit as $h \to 0$ is zero so the function is continuous at t. On the other hand note that

$$\frac{Y_n(t+h) - Y_n(t)}{h} \approx \frac{1}{h} \frac{1}{\sqrt{n}} \sum_{i=nt}^{n(t+h)} X_i$$

should by analogy behave like $h^{-1}$ times a $N(0, h)$ random variable, which blows up as $h \to 0$, so it would appear that the derivative at t does not exist. To obtain the total variation of the process in the interval $[t, t+h]$, consider the lengths of the segments in this interval, i.e.

$$\frac{1}{\sqrt{n}} \sum_{i=nt}^{n(t+h)} |X_i|$$

and notice that since the law of large numbers implies that $\frac{1}{nh} \sum_{i=nt}^{n(t+h)} |X_i|$ converges to a positive constant, namely $E|X_i|$, if we multiply by $\sqrt{n}\,h$ the limit must be infinite, so the total variation of the Brownian motion process is infinite.
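The scaling argument above can be explored numerically. The following sketch assumes a particular step distribution, $X_i = \pm 1$ with equal probability (so that $E(X_i) = 0$, $var(X_i) = 1$ and $E|X_i| = 1$), and looks at the interval $[0, 1]$. It estimates the variance of $Y_n(1)$, which should stay near 1, and computes the total variation of $Y_n$ on $[0, 1]$, which for these steps equals $\sqrt{n}$ and so grows without bound. The function name, path counts and seed are illustrative.

```python
import math
import random

def scaled_walk_summary(n, rng):
    """Return Y_n(1) = S_n / sqrt(n) and the total variation of Y_n on [0, 1]."""
    steps = [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]
    y_end = sum(steps) / math.sqrt(n)
    total_variation = sum(abs(x) for x in steps) / math.sqrt(n)
    return y_end, total_variation

rng = random.Random(1)
n_paths = 400
summary = {}
for n in (100, 400, 1600):
    results = [scaled_walk_summary(n, rng) for _ in range(n_paths)]
    # E[Y_n(1)] = 0, so the mean square estimates the variance
    var_end = sum(r[0] ** 2 for r in results) / n_paths
    summary[n] = (var_end, results[0][1])
    print(n, round(var_end, 3), results[0][1])
```

The first column of output stays of order one while the second doubles each time n is quadrupled, in line with the heuristic that the limit process is well defined but has infinite variation.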

Continuous time processes are usually built one small increment at a time and defined to be the limit as the size of the time increment is reduced to zero. Let us consider for example how we might define a stochastic (Ito) integral of the form $\int_0^T h(t)\,dW_t$. An approximating sum takes the form

$$\int_0^T h(t)\,dW_t \approx \sum_{i=0}^{n-1} h(t_i)(W(t_{i+1}) - W(t_i)), \quad 0 = t_0 < t_1 < ... < t_n = T.$$

Note that the function $h(t)$ is evaluated at the left hand end-point of the intervals $[t_i, t_{i+1}]$, and this is characteristic of the Ito calculus, and an important feature distinguishing it from the usual Riemann calculus studied in undergraduate mathematics courses. There are some simple reasons why evaluating the function at the left hand end-point is necessary for stochastic models in finance. First, let us suppose that the function $h(t)$ measures how many shares of a stock we possess and $W(t)$ is the price of one share of stock at time t. It is clear that we cannot predict precisely future stock prices and our decision about investment over a possibly short time interval $[t_i, t_{i+1}]$ must be made at the beginning of this interval, not at the end or in the middle. Second, in the case of a Brownian motion process $W(t)$, it makes a difference where in the interval $[t_i, t_{i+1}]$ we evaluate the function h to approximate the integral, whereas it makes no difference for Riemann integrals. As we refine the partition of the interval, the approximating sums $\sum_{i=0}^{n-1} h(t_{i+1})(W(t_{i+1}) - W(t_i))$, for example, approach a completely different limit. This difference is essentially due to the fact that $W(t)$, unlike those functions studied before in calculus, is of infinite variation. As a consequence, there are other important differences in the Ito calculus. Let us suppose that the increment $dW$ is used to denote small increments $W(t_{i+1}) - W(t_i)$ involved in the construction of the integral. If we denote the interval of time $t_{i+1} - t_i$ by $dt$, we can loosely assert that $dW$ has the normal distribution with mean 0 and variance $dt$. If we add up a large number of independent such increments, since the variances add, the sum has variance the sum of the values $dt$ and standard deviation the square root. Very roughly, we can assess the size of $dW$ since its standard deviation is $(dt)^{1/2}$. Now consider defining a process as a function both of the Brownian motion and of time, say $V_t = g(W_t, t)$. If $W_t$ represented the price of a stock or a bond, $V_t$ might be the price of a derivative on this stock or bond. Expanding the increment $dV$ using a Taylor series expansion gives

$$dV_t = \frac{\partial}{\partial W} g(W_t, t)\,dW + \frac{\partial^2}{\partial W^2} g(W_t, t)\,\frac{(dW)^2}{2} + \frac{\partial}{\partial t} g(W_t, t)\,dt + (\text{stuff}) \times (dW)^3 + (\text{more stuff}) \times (dt)(dW)^2 + .... \qquad (2.24)$$

Loosely, $dW$ is normal with mean 0 and standard deviation $(dt)^{1/2}$ and so $dW$ is non-negligible compared with $dt$ as $dt \to 0$. We can define each of the differentials $dW$ and $dt$ essentially by reference to the result when we integrate both sides of the equation. If I were to write an equation in differential form

$$dX_t = h(t)\,dW_t$$

then this only has real meaning through its integrated version

$$X_t = X_0 + \int_0^t h(s)\,dW_s.$$

What about the terms involving $(dW)^2$? What meaning should we assign to a term like $\int h(t)(dW)^2$? Consider the approximating function $\sum h(t_i)(W(t_{i+1}) - W(t_i))^2$. Notice that, at least in the case that the function h is non-random, we are adding up independent random variables $h(t_i)(W(t_{i+1}) - W(t_i))^2$ each with expected value $h(t_i)(t_{i+1} - t_i)$, and when we add up these quantities the limit is $\int h(t)\,dt$ by the law of large numbers. Roughly speaking, as differentials, we should interpret $(dW)^2$ as $dt$ because that is the way it acts in an integral. Subsequent terms such as $(dW)^3$ or $(dt)(dW)^2$ are all $o(dt)$, i.e. they all approach 0 faster than does $dt$ as $dt \to 0$. So finally substituting for $(dW)^2$ in (2.24) and ignoring all terms that are $o(dt)$, we obtain a simple version of Ito's lemma

$$dg(W_t, t) = \frac{\partial}{\partial W} g(W_t, t)\,dW + \left\{\frac{1}{2} \frac{\partial^2}{\partial W^2} g(W_t, t) + \frac{\partial}{\partial t} g(W_t, t)\right\} dt.$$

This rule results, for example, when we put $g(W_t, t) = W_t^2$, in

$$d(W_t^2) = 2W_t\,dW_t + dt$$

or on integrating both sides and rearranging,

$$\int_a^b W_t\,dW_t = \frac{1}{2}(W_b^2 - W_a^2) - \frac{1}{2}\int_a^b dt.$$

The term $\int_a^b dt$ above is what distinguishes the Ito calculus from the Riemann calculus, and is a consequence of the nature of the Brownian motion process, a continuous function of infinite variation.
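The role of the extra $dt$ term can be seen on a single simulated path. The sketch below, under the assumption of an evenly spaced grid on $[0, T]$ with $a = 0$ and $b = T$ (grid size and seed are arbitrary), compares the left end-point sum that defines the Ito integral of $W_t\,dW_t$ with the right end-point sum, and also computes the sum of squared increments that stands in for $\int (dW)^2$.

```python
import math
import random

rng = random.Random(2)
T, n = 1.0, 20000
dt = T / n

# One Brownian path on the grid, starting at W(0) = 0
W = [0.0]
for _ in range(n):
    W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))

left_sum = sum(W[i] * (W[i + 1] - W[i]) for i in range(n))        # Ito convention
right_sum = sum(W[i + 1] * (W[i + 1] - W[i]) for i in range(n))   # right end-point
quad_var = sum((W[i + 1] - W[i]) ** 2 for i in range(n))          # sum of (dW)^2

# Value given by the integrated form of d(W^2) = 2 W dW + dt on [0, T]
ito_value = 0.5 * W[-1] ** 2 - 0.5 * T

print(left_sum, ito_value)       # close to each other
print(right_sum - left_sum)      # equals quad_var, which is close to T
```

The two end-point conventions differ by exactly the sum of squared increments, which is close to T rather than 0. This is the sense in which $(dW)^2$ acts like $dt$ and why the choice of evaluation point matters for a path of infinite variation.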

There is one more property of the stochastic integral that makes it a valuable tool in the construction of models in finance, and that is that a stochastic integral with respect to a Brownian motion process is always a martingale. To see this, note that in an approximating sum

$$\int_0^T h(t)\,dW_t \approx \sum_{i=0}^{n-1} h(t_i)(W(t_{i+1}) - W(t_i))$$

each of the summands has conditional expectation 0 given the past, i.e.

$$E[h(t_i)(W(t_{i+1}) - W(t_i)) | H_{t_i}] = h(t_i) E[(W(t_{i+1}) - W(t_i)) | H_{t_i}] = 0$$

since the Brownian increments have mean 0 given the past and since $h(t)$ is measurable with respect to $H_t$.
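A Monte Carlo check makes the zero-mean property tangible. In this sketch the integrand is taken to be $h(t) = W(t)$, an illustrative choice that is measurable with respect to the past, and the average of the left end-point sums over many paths is compared with the average of the right end-point sums, in which the integrand is allowed to see the increment it multiplies. The function name, grid and sample sizes are assumptions.

```python
import math
import random

def endpoint_sums(T, n, rng):
    """Left and right end-point sums of h dW with h(t) = W(t) along one path."""
    dt = T / n
    w, left, right = 0.0, 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        left += w * dw             # h chosen before the increment is observed
        right += (w + dw) * dw     # h uses the increment itself
        w += dw
    return left, right

rng = random.Random(3)
n_paths = 2000
sums = [endpoint_sums(1.0, 50, rng) for _ in range(n_paths)]

mean_left = sum(s[0] for s in sums) / n_paths
mean_right = sum(s[1] for s in sums) / n_paths
print(mean_left, mean_right)       # near 0 and near T = 1 respectively
```

Only the left end-point sums average to zero. The right end-point sums have expectation equal to the expected sum of squared increments, which is T, so they do not define a martingale.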

We begin with an attempt to construct the model for an Ito process or diffusion process in continuous time. We construct the price process one increment at a time and it seems reasonable to expect that both the mean and the variance of the increment in price may depend on the current price but do not depend on the process before it arrived at that price. This is a loose description of a Markov property. The conditional distribution of the future of the process depends only on the current time t and the current price of the process. Let us suppose in addition that the increments in the process are, conditional on the past, normally distributed. Thus we assume that for small values of h, conditional on the current time t and the current value of the process $X_t$, the increment $X_{t+h} - X_t$ can be generated from a normal distribution with mean $a(X_t, t)h$ and with variance $\sigma^2(X_t, t)h$ for some functions $a$ and $\sigma^2$ called the drift and diffusion coefficients respectively. Such a normal random variable can be formally written as $a(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$. We can then express $X_T$ as an initial price $X_0$ plus the sum of such increments, $X_T = X_0 + \sum_i (X_{t_{i+1}} - X_{t_i})$.

The single most important model of this type is called the Geometric Brownian motion or Black-Scholes model. Since the actual value of stock, like the value of a currency or virtually any other asset, is largely artificial, depending on such things as the number of shares issued, it is reasonable to suppose that the changes in a stock price should be modeled relative to the current price. For example rather than model the increments, it is perhaps more reasonable to model the relative change in the process. The simplest such model of this type is one in which both the mean and the standard deviation of the increment in the price are linear multiples of price itself; viz. $dX_t$ is approximately normally distributed with mean $aX_t\,dt$ and variance $\sigma^2 X_t^2\,dt$. In terms of stochastic differentials, we assume that

$$dX_t = aX_t\,dt + \sigma X_t\,dW_t. \qquad (2.25)$$

Now consider the relative return $dX_t / X_t$ from such a process over a small increment of time. Putting $Y_t = g(X_t) = \ln(X_t)$, note that, analogous to our derivation of Ito's lemma,

$$dY_t = dg(X_t) = g'(X_t)\,dX_t + \frac{1}{2} g''(X_t)(dX_t)^2 + ...$$

$$= \frac{1}{X_t}\{aX_t\,dt + \sigma X_t\,dW_t\} - \frac{1}{2X_t^2}\,\sigma^2 X_t^2\,dt$$

$$= \left(a - \frac{\sigma^2}{2}\right) dt + \sigma\,dW_t,$$

which is a description of a general Brownian motion process, a process with increments $dY_t$ that are normally distributed with mean $(a - \frac{\sigma^2}{2})\,dt$ and with variance $\sigma^2\,dt$. This process satisfying $dX_t = aX_t\,dt + \sigma X_t\,dW_t$ is called the Geometric Brownian motion process (because it can be written in the form $X_t = e^{Y_t}$ for a Brownian motion process $Y_t$) or a Black-Scholes model.
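Because $Y_t = \ln(X_t)$ is a Brownian motion with drift $a - \sigma^2/2$ and variance parameter $\sigma^2$, the terminal value $X_T$ can be drawn exactly as $X_0 \exp((a - \sigma^2/2)T + \sigma W_T)$. The sketch below uses this representation with illustrative parameter values (not taken from the text) to check two consequences: the mean log return is $(a - \sigma^2/2)T$, while the mean price grows like $X_0 e^{aT}$.

```python
import math
import random

def gbm_terminal(x0, a, sigma, T, rng):
    """Exact draw of X_T = x0 * exp((a - sigma^2 / 2) T + sigma W_T), W_T ~ N(0, T)."""
    w_T = rng.gauss(0.0, math.sqrt(T))
    return x0 * math.exp((a - 0.5 * sigma ** 2) * T + sigma * w_T)

rng = random.Random(4)
x0, a, sigma, T = 100.0, 0.08, 0.2, 1.0     # illustrative values only
draws = [gbm_terminal(x0, a, sigma, T, rng) for _ in range(20000)]

mean_log_return = sum(math.log(x / x0) for x in draws) / len(draws)
mean_price = sum(draws) / len(draws)

print(mean_log_return, (a - 0.5 * sigma ** 2) * T)   # about 0.06
print(mean_price, x0 * math.exp(a * T))              # about 108.3
```

The gap between the two growth rates, $\sigma^2/2$, is exactly the correction term contributed by the second derivative in Ito's lemma.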

Many of the continuous time models used in finance are described as Markov diffusions or Ito processes, which permit the mean and the variance of the increments to depend more generally on the present value of the process and the time. The integral version of this relation is of the form

$$X_T = X_0 + \int_0^T a(X_t, t)\,dt + \int_0^T \sigma(X_t, t)\,dW_t.$$

We often write such an equation with differential notation,

$$dX_t = a(X_t, t)\,dt + \sigma(X_t, t)\,dW_t \qquad (2.26)$$

but its meaning should always be sought in the above integral form. The coefficients $a(X_t, t)$ and $\sigma(X_t, t)$ vary with the choice of model. As usual, we interpret (2.26) as meaning that a small increment in the process, say $dX_t = X_{t+h} - X_t$ (h very small), is approximately distributed according to a normal distribution with conditional mean $a(X_t, t)\,dt$ and conditional variance given by $\sigma^2(X_t, t)\,var(dW_t) = \sigma^2(X_t, t)\,dt$. Here the mean and variance are conditional on $H_t$, the history of the process $X_t$ up to time t.

Various choices for the functions $a(X_t, t)$, $\sigma(X_t, t)$ are possible. For the Black-Scholes model or geometric Brownian motion, $a(X_t, t) = aX_t$ and $\sigma(X_t, t) = \sigma X_t$ for constant drift and volatility parameters $a$, $\sigma$. The Cox-Ingersoll-Ross model, used to model spot interest rates, corresponds to $a(X_t, t) = A(b - X_t)$ and $\sigma(X_t, t) = c\sqrt{X_t}$ for constants A, b, c. The Vasicek model, also a model for interest rates, has $a(X_t, t) = A(b - X_t)$ and $\sigma(X_t, t) = c$. There is a large number of models for most continuous time processes observed in finance which can be written in the form (2.26). So-called multi-factor models are of similar form, where $X_t$ is a vector of financial time series, the coefficient function $a(X_t, t)$ is vector valued, $\sigma(X_t, t)$ is replaced by a matrix-valued function and $dW_t$ is interpreted as a vector of independent Brownian motion processes. For technical conditions on the coefficients under which a solution to (2.26) is guaranteed to exist and be unique, see Karatzas and Shreve, sections 5.2, 5.3.
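The interpretation of (2.26) as a recipe for small normal increments suggests a simple simulation scheme, commonly called the Euler or Euler-Maruyama method. This is a swapped-in numerical technique, not something the text prescribes. The sketch below applies it to the Vasicek and Cox-Ingersoll-Ross coefficient choices with made-up constants; the function name, step count and the `max(x, 0)` guard inside the square root are assumptions of this illustration.

```python
import math
import random

def euler_maruyama(x0, drift, diffusion, T, n_steps, rng):
    """Approximate X_T for dX = drift(x, t) dt + diffusion(x, t) dW by summing
    increments that are conditionally N(drift * dt, diffusion^2 * dt)."""
    dt = T / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += drift(x, t) * dt + diffusion(x, t) * dw
        t += dt
    return x

A, b, c = 2.0, 0.05, 0.1           # illustrative constants, not calibrated to data

vasicek = (lambda x, t: A * (b - x), lambda x, t: c)
# max(x, 0) guards the square root against small negative Euler overshoots
cir = (lambda x, t: A * (b - x), lambda x, t: c * math.sqrt(max(x, 0.0)))

rng = random.Random(5)
T, n_steps, n_paths, x0 = 1.0, 100, 2000, 0.03
vasicek_mean = sum(euler_maruyama(x0, *vasicek, T, n_steps, rng)
                   for _ in range(n_paths)) / n_paths
cir_mean = sum(euler_maruyama(x0, *cir, T, n_steps, rng)
               for _ in range(n_paths)) / n_paths

# Both models share the drift A(b - x), so E[X_T] = b + (x0 - b) exp(-A T)
exact_mean = b + (x0 - b) * math.exp(-A * T)
print(vasicek_mean, cir_mean, exact_mean)
```

Passing the drift and diffusion as functions keeps the simulator independent of the model, so other choices of $a(X_t, t)$ and $\sigma(X_t, t)$ only require a new pair of functions. The scheme is an approximation whose accuracy depends on the step size.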

As with any differential equation there may be initial or boundary conditions applied to (2.26) that restrict the choice of possible solutions. Solutions to the above equation are difficult to arrive at, and it is often even more difficult to obtain distributional properties of them. Among the key tools are the Kolmogorov differential equations (see Cox and Miller, p. 215). Consider the transition probability kernel

$$p(s, z, t, x) = P[X_t = x | X_s = z]$$

in the case of a discrete Markov Chain. If the Markov chain is continuous (as it is in the case of diffusions), that is if the conditional distribution of $X_t$ given $X_s$ is absolutely continuous with respect to Lebesgue measure, then we can define $p(s, z, t, x)$ to be the conditional probability density function of $X_t$ given $X_s = z$. The two equations, for a diffusion of the above form, are:

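Kolmogorov's backward equation

$$\frac{\partial}{\partial s} p = -a(z, s)\frac{\partial}{\partial z} p - \frac{1}{2}\sigma^2(z, s)\frac{\partial^2}{\partial z^2} p \qquad (2.27)$$

and the forward equation

$$\frac{\partial}{\partial t} p = -\frac{\partial}{\partial x}(a(x, t)p) + \frac{1}{2}\frac{\partial^2}{\partial x^2}(\sigma^2(x, t)p) \qquad (2.28)$$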
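As a sanity check on the two equations, consider the simplest diffusion, with constant coefficients $a(x, t) = a$ and $\sigma(x, t) = \sigma$. In that case $X_t$ given $X_s = z$ is normal with mean $z + a(t - s)$ and variance $\sigma^2(t - s)$, so the transition density is known in closed form. The sketch below (constants, evaluation point and finite-difference step sizes are illustrative assumptions) evaluates both sides of (2.27) and (2.28) by central differences and reports the residuals.

```python
import math

a, sigma = 0.3, 0.8      # illustrative constant drift and diffusion coefficients

def p(s, z, t, x):
    """Transition density of dX = a dt + sigma dW: N(z + a (t - s), sigma^2 (t - s))."""
    var = sigma ** 2 * (t - s)
    return math.exp(-(x - z - a * (t - s)) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def d1(f, u, h=1e-4):
    """Central difference approximation to f'(u)."""
    return (f(u + h) - f(u - h)) / (2 * h)

def d2(f, u, h=1e-3):
    """Central difference approximation to f''(u)."""
    return (f(u + h) - 2 * f(u) + f(u - h)) / h ** 2

s, z, t, x = 0.0, 0.1, 1.0, 0.7

# Backward equation (2.27): dp/ds + a dp/dz + (1/2) sigma^2 d2p/dz2 = 0
backward_residual = (d1(lambda u: p(u, z, t, x), s)
                     + a * d1(lambda u: p(s, u, t, x), z)
                     + 0.5 * sigma ** 2 * d2(lambda u: p(s, u, t, x), z))

# Forward equation (2.28): dp/dt + d/dx (a p) - (1/2) d2/dx2 (sigma^2 p) = 0
forward_residual = (d1(lambda u: p(s, z, u, x), t)
                    + a * d1(lambda u: p(s, z, t, u), x)
                    - 0.5 * sigma ** 2 * d2(lambda u: p(s, z, t, u), x))

print(backward_residual, forward_residual)   # both close to zero
```

With constant coefficients the factors a and $\sigma^2$ can be pulled outside the derivatives in the forward equation, which is what the code does. For state-dependent coefficients they must stay inside, as written in (2.28).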
