
Get any book for free on: www.Abika.com


CHAOTIC LOGIC

To be extremely rough about it, one might suppose that level 1 corresponds to lines. Then

level 2 might correspond to simple geometrical shapes, level 3 might correspond to complex

geometrical shapes, level 4 might correspond to simple recognizable objects or parts of

recognizable objects, level 5 might correspond to complex recognizable objects, and level 6

might correspond to whole scenes. To say that level 4 processes recognize patterns in the output

of level 3 processes is to say that simple recognizable objects are constructed out of complex

geometrical shapes, rather than directly out of lines or simple geometrical shapes. Each level 4

process is the parent, the controller, of those level 3 nodes that correspond to those complex

geometrical shapes which make up the simple object which it represents. And it is the child, the

controlee, of at least one of the level 5 nodes that corresponds to a complex object of which it is

a part (or perhaps even of one of the level 6 nodes describing a scene of which it is a part -- level

crossing like this can happen, so long as it is not the rule).

My favorite way of illustrating this multilevel control structure is to mention the three-level

"pyramidal" vision processing parallel computer developed by Levitan and his colleagues at the

University of Massachusetts. The bottom level deals with sensory data and with low-level

processing such as segmentation into components. The intermediate level takes care of grouping,

shape detection, and so forth; and the top level processes this information "symbolically",

constructing an overall interpretation of the scene. The base level is a 512 x 512 square array of

processors each doing exactly the same thing to different parts of the image; and the middle level

is composed of a 64 x 64 square array of relatively powerful processors, each doing exactly the

same thing to different parts of the base-level array. Finally, the top level contains 64 very

powerful processors, each one operating independently according to LISP programs. The

intermediate level may also be augmented by additional connections. This three-level perceptual

hierarchy appears to be an extremely effective approach to computer vision.
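The pyramid's shape can be sketched in a few lines. The grid sizes and the block-averaging function below are stand-ins (the real machine is 512 x 512 / 64 x 64 / 64, with arbitrary per-level programs):

```python
# Toy analogue of the Levitan-style pyramidal vision machine: each higher
# level summarizes blocks of the level below.  Dimensions are shrunk and the
# "processing" is plain block-averaging, purely for illustration.

def reduce_grid(grid, factor):
    """Each higher-level cell summarizes a factor x factor block below it."""
    n = len(grid) // factor
    return [[sum(grid[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(n)] for i in range(n)]

# Level 1: a 16x16 "retina" of pixel intensities (a checkerboard pattern).
base = [[(i + j) % 2 for j in range(16)] for i in range(16)]
middle = reduce_grid(base, 4)   # intermediate level: 4x4 array of summaries
top = reduce_grid(middle, 4)    # top level: a single "symbolic" summary
```

Every base-level cell runs the same operation on its own part of the image, exactly as in the SIMD arrays described above.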

That orders are passed down the perceptual hierarchy was one of the biggest insights of the

Gestalt psychologists. Their experiments (Kohler, 1975) showed that we look for certain

configurations in our visual input. We look for those objects that we expect to see, and we look

for those shapes that we are used to seeing. If a level 5 process corresponds to an expected

object, then it will tell its children to look for the parts corresponding to that object, and its

children will tell their children to look for the complex geometrical forms making up the parts to

which they refer, et cetera.

3.1.2. Motor Movements

In its motor control aspect, this multilevel control network serves to send actions from the

abstract level to the concrete level. Again extremely roughly, say level 1 represents muscle

movements, level 2 represents simple combinations of muscle movements, level 3 represents

medium-complexity combinations of muscle movements, and level 4 represents complex

combinations of movements such as raising an arm or kicking a ball. Then when a level 4

process gives an instruction to raise an arm, it gives instructions to its subservient level 3

processes, which then give instructions to their subservient level 2 processes, which give instructions to level 1 processes, which finally instruct the muscles on what to do in order to raise the arm. This sort of control moves down the network, but of course all complex motions involve

feedback, so that level k processes are monitoring how well their level k-1 processes are doing


their jobs and adjusting their instructions accordingly. Feedback corresponds to control moving

up the network.
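A toy version of this down-and-up traffic: orders descend the hierarchy, feedback ascends it, and each level corrects its next instruction accordingly. The level count, the three-instruction loop, and the "sluggish muscle" gain are all illustrative assumptions:

```python
# Each level-k process delegates to a level k-1 process, monitors how well
# the order was carried out, and adjusts its next instruction (feedback).

def execute(target, depth, actuate):
    if depth == 0:
        return actuate(target)        # level 1: drive the muscle directly
    achieved = 0.0
    for _ in range(3):                # issue instruction, read feedback, correct
        error = target - achieved
        achieved += execute(error, depth - 1, actuate)
    return achieved

# An assumed "muscle" that only realizes 60% of each command it receives.
muscle = lambda cmd: 0.6 * cmd
result = execute(1.0, depth=3, actuate=muscle)
```

Even with an unreliable bottom level, the layered feedback drives the overall motion very close to its target.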

In a less abstract, more practically-oriented language, Bernstein (see Whiting, 1984) has given

a closely related analysis of motor control. And a very similar hierarchical model of perception

and motor control has been given by Jason Brown (1988), under the name of "microgenesis." His

idea is that lower levels of the hierarchy correspond to older, evolutionarily prior forms of

perception and control.

Let us sum up. The multilevel control methodology, in itself, has nothing to do with patterns.

It is a very simple and general way of structuring perception and action: subprocesses within

subprocesses within subprocesses, each subprocess giving orders to and receiving feedback from

its subsidiaries. In this general sense, the idea that the mind contains a multilevel control

hierarchy is extremely noncontroversial. But psychological multilevel control networks have

one important additional property. They are postulated to deal with questions of pattern. As in

the visual system, the processes on level N are hypothesized to recognize patterns in the output

of the processes on level N-1, and to instruct these processes in certain patterns of behavior. It

is pattern which is passed between the different levels of the hierarchy.

3.1.3. Genetic Programming

Finally, there is the question of how an effective multilevel control network could ever come

about. As there is no "master programmer" determining which control networks will work better

for which tasks, the only way for a control network to emerge is via directed trial and error.

And in this context, the only natural method of trial and error is the one known as genetic

optimization or genetic programming. These fancy words mean simply that

1) subnetworks of the control network which seem to be working ineffectively are randomly

varied

2) subnetworks of the control network which seem to be working ineffectively are a) swapped

with one another, or b) replaced with other subnetworks.

This substitution may perhaps be subject to a kind of "speciation," in which the probability of

substituting subnetwork A for subnetwork B is roughly proportional to the distance between A

and B in the network.

Preliminary computer simulations indicate that, under appropriate conditions, this sort of

process can indeed converge on efficient programs for executing various perceptual and motor

tasks. However, a complete empirical study of this sort of process remains to be undertaken.
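The two operations listed above can be sketched in a few lines. The numeric "subnetworks" and the fitness function are illustrative assumptions, not part of the model:

```python
import random

# Toy genetic optimization of a control network: ineffective subnetworks are
# (1) randomly varied or (2) replaced by copies of better ones.  A
# "subnetwork" here is just a list of weights, scored (as an assumed
# stand-in for task performance) by how close its sum is to a target.

random.seed(0)

def fitness(subnet, target=10.0):
    return -abs(sum(subnet) - target)

def step(network):
    """One round of directed trial and error over a list of subnetworks."""
    scored = sorted(network, key=fitness)
    worst, best = scored[0], scored[-1]
    mutated = [w + random.gauss(0, 0.5) for w in worst]   # (1) random variation
    replaced = list(best)                                 # (2b) replacement
    candidate = max((mutated, replaced), key=fitness)
    network[network.index(worst)] = candidate
    return network

net = [[random.uniform(0, 5) for _ in range(4)] for _ in range(6)]
for _ in range(50):
    net = step(net)
best_score = max(fitness(s) for s in net)
```

Since only the worst-performing subnetwork is disturbed each round, the best candidate found so far is never lost.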

3.2. STRUCTURALLY ASSOCIATIVE MEMORY

So much for the multilevel control network. Let us now turn to long-term memory. What I

call "structurally associative memory" is nothing but a long-term memory model in which the connections between processes are determined not by control structures, nor by any arbitrary classification system, but by patterned relations.

The idea of associative memory has a long psychological history. Hundreds, perhaps

thousands of experiments on priming indicate that verbal, visual and other types of memory

display associativity of access. For instance, if one has just heard the word "cat," and one is

shown the picture of a dog, one will identify it as a "dog" very quickly. If, on the other hand, one

has just heard the word "car" and one is shown the picture of a dog, identification of the dog as a

"dog" will take a little bit longer.

Associative memory has also proved very useful in AI. What could be more natural than to

suppose that the brain stores related entities near to each other? There are dozens of different

associative memory designs in the engineering and computer science literature. Kohonen's

(1984) associative memory model was one of the landmark achievements of early neural

network theory; and Kanerva's (1988) sparse distributed memory, based on the peculiar statistics

of the Hamming distance, has yielded many striking insights into the nature of recall.

Psychological studies of associative memory tend to deal with words or images, where the

notion of "association" is intuitively obvious. Engineering associative memories use specialized

mathematical definitions of association, based on inner products, bit string comparisons, etc.

Neither of these paradigms seems to have a reasonably general method of defining association,

or "relatedness."

The idea at the core of the structurally associative memory is that relatedness should be

defined in terms of pattern. In the structurally associative memory, an entity y is connected to

another entity x if x is a pattern in y. Thus, if w and x have common patterns, there will be many

nodes connected to both w and x. In general, if there are many short paths from w to x in the

structurally associative memory, that means that w and x are closely related; that their structures

probably intersect.

On the other hand, if y is a pattern emergent between w and x, y will not necessarily connect

to w or x, but it will connect to the node z = w U x, if there is such a node. One might expect

that, as a rough rule, z would be higher on the multilevel control network than w or x, thus

interconnecting the two networks in a very fundamental way.
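A minimal sketch of such a memory graph, with "x is a pattern in y" crudely approximated by the substring relation (an assumption made only to keep the example concrete); relatedness is then measured by shortest-path length:

```python
from collections import deque

# Toy structurally associative memory: connect y to x whenever x is a
# (proper) pattern in y; entities with many short paths between them are
# structurally related.

entities = ["cat", "cats", "catfish", "fish", "fisher", "dog"]

edges = {e: set() for e in entities}
for y in entities:
    for x in entities:
        if x != y and x in y:       # substring stands in for "pattern in"
            edges[y].add(x)
            edges[x].add(y)         # associations are traversed both ways

def distance(w, x):
    """Shortest-path length: fewer hops ~ more closely related structures."""
    seen, frontier = {w}, deque([(w, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == x:
            return d
        for nxt in edges[node] - seen:
            seen.add(nxt)
            frontier.append((nxt, d + 1))
    return None                     # no path: no (indirect) structural relation
```

Here "cats" and "catfish" share the pattern "cat" and so sit two hops apart, while "dog" shares no pattern with anything and is unreachable.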

The memory of a real person (or computer) can never be truly associative -- sometimes two

dissimilar things will be stored right next to each other, just by mistake. But it can be

approximately structurally associative, and it can continually reorganize itself so as to maintain a

high degree of structural associativity despite a continual influx of new information.

In The Evolving Mind this reorganization is shown to imply that structurally associative

memories evolve by natural selection -- an entity stored in structurally associative memory is

likely to "survive" (not be moved) if it fits in well with (has patterns in common with, generates

emergent pattern cooperatively with, etc.) its environment, with those entities that immediately

surround it.


3.2.1. The Dynamics of Memory

More specifically, this reorganization must be understood to take place on many different

levels. There is no "memory supervisor" ruling over the entire long term memory store,

mathematically determining the optimal "location" for each entity. So, logically, the only form

which reorganization can take is that of directed, locally governed trial and error.

How might this trial and error work? The most plausible hypothesis, as pointed out in The

Structure of Intelligence, is as follows: one subnetwork is swapped with another; or else

subnetwork A is merely copied into the place of subnetwork B. All else equal, substitution will

tend to take place in those regions where associativity is worse; but there may also be certain

subnetworks that are protected against having their sub-subnetworks removed or replaced.

If the substitution(s) obtained by swapping or copying are successful, in the sense of

improving associativity, then the new networks formed will tend not to be broken up. If the

substitutions are unsuccessful, then more swapping or copying will be done.

Finally, these substitutions may take place in a multilevel manner: large networks may be

moved around, and at the same time the small networks which make them up may be internally

rearranged. The multilevel process will work best if, after a large network is moved, a reasonable

time period is left for its subnetworks to rearrange among themselves and arrive at a "locally

optimal" configuration. This same "waiting" procedure may be applied recursively: after a

subnetwork is moved, it should not be moved again until its sub-subnetworks have had a chance

to adequately rearrange themselves. Note that this reorganization scheme relies on the

existence of certain "barriers." For instance, suppose network A contains network B, which

contains network C. C should have more chance of being moved to a given position inside B

than to a given position out of B. It should have more chance of moving to a given position

inside A-B, than to a given position outside A (here A-B means the portion of A that is not in B).

And so on -- if A is contained in Z, C should have more chance of being moved to a position in

Z-A than outside Z.
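These nested barriers can be sketched directly. The containment chain C inside B inside A inside Z comes from the text; the per-barrier penalty factor is an assumed illustration:

```python
# Rearrangement barriers: the more enclosing clusters a network must leave,
# the less likely the move.

containment = {"C": "B", "B": "A", "A": "Z", "Z": None}

def boundaries_crossed(node, destination_cluster):
    """How many enclosing clusters `node` must leave to reach a position
    directly inside `destination_cluster`."""
    crossed, cluster = 0, containment[node]
    while cluster is not None and cluster != destination_cluster:
        crossed += 1
        cluster = containment[cluster]
    return crossed

PENALTY = 0.2   # assumed: each barrier cuts the move probability fivefold

def move_probability(node, destination_cluster):
    return PENALTY ** boundaries_crossed(node, destination_cluster)
```

C moves freely within B, less freely into A-B, and less freely still into Z-A, matching the ordering described above.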

In some cases these restrictions may be so strong as to prohibit any rearrangement at all: in

later chapters, this sort of comprehensive rearrangement protection will be identified with the

more familiar concept of reality. In other cases the restrictions may be very weak, allowing the

memory to spontaneously direct itself through a free-floating, never-ending search for perfect

associativity.

In this context, I will discuss the psychological classification of people into thin-boundaried

and thick-boundaried personality types. These types would seem to tie in naturally with the

notion of rearrangement barriers in the structurally associative memory. A thick-boundaried

person tends to have generally stronger rearrangement barriers, and hence tends to reify things

more, to be more resistant to mental change. A thin-boundaried person, on the other hand, has

generally weaker rearrangement barriers, and thus tends to permit even fixed ideas to shift, to

display a weaker grasp on "reality."


The strength and placement of these "rearrangement barriers" might seem to be a sticky issue.

But the conceptual difficulty is greatly reduced if one assumes that the memory network is

"fractally" structured -- structured in clusters within clusters ... within clusters, or equivalently

networks within networks ... within networks. If this is the case, then one may simply assume

that a certain "degree of restriction" comes along with each cluster, each network of networks of

... networks. Larger clusters, larger networks, have larger degrees of restriction.

The only real question remaining is who assigns this degree. Are there perhaps mental

processes which exist mainly to adjust the degrees of restriction imposed by other processes?

This is a large question, and a complete resolution will have to wait till later. Part of the answer,

however, will be found in the following section, in the concept of the dual network.

3.3. THE DUAL NETWORK

Neither a structurally associative memory nor a multilevel control network can, in itself, lead

to intelligence. What is necessary is to put the two together: to take a single set of

entities/processes, and by drawing a single set of connections between them, structure them both

according to structural associativity and according to multilevel control. This does not mean just

drawing two different graphs on the same set of nodes: it means that the same connections must

serve as part of a structurally associative memory and part of a multilevel control network.

Entities which are connected via multilevel control must, on the whole, also be connected via

structural associativity, and vice versa.

A moment's reflection shows that it is not possible to superpose an arbitrary associative

memory structure with a multilevel control hierarchy in this way. In fact, such superposition is

only possible if the entities stored in the associative memory are distributed in an approximately

"fractal" way (Barnsley, 1988; Edgar, 1990).

In a fractally distributed structurally associative memory, on the "smallest" scale, each process

is contained in a densely connected subgraph of "neighbors," each of which is very closely

related to it. On the next highest scale, each such neighborhood is connected to a collection of

"neighboring neighborhoods," so that the elements of a neighboring neighborhood are fairly

closely related to its elements. Such a neighborhood of neighborhoods may be called a 2nd-level neighborhood, and in an analogous manner one may define kth-level neighborhoods. Of course, this structure need not be strict: there may be breaks in it on every level, and each process may appear at several different vertices.
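For a balanced toy layout, the level at which two processes first share a neighborhood is easy to compute. Real memories would be irregular, and the branching factor here is an assumption:

```python
# Fractal memory layout: neighborhoods of neighborhoods, modeled as a
# balanced tree with processes at the leaves.

def neighborhood_level(i, j, branching=2):
    """The smallest k such that processes i and j fall inside the same
    kth-level neighborhood."""
    k = 0
    while i != j:
        i //= branching     # step up to the enclosing neighborhood
        j //= branching
        k += 1
    return k

# Processes 0 and 1 share a 1st-level neighborhood; 0 and 7 only meet at level 3.
```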

A good way to understand the fractal structure of the heterarchical network is to think about

the distribution of subjects in a large library. One has disciplines, sub-disciplines, sub-sub-

disciplines, and so forth -- clusters within clusters within clusters, rather than a uniformly

distributed field of subjects. And a good way to visualize the superposition of a hierarchical

network on this structure is to postulate a head librarian dealing with each discipline, an assistant

librarian dealing with each sub-discipline, an assistant assistant librarian dealing with each sub-sub-discipline, and so on. If one imagines that each librarian, assistant librarian, etc.,

gives her subsidiaries general goals and lets them work out their own strategies, then one has a


control hierarchy that works approximately according to the multilevel methodology. The

hierarchy of control is lined up perfectly with the fractal heterarchy of conceptual commonality.

A dual network, then, is a collection of processes which are arranged simultaneously in a hierarchical network and a heterarchical network. Those processes with close parents in the

hierarchical network are, on the whole, correspondingly closely related in the heterarchical

network.

This brings us back to the problem of rearrangement barriers. The rearrangement barriers of

the associative memory network may be set up by the heterarchical network, the multilevel

control network. And, strikingly, in the dual network architecture, substitution of subnetworks of the memory network is equivalent to genetic optimization of the control network. The same operation serves two different functions: the quest for associativity and the quest for efficient

control are carried out in exactly the same way. This synergy between structure and dynamics is

immensely satisfying.

But, important and elegant as this is, this is not the only significant interaction between the

two networks. A structurally associative memory is specifically configured so as to support

analogical reasoning. Roughly speaking, analogy works by relating one entity to another entity

with which it shares common patterns, and the structurally associative memory stores an entity

near those entities with which it shares common patterns. And the hierarchical network, the

perceptual-motor hierarchy, requires analogical reasoning in order to do its job. The purpose of

each cluster in the dual network is to instruct its subservient clusters in the way that it estimates

will best fulfill the task given to it by its master cluster -- and this estimation is based on

reasoning analogically with respect to the information stored in its memory bank.

Let's get a little more concrete. The brain is modeled as a dual network of neural networks. It

is considered to consist of "level k clusters" of autonomous neural networks, each one of which

consists of 1) a number of level k-1 clusters, all related to each other, and 2) some networks that monitor and control these level k-1 clusters. The degree of control involved here may be highly

variable. However, the neurological evidence shows that entire knowledge bases may be outright

moved from one part of the brain to another (Blakeslee, 1991), so that in some cases the degree

of control is very high.

For example, a level 2 cluster might consist of processes that recognize shapes of various sorts

in visual inputs, together with a network regulating these processes. This cluster of shape

recognition processes would be organized according to the principle of structurally associative

memory, so that e.g. the circle process and the ellipse process would be closer to each other than

to the square process. This organization would permit the regulating process to execute

systematic analogical search for a given shape: if in a given situation the circle process were seen

to be fairly successful, but the square process not at all successful, then the next step would be to

try out those processes near to the circle process.
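The regulating process's analogical search might be sketched like this. The shape list, the extra "rectangle" process, and the similarity scores are made-up illustrative values:

```python
# Systematic analogical search: having observed how well some shape
# processes performed, try the untested processes closest to the best one.

similarity = {
    ("circle", "ellipse"):    0.9,
    ("circle", "square"):     0.2,
    ("circle", "rectangle"):  0.25,
    ("ellipse", "square"):    0.3,
    ("square", "rectangle"):  0.85,
}

def sim(a, b):
    return 1.0 if a == b else similarity.get((a, b)) or similarity.get((b, a))

def next_candidates(processes, successes):
    """Order the untested processes by closeness to the best performer."""
    best = max(successes, key=successes.get)
    untested = [p for p in processes if p not in successes]
    return sorted(untested, key=lambda p: sim(p, best), reverse=True)

# The circle process did fairly well; the square process failed outright:
order = next_candidates(["circle", "ellipse", "square", "rectangle"],
                        {"circle": 0.7, "square": 0.05})
```

Because the ellipse process sits nearest the successful circle process in the associative layout, it is tried first.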

3.3.1. Precursors of the Dual Network Model


After hinting at the dual network model in The Structure of Intelligence, and presenting it fully

in The Evolving Mind, I came across two other models of mind which mirror many of its aspects.

First of all, I learned that many cognitive scientists are interested in analyzing thought as a

network of interconnected "schema" (Arbib and Hesse, 1986). This term is not always well

defined -- often a "schema" is nothing more than a process, an algorithm. But Arbib equates

"schema" with Charles S. Peirce's "habit," bringing it very close to the concept of pattern. The

global architecture of this network of schema is not discussed, but the connection is there

nonetheless.

Also, I encountered the paper "Outline for a Theory of Intelligence" by James S. Albus (1991),

Chief of the Robot Systems Division of the National Institute of Standards and Technology. I

was pleased to find therein a model of mind strikingly similar to the dual network, complete with

diagrams such as Figure 6. Albus's focus is rather different than mine -- he is concerned with the

differential equations of control theory rather than the algorithmic structure of reasoning and

memory processes. But the connection between the fractal structure of memory and the

hierarchical structure of control, which is perhaps the most essential component in the dual

network, is virtually implicit in his theory.

Putting the schema theory developed by cognitive scientists together with the global structure

identified by Albus through his robotics work, one comes rather close to a crude version of the

dual network model. This is not how the dual network model was conceived, but it is a rather

satisfying connection. For the dual network structure is, after all, a rather straightforward idea.

What is less obvious, and what has not emerged from cognitive science or engineering, is the

dynamics of the dual network. The way the dual network unifies memory reorganization with

genetic optimization has not previously been discussed; nor has the dynamics of barrier

formation and its relationship with consciousness, language and perception (to be explored in

Chapter Six).

3.4. PREDICTION

The dual network model, as just presented, dismisses the problem of predicting the future

rather cursorily. But this is not entirely justified. Prediction of the behavior of a complex system

is an incredibly difficult task, and one which lies at the very foundation of intelligence. The

dual network model has no problem incorporating this sort of prediction, but something should

be said about how its prediction processes work, rather than just about how they are

interconnected.

One way to predict the future of a system, given certain assumptions about its present and

about the laws governing its behavior, is to simply simulate the system. But this is inefficient,

for a very simple physicalistic reason. Unlike most contemporary digital computers, the brain

works in parallel -- there are a hundred billion neurons working at once, plus an unknown

multitude of chemical reactions interacting with and underlying this neural behavior. And each

neuron is a fairly complicated biochemical system, a far cry from the on-off switch in a digital

computer. But when one simulates a system, one goes one step at a time. To a certain extent,

this wastes the massive parallelism of the brain.


So, the question is, is simulation the best a mind can do, or are there short-cuts? This question

ties in with some pressing problems of modern mathematics and theoretical computer science.

One of the biggest trends in modern practical computer science is the development of parallel-

processing computers, and it is of great interest to know when these computers can outperform

conventional serial computers, and by what margin.

3.4.1. Discrete Logarithms (*)

For a simple mathematical example, let us look to the theory of finite fields. A finite field is a

way of doing arithmetic on a bounded set of integers. For instance, suppose one takes the field of

size 13 (the size must be a prime or a prime raised to some power). Then, in this field the largest

number is 12. One has, for example, 12 + 1 = 0, 10 + 5 = 2, 3 x 5 = 2, and 8 x 3 = 11. One can do

division in a finite field as well, although the results are often counterintuitive -- for instance,

12/8 = 8, and 2/3 = 5 (to see why, just multiply both sides by the denominator).

In finite field theory there is something called the "discrete logarithm" of a number, written dlog_b(n). The discrete logarithm is defined just like the ordinary logarithm, as the inverse of exponentiation. But in a finite field, exponentiation must be defined in terms of the "wrap-around" arithmetic illustrated in the previous paragraph. For instance, in the field of size 7, 3^4 = 4 (ordinarily 3^4 = 81, and 81 leaves remainder 4 when divided by 7). Thus one has dlog_3(4) = 4. But how could one compute the log base 3 of 4, without knowing what it was? The powers of 3 can wrap around the value 7 again and again -- they could wrap around many times before hitting on the correct value, 4.

The problem of finding the discrete logarithm of a number is theoretically easy, in the sense

that there are only finitely many possibilities. In our simple example, all one has to do is take 3

to higher and higher powers, until all possibilities are covered. But in practice, if the size of the

field is not 7 but some much larger number, this finite number of possibilities can become

prohibitively large.
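The brute-force procedure just described, taking the base to higher and higher powers until the target is hit, is only a few lines of code:

```python
# Brute-force discrete logarithm in a small prime field.

def dlog(base, target, p):
    """Smallest k >= 1 with base**k == target (mod p), or None."""
    value = 1
    for k in range(1, p):          # at most p - 1 distinct nonzero powers
        value = (value * base) % p
        if value == target:
            return k
    return None

# In the field of size 7: 3^4 = 81 = 4 (mod 7), so dlog_3(4) = 4.
```

For a field of cryptographic size, this loop has on the order of p iterations, which is exactly the "prohibitively large" search mentioned above.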

So, what if one defines the dynamical system n_k = dlog_b(n_(k-1))? Suppose one is given n_1; then how can one predict n_1000? So far as we know today, there is no better way than to proceed in order: first get n_2, then n_3, then n_4, and so on up to n_999 and n_1000. Working on n_3 before one knows n_2 is essentially useless, because a slight change in the answer for n_2 can totally change the answer for n_3. The only way to do all 1000 steps in parallel, it seems, would be to first compute a table of all possible powers that one might possibly need to know in the course of calculation. But this would require an immense number of processors; at least the square of the size of the field.
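The serial bottleneck is visible in code: each step consumes the previous step's output. This is a small-field toy using the brute-force dlog, with the starting value chosen (as an assumption) to give a non-trivial cycle:

```python
# Iterating n_k = dlog_b(n_(k-1)): the data dependency forces one step at a
# time, regardless of how many processors are available.

def dlog(base, target, p):
    value = 1
    for k in range(1, p):
        value = (value * base) % p
        if value == target:
            return k

def orbit(n1, base, p, steps):
    seq = [n1]
    for _ in range(steps):
        seq.append(dlog(base, seq[-1], p))   # needs seq[-1]: inherently serial
    return seq

trajectory = orbit(3, 3, 7, steps=5)
```

In the field of size 7 with base 3, the orbit starting at 3 cycles through 3, 1, 6, and back.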

This example is, incidentally, of more than academic interest. Many cryptosystems in current

use are reliant on discrete logarithms. If one could devise a quick method for computing them,

one could crack all manner of codes; and the coding theorists would have to come up with

something better.

3.4.2. Chaos and Prediction

More physicalistic dynamical systems appear to have the same behavior. The classic example

is the "logistic" iteration x_k = c x_(k-1) (1 - x_(k-1)), where c = 4 or c assumes certain values between 3.8


and 4, and the x_k are discrete approximations of real numbers. This equation models the

dynamics of certain biological populations, and it also approximates the equations of fluid

dynamics under certain conditions.

It seems very, very likely that there is no way to compute x_n from x_1 on an ordinary serial

computer, except to proceed one step at a time. Even if one adds a dozen or a thousand or a

million processors, the same conclusion seems to hold. Only if one adds a number of processors

roughly proportional to 2^n can one obtain a significant advantage from parallelism.
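The sensitivity behind this claim is easy to exhibit numerically. The starting points and the size of the perturbation below are arbitrary choices:

```python
# Sensitive dependence in the logistic map x_k = c*x_(k-1)*(1 - x_(k-1)) at
# c = 4: two starting points differing by one part in ten billion soon
# follow completely different trajectories.

def orbit(x, c, steps):
    seq = [x]
    for _ in range(steps):
        x = c * x * (1 - x)
        seq.append(x)
    return seq

a = orbit(0.3, 4.0, 60)
b = orbit(0.3 + 1e-10, 4.0, 60)
max_gap = max(abs(p - q) for p, q in zip(a, b))
```

Because a tiny error at step k wrecks step k+1, there is no obvious way to compute a late value without faithfully simulating every step in between.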

In general, all systems of equations called chaotic possess similar properties. These include

equations modeling the weather, the flow of blood through the body, the motions of planets in

solar systems, and the flow of electricity in the brain. The mathematics of these systems is still in

a phase of rapid development. But the intuitive picture is clear. To figure out what the weather

will be ninety days from now, one must run an incredibly accurate day-by-day simulation -- even

with highly parallel processing, there is no viable alternate strategy.

3.4.3. Chaos, Prediction and Intelligence

A mind is the structure of an intelligent system, and intelligence relies on prediction, memory

and optimization. Given the assumption that some past patterns will persist, a mind must always

explore several different hypotheses as to which ones will persist. It must explore several

different possible futures, by a process of predictive extrapolation. Therefore, intelligence

requires the prediction of the future behavior of partially unpredictable systems.

If these systems were as chaotic as x_k = 4 x_(k-1) (1 - x_(k-1)), all hope would be lost. But the weather

system is a better example. It is chaotic in its particular details -- there is no practical way, today

in 1992, to determine the temperature on July 4 1999 in Las Vegas. But there are certain

persistent patterns that allow one to predict its behavior in a qualitative way. After all, the

temperature on July 4 1999 in Las Vegas will probably be around 95-110 Fahrenheit. One can

make probabilistic, approximate predictions -- one can recognize patterns in the past and

hope/assume that they will continue.

Our definition of intelligence conceals the presupposition that most of the prediction which

the mind has to do is analogous to this trivial weather prediction example. No step-by-step

simulation is required, only inductive/analogical reasoning, supported by memory search.

However, the fact remains that sometimes the mind will run across obstinate situations --

prediction problems that are not effectively tackled using intuitive memory or using parallel-processing shortcuts. In these cases, the mind has no choice but to resort to direct simulation (on

some level of abstraction).

The brain is a massively parallel processor. But when it runs a direct simulation of some

process, it is acting like a serial processor. In computerese, it is running a virtual serial

machine. The idea that the parallel brain runs virtual serial machines is not a new one -- in

Consciousness Explained Daniel Dennett proposes that consciousness is a virtual serial machine

run on the parallel processor of the brain. As will be seen in Chapter Six, although I cannot

accept Dennett's reductionist analysis of consciousness, I find a great deal of merit in this idea.


3.5. STRUCTURED TRANSFORMATION SYSTEMS

To proceed further with my formal theory of intelligence, I must now introduce some slightly

technical definitions. The concept of a structured transformation system will be absolutely

essential to the theory of language and belief to be given in later chapters. But before I can say

what a structured transformation system is, I must define a plain old transformation system.

In words, a transformation system consists of a set I of initials, combined with a set T of transformation rules. The initials are the "given information"; the transformation rules are methods for combining and altering the initials into new statements. The deductive system itself, I will call D(I,T).

For instance, in elementary algebra one has transformation rules such as

X = Y implies X+Z = Y+Z, and XZ = YZ

(X + Y) + Z = X + (Y+Z)

X - X = 0

X + 0 = X

X + Y = Y + X

If one is given the initial

2q - r = 1

one can use these transformation rules to obtain

q = (1 + r)/2.

The latter formula has the same content as the initial, but its form is different.

If one had a table of numbers, say

r     q
1     1
2     3/2
3     2
4     5/2
5     3
...
99    50

then "q = (1+r)/2" would be a slightly more intense pattern in one's table than "2q - r = 1." For the work involved in computing the table from "2q - r = 1" is a little greater -- one must solve for q each time r is plugged in, or else transform the equation into "q = (1+r)/2."
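This effort difference can be sketched in Python (a hypothetical miniature, using exact rational arithmetic): computing an entry from the initial "2q - r = 1" means solving for q each time, while the transformed form yields q in one step.

```python
from fractions import Fraction

def q_from_initial(r):
    # Use the initial "2q - r = 1" directly: hunt for the q that
    # satisfies the equation each time r is plugged in.
    q = Fraction(0)
    while 2 * q - r != 1:
        q += Fraction(1, 2)
    return q

def q_from_transformed(r):
    # Use the transformed "q = (1 + r)/2": one arithmetic step per entry.
    return Fraction(1 + r, 2)

# Both processes compute the same table; the first simply works harder.
table = [(r, q_from_transformed(r)) for r in range(1, 6)]
```

For r = 99 both yield q = 50, but the first function takes a hundred trial steps to get there.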

Thus, although in a sense transformation systems add no content to their initials, they are

capable of producing new patterns. For a list of length 100, as given above, both are clearly

patterns. But what if the list were of length 4? Then perhaps "2q - r = 1" would not be a pattern:

the trouble involved in using it might be judged to exceed the difficulty of using the list itself.

But perhaps q = (1+r)/2 would still be a pattern. It all depends on who's doing the judging of

complexities -- but for any judge there is likely to be some list length for which one formula is a

pattern and the other is not.

This is, of course, a trivial example. A better example is Kepler's observation that planets

move in ellipses. This is a nice compact statement, which can be logically derived from Newton's

Three Laws of Motion. But the derivation is fairly lengthy and time-consuming. So if one has a

brief list of data regarding planetary position, it is quite possible that Kepler's observation will be

a significant pattern, but Newton's Three Laws will not. What is involved here is the complexity

of producing x from the process y. If this complexity is too great, then no matter how simple

the process y, y will not be a pattern in x.

3.5.1. Transformation Systems (*)

In this section I will give a brief formal treatment of "transformation systems." Let W be any set; let A be a subset of W, called the set of "expressions"; and let I = {W1, W2, ..., Wn} be a subset of W, called the set of initials. Let W* denote the set {W, WxW, WxWxW, ...}. And let T = {F1, F2, ..., Fn} be a set of transformations; that is, a set of functions each of which maps some elements of W* into elements of A. For instance, if W were a set of propositions, one might have F1(x,y) = x and y, and F2(x) = not x.

Let us now define the set D(I,T) of all elements of A which are derivable from the assumptions I via the transformations T. First of all, it is clear that I should be a subset of D(I,T). Let us call the elements of I the depth-zero elements of D(I,T). Next, what about elements of the form x = Fi(A1,...,Am), for some i, where each Ak = Ij for some j? Obviously, these elements are simple transformations of the assumptions; they should be elements of D(I,T) as well. Let us call these the depth-one elements of D(I,T). Similarly, one may define an element x of A to be a depth-n element of D(I,T) if x = Fi(A1,...,Am), for some i, where each of the Ak is a depth-p element of D(I,T), for some p < n. Finally, D(I,T) may then be defined as the set of all x which are depth-n elements of D(I,T) for some n.
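This depth-by-depth construction can be sketched as a generation procedure. The transformations F1(x,y) = "x and y" and F2(x) = "not x" are the ones from the example above; representing expressions as strings, and pairing each transformation with its arity, are illustrative assumptions.

```python
from itertools import product

def derive(initials, transformations, max_depth):
    # D(I,T) up to a given depth: the depth-zero elements are the initials;
    # each pass applies every transformation to everything derived so far.
    derived = set(initials)
    for _ in range(max_depth):
        new = set()
        for f, arity in transformations:
            for args in product(derived, repeat=arity):
                new.add(f(*args))
        derived |= new
    return derived

F1 = (lambda x, y: f"({x} and {y})", 2)   # conjunction
F2 = (lambda x: f"(not {x})", 1)          # negation

d1 = derive({"p", "q"}, [F1, F2], 1)      # depth-one elements and below
d2 = derive({"p", "q"}, [F1, F2], 2)      # adds depth-two elements
```

A double negation such as "(not (not p))" first appears at depth two, since its argument is itself a depth-one element.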


For example, if the T are rules of logic and the I are some propositions about the world, then D(I,T) is the set of all propositions which are logically equivalent to some subset of I. In this case deduction is a matter of finding the logical consequences of I, which are presumably a small subset of the total set of all propositions. This is the general form of deduction. Boolean logic consists of a specific choice of T; and predicate calculus consists of an addition onto the set T provided by Boolean logic.

It is worth noting that, in this approach to deduction, truth is inessential. In formal logic it is

conventional to assume that one's assumptions are "true" and one's transformations are "truth-

preserving." However, this is just an interpretation foisted on the deductive system after the fact.

3.5.2. Analogical Structure

The set (I,T) constructed above might be called a transformation system. It may be likened

to a workshop. The initials I are the materials at hand, and the transformations T are the tools.

D(I,T) is the set of all things that can be built, using the tools, from the materials.

What is lacking? First of all, blueprints. In order to apply a transformation system to a real problem, one must have some idea of which transformations should be applied in which situations.

But if an intelligence is going to apply a transformation system, it will need to apply it in a

variety of different contexts. It will not know exactly which contexts are going to arise in the future.

It cannot retain a stack of blueprints for every possible contingency. What it needs is not merely

a stack of blueprints, but a mechanism for generating blueprints to fit situations.

But, of course, it already has such a mechanism -- its innate intelligence, its ability to induce,

to reason by analogy, to search through its associative memory. What intelligence needs is a

transformation system structured in such a way that ordinary mental processes can serve as its

blueprint-generating machine.

In SI this sort of transformation system is called a "useful deductive system." Here, however, I

am thinking more generally, and I will use the phrase structured transformation system

instead. A structured transformation system is a transformation system with the property that, if a

mind wants to make a "blueprint" telling it how to construct something from the initials using

the transformations, it can often approximately do so by reasoning analogically with respect to

the blueprints from other construction projects.

Another way to put it is: a structured transformation system, or STS, is a transformation system with the property that the proximity between x and y in an ideal structurally associative memory is correlated with the similarity between the blueprint sets corresponding to x and y. A

transformation system is structured if the analogically reasoning mind can use it, in practice, to

construct things to order. This construction need not be infallible -- it is required only that it

work approximately, much of the time.

3.5.2.1. (*) A Formal Definition


One formal definition goes as follows. Let x and y be two elements of D(I,T), and let GI,T(x)

and GI,T(y) denote the set of all proofs in the system (I,T) of x and y respectively. Let U equal the

minimum over all functions v of the sum a|v| + B, where B is the average, over all pairs (x,y) so

that x and y are both in D(I,T), of the correlation coefficient between

d#[St(x union v)-St(v), St(y union v) - St(v)]

and

d*[GI,T(x),GI,T(y)].

Then (I,T) is structured to degree U.

Here d#(A,B) is the structural complexity of the symmetric difference of A and B. And d* is a

metric on the space of "sets of blueprints," so that d*[GI,T(x),GI,T(y)] denotes the distance

between the set of proofs of x and the set of proofs of y.

If the function v were omitted, then the degree of structuredness U would be a measure of

how true it is that structurally similar constructions have similar blueprint sets. But the inclusion

of the function v broadens the definition. It need not be the case that similar x and y have similar

blueprint sets. If x and y display similar emergent patterns on conjunction with some entity v,

and x and y have similar blueprint sets, then this counts as structuredness too.

3.5.3. Transformation, Prediction and Deduction

What do STS's have to do with prediction? To make this connection, it suffices to interpret the

depth index of an element of D(I,T) as a time index. In other words, one may assume that to

apply each transformation in T takes some integer number of "time steps," and consider the

construction of an element in D(I,T) as a process of actual temporal construction. This is a

natural extension of the "materials, tools and blueprints" metaphor introduced above.

A simulation of some process, then, begins with an initial condition (an element of I) and

proceeds to apply dynamical rules (elements of T), one after the other. In the case of a simple

iteration like x_k = c x_{k-1}(1 - x_{k-1}), the initial condition is an approximation of a real number, and

there is only one transformation involved, namely the function f(x) = cx(1-x) or some

approximation thereof. But in more complex simulations there may be a variety of different

transformations.
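As a minimal sketch, the logistic iteration above is a simulation with a single tool applied over and over; the parameter c = 4.0 (in the chaotic regime) and the initial condition x0 = 0.3 below are illustrative choices, not taken from the text.

```python
def logistic_trajectory(x0, c, steps):
    # Repeatedly apply the single transformation f(x) = c*x*(1-x),
    # recording the whole trajectory x_0, x_1, ..., x_steps.
    xs = [x0]
    for _ in range(steps):
        xs.append(c * xs[-1] * (1 - xs[-1]))
    return xs

traj = logistic_trajectory(0.3, 4.0, 5)  # c = 4.0 lies in the chaotic regime
```

For x0 in [0, 1] and c = 4.0 the trajectory stays inside the unit interval, so the "materials" never leave the workshop.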

For instance, a numerical iteration of the form x_k = f(k, x_{k-1}) rather than x_k = f(x_{k-1}) requires a different transformation at each time step. This is precisely the kind of iteration used to generate

fractals by the iterated function system method (Barnsley, 1988). In this context, oddly enough, a

random or chaotic choice of k leads to a more intricately structured trajectory than an orderly

choice of k.
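A sketch of this random-k iteration is the "chaos game" version of an iterated function system. The three midpoint maps below, which generate the Sierpinski triangle, are a standard textbook IFS chosen for illustration; they are not an example discussed in the text.

```python
import random

def chaos_game(steps, seed=0):
    # x_k = f(k, x_{k-1}) with k chosen at random: each step picks one
    # of three contraction maps (midpoint toward a triangle corner).
    random.seed(seed)
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
    x, y = 0.5, 0.5
    points = []
    for _ in range(steps):
        cx, cy = random.choice(corners)       # the random choice of k
        x, y = (x + cx) / 2, (y + cy) / 2     # apply transformation f_k
        points.append((x, y))
    return points

pts = chaos_game(1000)  # the points trace out the Sierpinski triangle
```

Cycling through the three maps in a fixed order instead of at random produces a far less intricate trajectory, which is the point made above.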

So, the process of simulating a dynamical system and the process of making a logical

deduction are, on the broadest level, the same. They both involve transformation systems. But what about the structured part? What would it mean for a family of simulations to be executed

according to a structured transformation system?

It would mean, quite simply, that the class of dynamical rule sequences that lead up to a

situation is correlated with the structure of the situation. With logical deduction, one often

knows what one wants to prove, and has to find out how to prove it -- so it is useful to know

what worked to prove similar results. But with simulation, it is exactly the reverse. One often

wants to know what the steps in one's transformation sequence will lead to, because one would

like to avoid running the whole transformation sequence through, one step at a time. So it is

useful to know what has resulted from running through similar transformation sequences. The

same correlation is useful for simulation as for deduction -- but for a different reason.

Actually, this is an overstatement. Simulation makes some use of reasoning from similarity of

results to similarity of transformation sequences -- because one may be able to guess what the

results of a certain transformation sequence will be, and then one will want to know what similar

transformation sequences have led to, in order to assess the plausibility of one's guess. And

deduction makes some use of reasoning from similarity of transformation sequences to similarity

of results -- one may have an idea for a "proof strategy," and use analogical reasoning to make a

guess at whether this strategy will lead to anything interesting. There is a distinction between the

two processes, but it is not precisely drawn.

In conclusion, I propose that most psychological simulation and deduction is done by

structured transformation systems. Some short simulations and deductions may be done without

the aid of structure -- but this is the exception that proves the rule. Long chains of deductive

transformations cannot randomly produce useful results. And long chains of dynamical

iterations, if unmonitored by "common sense", are likely to produce errors -- this is true even of

digital computer simulations, which are much more meticulous than any program the human

brain has ever been known to run.

Psychologically, structured transformation systems are only effective if run in parallel.

Running one transformation after another is very slow. Some simulations, and some logical

deductions, will require this. But the mind will do its utmost to avoid it. One demonstration of

this is the extreme difficulty of doing long mathematical proofs in one's head. Even the greatest

mathematicians used pencil and paper, to record the details of the last five steps while they filled

up their minds with the details of the next five.

Chapter Four

PSYCHOLOGY AND LOGIC

I have already talked a little about deduction and its role in the mind. In this chapter, however,

I will develop this theme much more fully. The relation between psychology and logic is

important, not only because of the central role of deductive logic in human thought, but also

because it is a microcosm of the relation between language and thought in general. Logic is an example of a linguistic system, and it reveals certain phenomena that are obscured by the sheer

complexity of other linguistic systems.

4.1. PSYCHOLOGISM AND LOGISM

Today, as John MacNamara has put it, "logicians and psychologists generally behave like the

men and women in an orthodox synagogue. Each group knows about the other, but it is proper

form that each should ignore the other" (1986, p.1). But such was not always the case. Until

somewhere toward the end of the nineteenth century, the two fields of logic and psychology were

closely tied together. What changed things was, on the one hand, the emergence of experimental

psychology; and, on the other hand, the rediscovery and development of elementary symbolic

logic by Boole, deMorgan and others.

The early experimental psychologists purposely avoided explaining intelligence in terms of

logic. Mental phenomena were analyzed in terms of images, associations, sensations, and so

forth. And on the other hand -- notwithstanding the psychological pretensions of Leibniz's early

logical investigations and Boole's Laws of Thought -- the early logicians moved further and

further each decade toward considering logical operations as distinct from psychological

operations. It was increasingly realized on both sides that the formulas of propositional logic

have little connection with emotional, intuitive, ordinary everyday thought.

Of course, no one denies that there is some relation between psychology and logic. After all,

logical reasoning takes place within the mind. The question is whether mathematical logic is a

very special kind of mental process, or whether, on the other hand, it is closely connected with

everyday thought processes. And, beginning around a century ago, both logicians and

psychologists have overwhelmingly voted for the former answer.

The almost complete dissociation of logic and psychology which one finds today may be

partly understood as a reaction against the nineteenth-century doctrines of psychologism and

logism. Both of these doctrines represent extreme views: logism states that psychology is a

subset of logic; and psychologism states that logic is a subset of psychology.

Boole's attitude was explicitly logist -- he optimistically suggested that the algebraic equations

of his logic corresponded to the structure of human thought. Leibniz, who anticipated many of

Boole's discoveries by approximately two centuries, was ambitious beyond the point of logism as

I have defined it here: he felt that elementary symbolic logic would ultimately explain not only

the mind but the physical world. And logism was also not unknown among psychologists -- it

was common, for example, among members of the early Wurzburg school of Denkpsychologie.

These theorists felt that human judgements generally followed the forms of rudimentary

mathematical logic.

But although logism played a significant part in history, the role of psychologism was by far

the greater. Perhaps the most extreme psychologism was that of John Stuart Mill (1843), who in

his System of Logic argued that


Logic is not a Science distinct from, and coordinate with, Psychology. So far as it is a Science at

all, it is a part or branch of Psychology.... Its theoretic grounds are wholly borrowed from

Psychology....

Mill understood the axioms of logic as "generalizations from experience." For instance, he gave

the following psychological "demonstration" of the Law of Excluded Middle (which states that

for any p, either p or not-p is always true):

The law of Excluded Middle, then, is simply a generalization of the universal experience that some mental states are destructive of other states. It formulates a certain absolutely constant law, that the appearance of any positive mode of consciousness cannot occur without excluding a correlative negative mode; and that the negative mode cannot occur without excluding the correlative positive mode.... Hence it follows that if consciousness is not in one of the two modes it must be in the other (bk. 2, chap. 7, sec. 5)

Even if one accepted psychologism as a general principle, it is hard to see how one could take

"demonstrations" of this nature seriously. Of course each "mode of consciousness" or state of

mind excludes certain others, but there is no intuitively experienced exact opposite to each state

of mind. The concept of logical negation is not a "generalization" of but rather a specialization

and falsification of the common psychic experience which Mill describes. The leap from

exclusion to exact opposition is far from obvious and was a major step in the development of

mathematical logic.

As we will see a little later, Nietzsche (1888/1968) also attempted to trace the rules of logic to

their psychological roots. But Nietzsche took a totally different approach: he viewed logic as a

special system devised by man for certain purposes, rather than as something wholly deducible

from inherent properties of mind. Mill was convinced that logic must follow automatically from

"simpler" aspects of mentality, and this belief led him into psychological absurdities.

The early mathematical logicians, particularly Gottlob Frege, attacked Mill with a vengeance.

For Frege (1884/1952) the key point was the question: what makes a sentence true? Mill, as an

empiricist, believed that all knowledge must be derived from sensory experience. But Frege

countered that "this account makes everything subjective, and if we follow it through to the end,

does away with truth" (1959, p. vii). He proposed that truth must be given a non-psychological

definition, one independent of the dynamics of any particular mind. This Fregean conception of

truth received its fullest expression in Tarski's (1935) and Montague's (1974) work on formal

semantics, to be discussed in Chapter Five.

To someone acquainted with formal logic only in its recent manifestations, the very concept of

psychologism is likely to seem absurd. But the truth is that, before the work of Boole, Frege,

Peano, Russell and so forth transformed logic into an intensely mathematical discipline, the

operations of logic did have direct psychological relevance. Aristotle's syllogisms made good

psychological sense (although we now know that much useful human reasoning relies on

inferences which Aristotle deemed incorrect). The simple propositional logic of Leibniz and

Boole could be illustrated by means of psychological examples. But the whole development of

modern mathematical logic was based on the introduction of patently non-psychological axioms and operations. Today few logicians give psychology a second thought, but for Frege it was a

major conceptual battle to free mathematical logic from psychologism.

In sum, psychologists ignored those few voices which insisted on associating everyday mental

processes with mathematical logic. And, on the other hand, logicians actively rebelled against the

idea that the rules of mathematical logic must relate to rules of mental process. Psychology

benefited from avoiding logism, and logic gained greatly from repudiating psychologism.

4.1.1. The Rebirth of Logism

But, of course, that wasn't the end of the story. Although contemporary psychology and logic

have few direct relations with one another, in the century since Frege there has arisen a brand

new discipline, one that attempts to bring psychology and logic closer together than they ever

have been before. I am speaking, of course, about artificial intelligence.

Early AI theorists -- in the sixties and early seventies -- brought back logism with a

vengeance. The techniques of early AI were little more than applied Boolean logic and tree

search, with a pinch or two of predicate calculus, probability theory and other mathematical

tricks thrown in for good measure. But every few years someone optimistically predicted that an

intelligent computer was just around the corner. At this stage AI theorists basically ignored

psychology -- they felt that deductive logic, and deductive logic alone, was sufficient for

understanding mental process.

But by the eighties, AI was humbled by experience. Despite some incredible successes,

nothing anywhere near a "thinking machine" has been produced. No longer are AI theorists too

proud to look to psychology or even philosophy for assistance. Computer science still relies

heavily on formal logic -- not only Boolean logic but more recent innovations such as model

theory and non-well-founded sets (Aczel, 1988) -- and AI is no exception. But more and more AI

theorists are wondering now if modern logic is adequate for their needs. Many, dissatisfied with

logism, are seeking to modify and augment mathematical logic in ways that bring it closer to

human reasoning processes. In essence, they are augmenting their vehement logism with small

quantities of the psychologism which Frege so abhorred.

4.1.2. The Rebirth of Psychologism

This return to a limited psychologism is at the root of a host of recent developments in several

different areas of theoretical AI. Perhaps the best example is nonmonotonic logic, which has

received a surprising amount of attention in recent years. But let us dwell, instead, on an area of

research with more direct relevance to the present book: automated theorem proving.

Automatic theorem proving -- the science of programming computers to prove mathematical

theorems -- was once thought of as a stronghold of pure deductive logic. It seemed so simple:

just apply the rules of mathematical logic to the axioms, and you generate theorems. But now

many researchers in automated theorem proving have realized that this is only a very small part

of what mathematicians do when they prove theorems. Even in this ethereal realm of reasoning,

tailor-made for logical deduction, nondeductive, alogical processes are of equal importance.


For example, after many years of productive research on automated theorem proving, Alan

Bundy (1991) has come to the conclusion that

Logic is not enough to understand reasoning. It provides only a low-level, step by step

understanding, whereas a high-level, strategic understanding is also required. (p. 178)

Bundy proposes that one can program a computer to demonstrate high-level understanding of

mathematical proofs, by supplying it with the ability to manipulate entities called proof plans.

A proof plan is defined as a common structure that underlies and helps to generate many

different mathematical proofs. Proof plans are not formulated on the basis of mathematical logic alone; rather, they are

refined to improve their expectancy, generality, prescriptiveness, simplicity, efficiency and

parsimony while retaining their correctness. Scientific judgement is used to find a balance

between these sometimes opposing criteria. (p.197)

In other words, proof plans, which control and are directed by deductive theorem-proving, are

constructed and refined by illogical or alogical means.

Bundy's research programme -- to create a formal, computational theory of proof plans -- is

about as blatant as psychologism gets. In fact, Bundy admits that he has ceased to think of himself

as a researcher in automated theorem proving, and come to conceive of himself as a sort of

abstract psychologist:

For many years I have regarded myself as a researcher in automatic theorem proving. However,

by analyzing the methodology I have pursued in practice, I now realize that my real motivation is

the building of a science of reasoning.... Our science of reasoning is normative, empirical and

reflective. In these respects it resembles other human sciences like linguistics and Logic. Indeed

it includes parts of Logic as a sub-science. (p. 197)

How similar this is, on the surface at least, to Mill's "Logic is ... a part or branch of Psychology"!

But the difference, on a deeper level, is quite large. Bundy takes what I would call a Nietzschean

rather than a Millean approach. He is not deriving the laws of logic from deeper psychological

laws, but rather studying how the powerful, specialized reasoning tool that we call "deductive

logic" fits into the general pattern of human reasoning.

4.2. LIMITED BOOLEAN LOGISM

Bundy defends what I would call a "limited Boolean logism." He maintains that Boolean logic

and related deductive methods are an important part of mental process, but that they are

supplemented by and continually affected by other mental processes. At first sight, this

perspective seems completely unproblematic. We think logically when we need to, alogically

when we need to; and sometimes the two modes of cognition will interact. Very sensible.


But, as everyone who has taken a semester of university logic is well aware, things are not so

simple. Even limited Boolean logism has its troubles. I am speaking about the simple conceptual

conundrums of Boolean logic, such as Hempel's paradox of confirmation and the paradoxes of

implication. These elementary "paradoxes," though so simple that one could explain them to a

child, are obstacles that stand in the way of even the most unambitious Boolean logism. They

cast doubt as to whether Boolean logic can ever be of any psychological relevance whatsoever.

4.2.1. Boolean Logic and Modern Logic

One might well wonder, why all this emphasis on Boolean logic. After all, from the logician's

point of view, Boolean logic -- the logic of "and", "or" and "not" -- is more than a bit out-of-date.

It does not even include quantification, which was invented by Peirce before the turn of the

century. Computer circuits are based entirely on Boolean logic; however, modern mathematical

logic has progressed as far beyond Leibniz, Boole and deMorgan as modern biology has

progressed beyond Cuvier, von Baer and Darwin.

But still, it is not as though modern logical systems have shed Boolean logic. In one way or

another, they are invariably based on Boolean ideas. Mathematically, nearly all logical systems

are "Boolean algebras" -- in addition to possessing other, subtler structures. And, until very

recently, one would have been hard put to name a logistic model of human reasoning that did not

depend on Boolean logic in a very direct way. I have already mentioned two exceptions,

nonmonotonic logic and proof plans, but these are recent innovations and still in very early

stages of development.

So the paradoxes of Boolean logic are paradoxes of modern mathematical logic in general.

They are the most powerful weapon in the arsenal of the contemporary anti-logist. Therefore, the

most sensible way to begin our quest to synthesize psychology and logic is to dispense with these

paradoxes.

Paradoxes of this nature cannot be "solved." They are too simple for that, too devastatingly

fundamental. So my aim here is not to "solve" them, but rather to demonstrate that they are

largely irrelevant to the project of limited Boolean logism -- if this project is carried out in the

proper way. This demonstration is less logical than psychological. I will assume that the mind

works by pattern recognition and multilevel optimization, and show that in this context Boolean

logic can control mental processes without succumbing to the troubles predicted by the

paradoxes.

4.2.2. The Paradoxes of Boolean Logic

Before going any further, let us be more precise about exactly what these "obstacles" are. I

will deal with four classic "paradoxes" of Boolean logic:

1. The first paradox of implication. According to the standard definition of implication one

has "a --> (b --> a)" for all a and b. Every true statement is implied by anything whatsoever. For

instance, the statement that the moon is made of green cheese implies the statement that one plus

one equals two. The statement that Lassie is a dog implies the statement that Ione Skye is an actress. This "paradox" follows naturally from the elegant classical definition of "a --> b" as

"either b, or else not a". But it renders the concept of implication inadequate for many purposes.

2. The second paradox of implication. For all a and c, one has "not-c --> (c --> a)". That is,

if c is false, then c implies anything whatsoever. From the statement that George Bush has red

hair, it follows that psychokinesis is real.

3. Contradiction sensitivity. In the second paradox of implication, set c equal to the

conjunction of some proposition and its opposite. Then one has the theorem that, if "A and not-

A" is true for any A, everything else is also true. This means that Boolean logic is incapable of

dealing with sets of data that contain even one contradiction. For instance, assume that "I love

my mother", and "I do not love my mother" are both true. Then one may prove that 2+2=5. For

surely "I love my mother" implies "I love my mother or 2+2=5" (in general, "a --> (a or b)").

But, just as surely, "I do not love my mother" and "I love my mother or 2+2=5", taken together,

imply "2+2=5" (in general, [a and (not-a or b)] --> b). Boolean logic is a model of reasoning in

which ambivalence about one's feelings for one's mother leads naturally to the conclusion that

2+2=5.

4. Hempel's confirmation paradox. According to Boolean logic, "all ravens are black" is

equivalent to "all nonblack entities are nonravens". That is, schematically, "(raven --> black) -->

(not-black --> not-raven)". This is a straightforward consequence of the standard definition of

implication. But is it not the case that, if A and B are equivalent hypotheses, evidence in favor of

B is evidence in favor of A? It follows that every observation of something which is not black

and also not a raven is evidence that ravens are black. This is patently absurd.
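All four "paradoxes" rest on Boolean tautologies, which can be confirmed mechanically by exhausting the truth tables -- a small Python sketch, using the classical definition of "a --> b" as "either b, or else not a":

```python
from itertools import product

def implies(a, b):
    # Classical material implication: "either b, or else not a".
    return b or not a

B = (False, True)

# 1. a --> (b --> a): any true statement is implied by anything.
p1 = all(implies(a, implies(b, a)) for a, b in product(B, repeat=2))

# 2. not-c --> (c --> a): a false statement implies anything.
p2 = all(implies(not c, implies(c, a)) for c, a in product(B, repeat=2))

# 3. Contradiction sensitivity: a --> (a or b), and [a and (not-a or b)] --> b.
p3 = all(implies(a, a or b) and implies(a and (not a or b), b)
         for a, b in product(B, repeat=2))

# 4. Hempel: (raven --> black) has the same truth table as its contrapositive.
p4 = all(implies(r, b) == implies(not b, not r) for r, b in product(B, repeat=2))
```

Each schema holds on every row of its truth table, which is exactly why the paradoxes cannot be "solved" from inside Boolean logic itself.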

4.2.3. The Need for New Fundamental Notions

The standard method for dealing with these paradoxes has been to acknowledge them, then dismiss

them as irrelevant. In recent years, however, this evasive tactic has grown less common. There

have been several attempts to modify standard Boolean-based formal logic in such a way as to

avoid these difficulties: relevant logics (Read, 1988), paraconsistent logics (daCosta, 1984), and

so forth.

Some of this work is of very high quality. But in a deeper conceptual sense, none of it is really

satisfactory. It is, unfortunately, not concrete enough to satisfy even the most logistically inclined

psychologist. There is a tremendous difference between a convoluted, abstract system jury-

rigged specifically to avoid certain formal problems, and a system with a simple intuitive logic

behind it.

An interesting commentary on this issue is provided by the following dialogue, reported by

Gian-Carlo Rota (1985). The great mathematician Stanislaw Ulam was preaching to Rota about

the importance of subjectivity and context in understanding meaning. Rota begged to differ (at

least partly in jest):


"But if what you say is right, what becomes of objectivity, an idea that is so definitively

formulated by mathematical logic and the theory of sets, on which you yourself have worked for

many years of your youth?"

Ulam answered with "visible emotion":

"Really? What makes you think that mathematical logic corresponds to the way we think? You

are suffering from what the French call a deformation professionelle. ..."

"Do you then propose that we give up mathematical logic?" said I, in fake amazement.

"Quite the opposite. Logic formalizes only very few of the processes by which we

actuallythink. The time has come to enrich formal logic by adding to it some other fundamental

notions. ... Do not lose your faith," concluded Stan. "A mighty fortress is mathematics. It will

rise to the challenge. It always has."

Ulam speaks of enriching formal logic "by adding to it some other fundamental notions."

More specifically, I suggest that we must enrich formal logic by adding to it the fundamental

notions of pattern and multilevel control, as discussed above. The remainder of this chapter is

devoted to explaining how, if one views logic in the context of pattern and multilevel control, all

four of the "paradoxes" listed above are either resolved or avoided.

This explanation clears the path for a certain form of limited Boolean logism -- a Boolean

logism that assigns at least a co-starring role to pattern and multilevel control. And indeed, in the

chapters to follow I will develop such a form of Boolean limited logism, by extending the

analysis of logic given in this chapter to more complex psychological systems: language and

belief systems.

4.3. THE PARADOXES OF IMPLICATION

Let us begin with the first paradox of implication. How is it that a true statement is implied by

everything?

This is not our intuitive notion of consequence. Suppose one mental process has a dozen

subsidiary mental processes, supplies them all with statement A, and asks each of them to tell it

what follows from A. What if one of these subsidiary processes responds by outputting true

statements at random? Justified, according to Boolean logic -- but useless! The process should

not survive. What the controlling process needs to know is what one can use statement A for --

to know what follows from statement A in the sense that statement A is an integral part of its

demonstration.

This is a new interpretation of "implies." In this view, "A implies B" does not mean simply "-A + B"; it means that A is an integral part of a natural reasoning process leading towards B. It

means that A is helpful in arriving at B. Intuitively, it means that, when one sees that someone

has arrived at the conclusion B, it is plausible to assume that they arrived at A first and proceeded


to B from there. If one looks at implication this way -- structurally, algorithmically,

informationally -- then the paradoxes are gone.

In other words, according to the informational definition, A significantly implies B if it is

sensible to use A to get B. The mathematical properties of this definition have yet to be

thoroughly explored. However, it is clear that a true statement is no longer significantly implied

by everything: the first paradox of implication is gone.

And the second paradox of implication has also disappeared. A false statement no longer

implies everything, because the generic proof of B from "A and not-A" makes no essential use of

A; A could be replaced by anything whatsoever.

4.3.1. Informational Implication (*)

In common argument, when one says that one thing implies another, one means that, by a

series of logical reasonings, one can obtain the second thing from the first. But one does not

mean to include series of logical reasonings which make only inessential use of the first thing.

One means that, using the first thing in some substantial way, one may obtain the second through

logical reasoning. The question is, then, what does use mean?

If one considers only formulas involving --> (implication) and - (negation), it is possible to

say something interesting about this in a purely formal way. Let B1,...,Bn be a proof of B in the

deductive system T union {A}, where T is some theory. Then, one might define A to be used in

deriving Bi if either

1) Bi is identical with A, or

2) Bi is obtained, through an application of one of the rules of inference, from Bj's with j<i,

and A is used for deriving at least one of these Bj's.
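The two clauses above amount to a simple recursion over the dependency structure of a proof. A minimal sketch, in which the proof encoding and the example lines are invented for illustration:

```python
# A minimal sketch of the recursive "use" criterion above. A proof is
# a list of steps; each step records which earlier steps (by index) it
# was inferred from. Step contents and the inference rules themselves
# are abstracted away.

def uses_assumption(proof, a, i):
    """Return True if assumption `a` is used in deriving step i.

    proof[i] is a pair (formula, premises), where premises is a list
    of indices j < i from which the formula was inferred (empty for
    axioms and assumptions).
    """
    formula, premises = proof[i]
    if formula == a:                        # clause 1: Bi is A itself
        return True
    return any(uses_assumption(proof, a, j)  # clause 2: A is used in
               for j in premises)            # deriving some premise Bj

# Hypothetical four-line proof: P, P->Q, Q, R (with R an axiom).
proof = [
    ("P",    []),      # the assumption A
    ("P->Q", []),      # an axiom of the theory T
    ("Q",    [0, 1]),  # modus ponens from steps 0 and 1
    ("R",    []),      # another axiom; A plays no role here
]

print(uses_assumption(proof, "P", 2))  # True: Q genuinely depends on P
print(uses_assumption(proof, "P", 3))  # False: R is derived without P
```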

But this simplistic approach becomes hopelessly confused when disjunction or conjunction

enters into the picture. And even in this uselessly simple case, it has certain conceptual

shortcomings. What if there is a virtually identical proof of B which makes no use of A? Then is

it not reasonable to say that the supposed "use" of A is largely, though not entirely, spurious?

It is not inconceivable that a reasonable approximation of the concept of use might be captured

by some complex manipulation of connectives. However, I contend that what use really has to

do with is structure. Talking about structure is not so cut-and-dried as talking about logical form

-- one always has a lot of loose parameters. But it makes much more intuitive sense.

Let GI,T,v(B) denote the set of all valid proofs of B, relative to some fixed "deductive system" (I,T), of complexity less than v. An element of GI,T,v(B) is a sequence of steps B0, B1, ..., Bn, where Bn = B, and for k>0, Bk follows from Bk-1 by one of the transformation rules in T. Where Z is an element of GI,T,v(B), let L(Z) = |B|/|Z|. This is a measure of how much it simplifies B to prove it via Z.


Where GI,T,v(B) = {Z1,...,ZN}, and p is a positive integer, let

A = L(Z1)*[I(Z1|Y)]^(-p) + L(Z2)*[I(Z2|Y)]^(-p) + ... + L(ZN)*[I(ZN|Y)]^(-p)

B = [I(Z1|Y)]^(-p) + [I(Z2|Y)]^(-p) + ... + [I(ZN|Y)]^(-p)

Qp,v = A/B

Note that, since each I(Zi|Y) is a positive integer, as p tends to infinity, Qp,v tends toward the value L(Z), where Z is the element of GI,T,v(B) that minimizes I(Z|Y). The smaller p is, the more fairly the value L(Z) corresponding to every element of GI,T,v(B) is counted. The larger p is, the more attention is focused on those proofs that are informationally close to Y. The idea is that those proofs which are closer to Y should count much more than those which are not.
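Read this way, Qp,v is just a weighted average of the simplification scores L(Zi), with weight [I(Zi|Y)]^(-p) chosen so that proofs informationally close to Y dominate as p grows; the sign of the exponent is my reading of the garbled formula, and the L and I values below are invented.

```python
# A numerical sketch of Q_{p,v}: a weighted average of the
# simplification scores L(Z) over all proofs Z of B, with weight
# [I(Z|Y)]^(-p) so that proofs informationally close to Y (small I)
# dominate as p grows. All values are invented for illustration.

def q(p, scores):
    """scores: list of (L(Z), I(Z|Y)) pairs, with I a positive integer."""
    num = sum(L * I ** -p for L, I in scores)
    den = sum(I ** -p for L, I in scores)
    return num / den

# Three hypothetical proofs of B; the second is closest to Y (I = 1).
scores = [(0.5, 4), (0.9, 1), (0.2, 3)]

print(q(0, scores))    # p = 0: a plain, fair average of the L values
print(q(50, scores))   # large p: dominated by the proof with I = 1
```

Raising p thus interpolates between a fair average over all proofs and the score of the single proof informationally closest to Y.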

Definition: Let | | be a complexity measure (i.e., a nonnegative-real-valued function). Let

(I,T) be a deductive system, let p be a positive integer, and let 0<c<1. Then, relative to | |, (I,T), p

and c, we will say A significantly implies B to degree K, and write

A -->K B

if K = cL+(1-c)M is the largest of all numbers such that for some v there exists an element Y of

GI,T,v(B) so that

1) A=B0 (in the sequence of deductions described by Y)

2) L = L(Y) = |B|/|Y|,

3) M = 1/Qp,|Y|

According to this definition, A significantly implies B to a high degree if and only if A is an integral part of a "natural" proof of B. The "naturalness" of the proof Y is guaranteed by clause

(3), which says that by modifying Y a little bit, it is not so easy to get a simpler proof. Roughly,

clause (3) says that Y is an "approximate local minimum" of simplicity, in proof space.

This is the kind of implication that is useful in building up a belief system. For, under

ordinary implication there can never be any sense in assuming that, since A --> Bi, i=1,2,...,N,

and the Bi are true, A might be worth assuming. After all, by contradiction sensitivity a false

statement implies everything. But things are not so simple under relevant implication. If a

statement A significantly implies a number of true statements, that means that by appending the

statement A to one's assumption set I, one can obtain quality proofs of a number of true

statements. If these true statements also happen to be useful, then from a practical point of view

it may be advisable to append A to I. Deductively such a move is not justified, but inductively it

is justified. This fits in with the general analysis of deduction given in SI, according to which

deduction is useful only insofar as induction justifies it.

4.4. CONTRADICTION SENSITIVITY


Having dealt with implication, let us now turn to the paradox of contradiction sensitivity.

According to reasoning given above, if one uses propositional or predicate calculus to define the

transformation system T, one easily arrives at the following conclusion: if any two of the

propositions in I contradict each other, then D(I,T) is the entire set of all propositions. From one

contradiction, everything is derivable.

This property appears not to reflect actual human reasoning. A person may contradict herself

regarding abortion rights or the honesty of her husband or the ultimate meaning of life. And yet,

when she thinks about theoretical physics or parking her car, she may reason deductively to one

particular conclusion, finding any contradictory conclusion ridiculous.

In his Ph.D. dissertation, daCosta (1984) conceived the idea of a paraconsistent logic, one in

which a single contradiction in I does not imply everything. Others have extended this idea in

various ways. More recently, Avram (1990) has constructed a paraconsistent logic which

incorporates the idea of "relevance logic." Propositions are divided into classes and the inference

from A to A+B is allowed only when A and B are in the same class. The idea is very simple:

according to Avram, although we do use the "contradiction-sensitive" deductive system of

standard mathematical logic, we carefully distinguish deductions in one sphere from deductions

in another, so that we never, in practice, reason "A implies A or B", unless A and B are in the

same "sphere" or "category."

For instance, one might have one class for statements about physics, one for statements about

women, et cetera. The formation of A or B is allowed only if A and B belong to the same class.

A contradiction regarding one of these classes can therefore destroy only reasoning within that

class. So if one contradicted oneself when thinking about one's relations with one's wife, then

this might give one the ability to deduce any statement whatsoever about domestic relations --

but not about physics or car parking or philosophy.
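A toy sketch of this partitioned scheme, with invented classes and propositions, might look like this:

```python
# A toy sketch of the partitioned "relevance class" approach just
# described. Each proposition carries a topic class; forming "A or B"
# is permitted only when A and B share a class, so a contradiction can
# trivialize deduction only inside its own class. All class names and
# propositions here are invented.

topic = {
    "A": "physics", "not-A": "physics",
    "B": "physics",
    "C": "domestic",
}

def may_form_disjunction(a, b):
    """Allow 'A or B' only within a single class."""
    return topic[a] == topic[b]

def derivable_from_contradiction(a, b):
    """Can b be reached from 'a and not-a' by the classical route
    a => (a or b), then not-a with (a or b) => b?  Only if the
    disjunction (a or b) may be formed at all."""
    return may_form_disjunction(a, b)

print(derivable_from_contradiction("A", "B"))  # True: same class
print(derivable_from_contradiction("A", "C"))  # False: blocked
```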

The problem with this approach is its arbitrariness: why not one class for particle physics, one

for gravitation, one for solid-state physics, one for brunettes, one for blondes, one for

redheads,.... Why not, following Lakoff's (1987) famous analysis of aboriginal classification

systems, one category for women, fire and dangerous things?

Of course, it is true that we rarely make statements like "either the Einstein equation has a

unique solution under these initial-boundary conditions or that pretty redhead doesn't want

anything more to do with me." But still, partitioning is too rigid -- it's not quite right. It yields an

elegant formal system, but of course in any categorization there will be borderline cases, and it is

unacceptable to simply ignore them.

The "partitioning" approach is not the only way of defining relevance formally. But it seems to

be the only definition with any psychological meaning. Read (1988), for instance, disavows

partitioning. But he has nothing of any practical use to put in its place. He mentions the classical

notion of variable sharing -- A and B are mutually relevant if they have variables in common.

But he admits that this concept is inadequate: for instance, "A" and "-A + B" will in general

share variables, but one wishes to forbid their combination in a single expression. He concludes

by defining entailment in such a way that


[T]he test of whether two propositions are logically relevant is whether either entails the other.

Hence, relevance cannot be picked out prior ... to establishing validity or entailment....

But the obvious problem is, this is not really a definition of relevance:

It may of course be objected that this suggested explication of relevance is entirely circular

andunilluminating, since it amounts to saying no more than that two propositions are logically

relevant if either entails the other....

Read's account of relevance is blatantly circular. Though it is not unilluminating from the formal-logical point of view, it is of no psychological value.

4.4.1. Contradiction and the Structure of Mind

There is an alternate approach: to define relevance not by a partition into classes but rather in terms of the theory of structure. It is hypothesized that a mind does not tend to form the disjunction A or B unless the size

||[St(A union v) - St(v)] - [St(B union w) - St(w)]||

is small for some (v,w), i.e. unless A and B are in some way closely related. In terms of the

structurally associative memory model, an entity A will generally be stored near those entities to

which it is closely related, and it will tend to interact mainly with these entities.
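One crude way to make this structural test concrete is to let compressed size stand in for the structure measure St, in the spirit of compression-based similarity measures. The proxy, and the toy statements, are purely illustrative, not the book's exact definition:

```python
import zlib

# A crude sketch of the structural-relatedness test above, with the
# zlib-compressed size standing in for the structure measure St. This
# proxy and the toy statements are illustrative only.

def st(text):
    """Stand-in for St: compressed size of a description, in bytes."""
    return len(zlib.compress(text.encode()))

def gain(a, context):
    """St(A union v) - St(v): structure added by A over a context v."""
    return st(context + " " + a) - st(context)

v = "all blackbirds are black"
a = "all ravens are black"   # closely related to v and to b
b = "all crows are black"    # closely related to v and to a
c = "the parking meter on fifth street expires at noon and takes quarters"

ga, gb, gc = gain(a, v), gain(b, v), gain(c, v)
# Related statements add similar amounts of structure over a shared
# context, so |ga - gb| comes out small compared to |ga - gc|.
print(abs(ga - gb), abs(ga - gc))
```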

As to the possibility that, by chance, two completely unrelated entities will be combined in

some formula, say A or B, it is admitted that this could conceivably pose a danger to thought

processes. But the overall structure of mind dictates that a part of the mind which succumbed to

self-contradiction and the resulting inefficiency would soon be ignored and dismantled.

According to the model of mind outlined above, each mental process supervises a number --

say a dozen -- of others. Suppose these dozen are reasoning deductively, and one of them falls

prey to an internal self-contradiction, and begins giving out random statements. Then how

efficient will that self-contradicting process be? It will be the least efficient of all, and it will

shortly be eliminated and replaced. Mind does not work by absolute guarantees, but rather by

probabilities, safeguards, redundancy and natural selection.
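The selection mechanism described in this paragraph can be made concrete with a toy controller; the scoring rule and both processes are invented:

```python
# A toy sketch of the safeguard described above: a controlling process
# scores its subsidiaries by how useful their responses to a query
# are, and the worst performer is replaced. A process that has
# collapsed into emitting true-but-irrelevant statements scores zero
# and is weeded out. The scoring rule and processes are invented.

def useful(statement, query):
    # stand-in test for "the query is an integral part of the answer":
    # the response must actually build on the statement it was given
    return statement.startswith(f"from {query}:")

def score(process, query, trials=10):
    return sum(useful(process(query), query) for _ in range(trials)) / trials

def sound(q):
    return f"from {q}: a consequence that makes essential use of {q}"

def babbler(q):
    return "2 + 2 = 4"   # true, hence "justified", but useless here

subsidiaries = {"sound": sound, "babbler": babbler}
scores = {name: score(p, "A") for name, p in subsidiaries.items()}
replaced = min(scores, key=scores.get)
print(replaced)  # → babbler
```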

4.4.2. Contradiction and Implication

We have given one way of explaining why contradiction sensitivity need not be a problem

for actual minds. But, as an afterthought, it is worth briefly noting that one may also approach the

problem from the point of view of relevant implication. The step from "A and not-A" to B involves the step "A --> A or B". What does our definition of significant implication say about this? A moment's reflection reveals that, as noted above, clause (3) kicks in here: A is totally dispensable in this proof of B; it could just as well be replaced by C, D, E or any other proposition. The type of implication involved in contradiction sensitivity is not significant to a very high degree.


4.5. CONFIRMATION

Finally, what of Hempel's confirmation paradox? Why, although "all ravens are black" is

equivalent to "all non-black entities are non-ravens," is an observation of a blue chair a lousy

piece of evidence for "all ravens are black"?

My resolution is simple, and not conceptually original. Recall the "infon" notation introduced

in Section 2. Just because s |-- i //x to degree d, it is not necessarily the case that s |-- j //x to

degree d for every j equivalent to i under the rules of Boolean logic. This is, basically, all that

needs to be said. Case closed, end of story. Boolean logic is a tool. Only in certain cases does the

mind find it useful.

That the Boolean equivalence of i and j does not imply the equality of d(s,i,x) and d(s,j,x) is

apparent from the definition of degree given above. The degree to which (s,k,x) holds was

defined in terms of the intensity with which the elements of k are patterns in s, where complexity

is defined by s. Just because i and j are Booleanly equivalent, this does not imply that they will

have equal algorithmic information content, equal structure, equal complexity with respect to

some observer s. Setting things up in terms of pattern, one obtains a framework for studying

reasoning in which Hempel's paradox does not exist.

4.5.1. A More Psychological View

In case this seems too glib, let us explore the matter from a more psychological perspective.

Assume that "All ravens are black" happens to hold with degree d, in my experience, from my

perspective. Then to what degree does "All non-black entities are non-ravens" hold in my

experience, from my perspective?

"All ravens are black" is an aid in understanding the nature of the world. It is an aid in

identifying ravens. It is a significant pattern in my world that those things which are typically

referred to with the label "raven," are typically possessors of the color black. When storing in my

memory a set of experiences with ravens, I do not have to store with each experience the fact that

the raven in question was black -- I just have to store, once, the statement that all ravens are

black, and then connect this in my memory to the various experiences with ravens.

Now, what about "All non-black entities are non-ravens"? What good does it do me to

recognize this? How does it simplify my store of memories? It does not, or hardly at all. When I

call up a non-black entity from my memory, I will not need to be reminded that it is not a raven.

Why would I have thought that it was a raven in the first place? "Raven-ness?" is not one of the

questions which it is generally useful or interesting to ask about entities, whereas on the other

hand "color?" is one of the questions which it is often interesting to ask about physical objects

such as birds.
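The memory-saving argument of the last two paragraphs can be turned into a toy count of stored fields; the record format and the counts are invented for illustration:

```python
# A toy count for the memory argument above: storing "all ravens are
# black" once lets every raven record omit its color field, while the
# contrapositive predicts nothing that records of non-black things
# bother to store, so it saves nothing. The record format is invented.

RAVEN_RULE = "all ravens are black"
CONTRA_RULE = "all non-black entities are non-ravens"

memories = ([{"kind": "raven", "color": "black"}] * 10
            + [{"kind": "chair", "color": "blue"}] * 10)

def stored_fields(records, rules):
    total = len(rules)                    # each rule is stored once
    for r in records:
        fields = dict(r)
        if RAVEN_RULE in rules and r["kind"] == "raven":
            del fields["color"]           # the rule supplies it
        if CONTRA_RULE in rules and r["color"] != "black":
            pass  # records never stored "raven-ness" to begin with
        total += len(fields)
    return total

base = stored_fields(memories, [])
with_rule = stored_fields(memories, [RAVEN_RULE])
with_contra = stored_fields(memories, [CONTRA_RULE])
print(base, with_rule, with_contra)  # → 40 31 41
```

Only the original rule compresses the memory store; its Boolean equivalent, stored alongside the same records, is pure overhead.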

So, the real question with Hempel's paradox is, what determines the degree assigned to a given

proposition s |-- i //x. It is not purely the logical form of the proposition, but rather the degree to

which the proposition is useful to x, i.e. the emergence between the proposition and the other

entities which neighbor it in the memory of x. Degree is determined by psychological dynamics,


rather than Boolean logic. Formally, one may say: the logic of memory organization is what

determines the subjective complexity measure associated with x.

It is not always necessary to worry about where the degrees associated with propositions come

from. But when one is confronted with a paradox regarding degrees, then it is necessary to worry

about it. The real moral of Hempel's paradox, as I see it, is that one should study confirmation in

terms of the structure and dynamics of the mind doing the confirming. Studying confirmation

otherwise, "in the abstract," borders on meaningless.
