

Get any book for free on: www.Abika.com

To be extremely rough about it, one might suppose that level 1 corresponds to lines. Then
level 2 might correspond to simple geometrical shapes, level 3 might correspond to complex
geometrical shapes, level 4 might correspond to simple recognizable objects or parts of
recognizable objects, level 5 might correspond to complex recognizable objects, and level 6
might correspond to whole scenes. To say that level 4 processes recognize patterns in the output
of level 3 processes is to say that simple recognizable objects are constructed out of complex
geometrical shapes, rather than directly out of lines or simple geometrical shapes. Each level 4
process is the parent, the controller, of those level 3 nodes that correspond to those complex
geometrical shapes which make up the simple object which it represents. And it is the child, the
controlee, of at least one of the level 5 nodes that corresponds to a complex object of which it is
a part (or perhaps even of one of the level 6 nodes describing a scene of which it is a part -- level
crossing like this can happen, so long as it is not the rule).

My favorite way of illustrating this multilevel control structure is to mention the three-level
"pyramidal" vision processing parallel computer developed by Levitan and his colleages at the
University of Massachusetts. The bottom level deals with sensory data and with low-level
processing such as segmentation into components. The intermediate level takes care of grouping,
shape detection, and so forth; and the top level processes this information "symbolically",
constructing an overall interpretation of the scene. The base level is a 512X512 square array of
processors each doing exactly the same thing to different parts of the image; and the middle level
is composed of a 64X64 square array of relatively powerful processors, each doing exactly the
same thing to different parts of the base-level array. Finally, the top level contains 64 very
powerful processors, each one operating independently according to LISP programs. The
intermediate level may also be augmented by additional connections. This three-level perceptual
hierarchy appears to be an extremely effective approach to computer vision.
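The fan-in implied by those figures can be checked with a few lines of arithmetic. The block sizes below are computed from the numbers quoted above, not taken from the actual machine's specifications:

```python
# Sketch of the three-level "pyramidal" fan-in described in the text
# (512x512 base, 64x64 middle, 64 top). The block sizes are derived
# from those figures, not from the original machine's documentation.
base_side, mid_side, top_count = 512, 64, 64

patch = base_side // mid_side          # each middle processor sees an 8x8 base patch
mid_count = mid_side * mid_side        # 4096 middle processors
mid_per_top = mid_count // top_count   # each top processor oversees 64 middle ones

print(patch, mid_count, mid_per_top)
```

So each of the 64 LISP-programmed top processors ultimately presides over a 64x64 patch of the raw image.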

That orders are passed down the perceptual hierarchy was one of the biggest insights of the
Gestalt psychologists. Their experiments (Kohler, 1975) showed that we look for certain
configurations in our visual input. We look for those objects that we expect to see, and we look
for those shapes that we are used to seeing. If a level 5 process corresponds to an expected
object, then it will tell its children to look for the parts corresponding to that object, and its
children will tell their children to look for the complex geometrical forms making up the parts to
which they refer, et cetera.

3.1.2. Motor Movements

In its motor control aspect, this multilevel control network serves to send actions from the
abstract level to the concrete level. Again extremely roughly, say level 1 represents muscle
movements, level 2 represents simple combinations of muscle movements, level 3 represents
medium-complexity combinations of muscle movements, and level 4 represents complex
combinations of movements such as raising an arm or kicking a ball. Then when a level 4
process gives an instruction to raise an arm, it gives instructions to its subservient level 3
processes, which then give instructions to their subservient level 2 processes, which give
instructions to level 1 processes, which finally instruct the muscles on what to do in order to raise
the arm. This sort of control moves down the network, but of course all complex motions involve
feedback, so that level k processes are monitoring how well their level k-1 processes are doing


their jobs and adjusting their instructions accordingly. Feedback corresponds to control moving
up the network.
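A minimal sketch of this command-and-feedback flow, with invented node names standing in for levels of the motor hierarchy, might look as follows:

```python
# A toy model (not the author's formalism) of commands flowing down a
# multilevel motor hierarchy while feedback flows back up. The node
# names and the command strings are invented for illustration.
class ControlNode:
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

    def execute(self, command):
        """Pass the command down; aggregate feedback coming back up."""
        if not self.children:                  # level 1: drive a "muscle"
            return {self.name: command}
        feedback = {}
        for child in self.children:
            # each parent monitors its children and could adjust here
            feedback.update(child.execute(command + "/" + child.name))
        return feedback

arm = ControlNode("raise-arm", [
    ControlNode("shoulder", [ControlNode("deltoid"), ControlNode("trapezius")]),
    ControlNode("elbow", [ControlNode("biceps")]),
])
print(arm.execute("raise-arm"))
```

The dictionary returned to the top node is the upward-moving feedback; a fuller model would let each parent revise its instructions on the basis of it.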

In a less abstract, more practically-oriented language, Bernstein (see Whiting, 1984) has given
a closely related analysis of motor control. And a very similar hierarchical model of perception
and motor control has been given by Jason Brown (1988), under the name of "microgenesis." His
idea is that lower levels of the hierarchy correspond to older, evolutionarily prior forms of
perception and control.

Let us sum up. The multilevel control methodology, in itself, has nothing to do with patterns.
It is a very simple and general way of structuring perception and action: subprocesses within
subprocesses within subprocesses, each subprocess giving orders to and receiving feedback from
its subsidiaries. In this general sense, the idea that the mind contains a multilevel control
hierarchy is extremely noncontroversial. But psychological multilevel control networks have
one important additional property. They are postulated to deal with questions of pattern. As in
the visual system, the processes on level N are hypothesized to recognize patterns in the output
of the processes on level N-1, and to instruct these processes in certain patterns of behavior. It
is pattern which is passed between the different levels of the hierarchy.

3.1.3. Genetic Programming

Finally, there is the question of how an effective multilevel control network could ever come
about. As there is no "master programmer" determining which control networks will work better
for which tasks, the only way for a control network to emerge is via directed trial and error.
And in this context, the only natural method of trial and error is the one known as genetic
optimization or genetic programming. These fancy words mean simply that

1) subnetworks of the control network which seem to be working ineffectively are randomly
varied, and

2) subnetworks of the control network which seem to be working ineffectively are a) swapped
with one another, or b) replaced with other subnetworks.

This substitution may perhaps be subject to a kind of "speciation," in which the probability of
substituting subnetwork A for subnetwork B decreases with the distance between A and B in the
network.

Preliminary computer simulations indicate that, under appropriate conditions, this sort of
process can indeed converge on efficient programs for executing various perceptual and motor
tasks. However, a complete empirical study of this sort of process remains to be undertaken.
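One possible toy realization of this trial and error, with invented fitness scores and variation operators, is the following sketch:

```python
import random

# A toy version of the trial-and-error scheme sketched above: subnetworks
# judged ineffective are either randomly varied, swapped with one another,
# or overwritten by a copy of a stronger subnetwork. "Fitness" is just a
# number here; in the text it would be task performance.
def reorganize(subnets, fitness, rng):
    """subnets: dict name -> parameters; fitness: dict name -> score."""
    weak = sorted(subnets, key=fitness.get)[:2]      # two worst performers
    a, b = weak
    op = rng.choice(["swap", "replace", "mutate"])
    if op == "swap":
        subnets[a], subnets[b] = subnets[b], subnets[a]
    elif op == "replace":
        best = max(subnets, key=fitness.get)
        subnets[a] = list(subnets[best])             # copy a stronger subnetwork in
    else:
        subnets[a] = [x + rng.uniform(-1, 1) for x in subnets[a]]
    return subnets

rng = random.Random(0)
nets = {"A": [1.0], "B": [2.0], "C": [3.0]}
scores = {"A": 0.1, "B": 0.9, "C": 0.5}
print(reorganize(nets, scores, rng))
```

Iterating such a step, while re-measuring fitness, is the "genetic optimization" referred to above.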


So much for the multilevel control network. Let us now turn to long-term memory. What I
call "structurally associative memory" is nothing but a long-term memory model which the


connections between processes are determined not by control structures, nor by any arbitrary
classification system, but by patterned relations.

The idea of associative memory has a long psychological history. Hundreds, perhaps
thousands of experiments on priming indicate that verbal, visual and other types of memory
display associativity of access. For instance, if one has just heard the word "cat," and one is
shown the picture of a dog, one will identify it as a "dog" very quickly. If, on the other hand, one
has just heard the word "car" and one is shown the picture of a dog, identification of the dog as a
"dog" will take a little bit longer.

Associative memory has also proved very useful in AI. What could be more natural than to
suppose that the brain stores related entities near to each other? There are dozens of different
associative memory designs in the engineering and computer science literature. Kohonen's
(1984) associative memory model was one of the landmark achievements of early neural
network theory; and Kanerva's (1988) sparse distributed memory, based on the peculiar statistics
of the Hamming distance, has yielded many striking insights into the nature of recall.

Psychological studies of associative memory tend to deal with words or images, where the
notion of "association" is intuitively obvious. Engineering associative memories use specialized
mathematical definitions of association, based on inner products, bit string comparisons, etc.
Neither of these paradigms seems to have a reasonably general method of defining association,
or "relatedness."

The idea at the core of the structurally associative memory is that relatedness should be
defined in terms of pattern. In the structurally associative memory, an entity y is connected to
another entity x if x is a pattern in y. Thus, if w and x have common patterns, there will be many
nodes connected to both w and x. In general, if there are many short paths from w to x in the
structurally associative memory, that means that w and x are closely related; that their structures
probably intersect.
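A sketch of such a memory as a graph, with substring containment standing in for the book's general notion of pattern (the word list is invented):

```python
from collections import deque

# Sketch of a structurally associative memory as a graph: an edge joins
# x and y whenever x is a pattern in y. Substring containment stands in
# here for the general notion of pattern; the items are arbitrary.
items = ["cat", "cats", "catalog", "dog", "dogma", "at", "og"]
edges = {w: set() for w in items}
for x in items:
    for y in items:
        if x != y and x in y:          # "x is a pattern in y"
            edges[x].add(y)
            edges[y].add(x)

def distance(a, b):
    """Shortest path length; a small distance = closely related structures."""
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == b:
            return d
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None

print(distance("cats", "catalog"))   # -> 2, linked through the shared pattern "cat"
print(distance("cats", "dogma"))     # -> 4, structurally much less related
```

Short paths correspond to intersecting structures, exactly as described above.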

On the other hand, if y is a pattern emergent between w and x, y will not necessarily connect
to w or x, but it will connect to the node z = w U x, if there is such a node. One might expect
that, as a rough rule, z would be higher on the multilevel control network than w or x, thus
interconnecting the two networks in a very fundamental way.

The memory of a real person (or computer) can never be truly associative -- sometimes two
dissimilar things will be stored right next to each other, just by mistake. But it can be
approximately structurally associative, and it can continually reorganize itself so as to maintain a
high degree of structural associativity despite a continual influx of new information.

In The Evolving Mind this reorganization is shown to imply that structurally associative
memories evolve by natural selection -- an entity stored in structurally associative memory is
likely to "survive" (not be moved) if it fits in well with (has patterns in common with, generates
emergent pattern cooperatively with, etc.) its environment, with those entities that immediately
surround it.


3.2.1. The Dynamics of Memory

More specifically, this reorganization must be understood to take place on many different
levels. There is no "memory supervisor" ruling over the entire long term memory store,
mathematically determining the optimal "location" for each entity. So, logically, the only form
which reorganization can take is that of directed, locally governed trial and error.

How might this trial and error work? The most plausible hypothesis, as pointed out in The
Structure of Intelligence, is as follows: one subnetwork is swapped with another; or else
subnetwork A is merely copied into the place of subnetwork B. All else equal, substitution will
tend to take place in those regions where associativity is worse; but there may also be certain
subnetworks that are protected against having their sub-subnetworks removed or replaced.

If the substitution(s) obtained by swapping or copying are successful, in the sense of
improving associativity, then the new networks formed will tend not to be broken up. If the
substitutions are unsuccessful, then more swapping or copying will be done.

Finally, these substitutions may take place in a multilevel manner: large networks may be
moved around, and at the same time the small networks which make them up may be internally
rearranged. The multilevel process will work best if, after a large network is moved, a reasonable
time period is left for its subnetworks to rearrange among themselves and arrive at a "locally
optimal" configuration. This same "waiting" procedure may be applied recursively: after a
subnetwork is moved, it should not be moved again until its sub-subnetworks have had a chance
to adequately rearrange themselves. Note that this reorganization scheme relies on the
existence of certain "barriers." For instance, suppose network A contains network B, which
contains network C. C should have more chance of being moved to a given position inside B
than to a given position out of B. It should have more chance of moving to a given position
inside A-B, than to a given position outside A (here A-B means the portion of A that is not in B).
And so on -- if A is contained in Z, C should have more chance of being moved to a position in
Z-A than outside Z.
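One simple way to realize these barriers numerically is to let the weight of a candidate move decay with each containment boundary it crosses. The decay factor below is an invented free parameter, not something fixed by the text:

```python
# A toy realization of the "barrier" rule above: the probability weight of
# moving subnetwork C to a target position decays with every containment
# boundary the move must cross. The decay factor is an invented parameter.
def move_weight(boundaries_crossed, decay=0.25):
    return decay ** boundaries_crossed

w_inside_B  = move_weight(0)   # stay within B
w_inside_A  = move_weight(1)   # leave B but stay within A
w_outside_A = move_weight(2)   # leave A entirely

assert w_inside_B > w_inside_A > w_outside_A
print(w_inside_B, w_inside_A, w_outside_A)
```

A decay factor near zero gives the strong, reality-like restrictions mentioned below; a factor near one gives the weak, free-floating case.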

In some cases these restrictions may be so strong as to prohibit any rearrangement at all: in
later chapters, this sort of comprehensive rearrangement protection will be identified with the
more familiar concept of reality. In other cases the restrictions may be very weak, allowing the
memory to spontaneously direct itself through a free-floating, never-ending search for perfect
associativity.

In this context, I will discuss the psychological classification of people into thin-boundaried
and thick-boundaried personality types. These types would seem to tie in naturally with the
notion of rearrangement barriers in the structurally associative memory. A thick-boundaried
person tends to have generally stronger rearrangement barriers, and hence tends to reify things
more, to be more resistant to mental change. A thin-boundaried person, on the other hand, has
generally weaker rearrangement barriers, and thus tends to permit even fixed ideas to shift, to
display a weaker grasp on "reality."


The strength and placement of these "rearrangement barriers" might seem to be a sticky issue.
But the conceptual difficulty is greatly reduced if one assumes that the memory network is
"fractally" structured -- structured in clusters within clusters ... within clusters, or equivalently
networks within networks ... within networks. If this is the case, then one may simply assume
that a certain "degree of restriction" comes along with each cluster, each network of networks of
... networks. Larger clusters, larger networks, have larger degrees of restriction.

The only real question remaining is who assigns this degree. Are there perhaps mental
processes which exist mainly to adjust the degrees of restriction imposed by other processes?
This is a large question, and a complete resolution will have to wait till later. Part of the answer,
however, will be found in the following section, in the concept of the dual network.


Neither a structurally associative memory nor a multilevel control network can, in itself, lead
to intelligence. What is necessary is to put the two together: to take a single set of
entities/processes, and by drawing a single set of connections between them, structure them both
according to structural associativity and according to multilevel control. This does not mean just
drawing two different graphs on the same set of nodes: it means that the same connections must
serve as part of a structurally associative memory and part of a multilevel control network.
Entities which are connected via multilevel control must, on the whole, also be connected via
structural associativity, and vice versa.

A moment's reflection shows that it is not possible to superpose an arbitrary associative
memory structure with a multilevel control hierarchy in this way. In fact, such superposition is
only possible if the entities stored in the associative memory are distributed in an approximately
"fractal" way (Barnsley, 1988; Edgar, 1990).

In a fractally distributed structurally associative memory, on the "smallest" scale, each process
is contained in a densely connected subgraph of "neighbors," each of which is very closely
related to it. On the next highest scale, each such neighborhood is connected to a collection of
"neighboring neighborhoods," so that the elements of a neighboring neighborhood are fairly
closely related to its elements. Such a neighborhood of neighborhoods may be called a 2nd-level
neighborhood, and in an analogous manner one may define kth-level neighborhoods. Of course,
this structure need not be strict: there may be breaks in it on every level, and each process may
appear at several different vertices.

A good way to understand the fractal structure of the heterarchical network is to think about
the distribution of subjects in a large library. One has disciplines, sub-disciplines, sub-sub-
disciplines, and so forth -- clusters within clusters within clusters, rather than a uniformly
distributed field of subjects. And a good way to visualize the superposition of a hierarchical
network on this structure is to postulate a head librarian dealing with each discipline, an assistant
librarian dealing with each sub-discipline, an assistant assistant librarian dealing with each
sub-sub-discipline, and so on. If one imagines that each librarian, assistant librarian, etc.,
gives her subsidiaries general goals and lets them work out their own strategies, then one has a


control hierarchy that works approximately according to the multilevel methodology. The
hierarchy of control is lined up perfectly with the fractal heterarchy of conceptual commonality.

A dual network, then, is a collection of processes which are arranged simultaneously in an
hierarchical network and an heterarchical network. Those processes with close parents in the
hierarchical network are, on the whole, correspondingly closely related in the heterarchical
network.

This brings us back to the problem of rearrangement barriers. The rearrangement barriers of
the associative memory network may be set up by the heterarchical network, the multilevel
control network. And, strikingly, in the dual network architecture, substitution of subnetworks
of the memory network is equivalent to genetic optimization of the control network. The same
operation serves two different functions; the quest for associativity and the quest for efficient
control are carried out in exactly the same way. This synergy between structure and dynamics is
immensely satisfying.

But, important and elegant as this is, this is not the only significant interaction between the
two networks. A structurally associative memory is specifically configured so as to support
analogical reasoning. Roughly speaking, analogy works by relating one entity to another entity
with which it shares common patterns, and the structurally associative memory stores an entity
near those entities with which it shares common patterns. And the hierarchical network, the
perceptual-motor hierarchy, requires analogical reasoning in order to do its job. The purpose of
each cluster in the dual network is to instruct its subservient clusters in the way that it estimates
will best fulfill the task given to it by its master cluster -- and this estimation is based on
reasoning analogically with respect to the information stored in its memory bank.

Let's get a little more concrete. The brain is modeled as a dual network of neural networks. It
is considered to consist of "level k clusters" of autonomous neural networks, each one of which
consists of 1) a number of level k-1 clusters, all related to each other, and 2) some networks that
monitor and control these level k-1 clusters. The degree of control involved here may be highly
variable. However, the neurological evidence shows that entire knowledge bases may be outright
moved from one part of the brain to another (Blakeslee, 1991), so that in some cases the degree
of control is very high.

For example, a level 2 cluster might consist of processes that recognize shapes of various sorts
in visual inputs, together with a network regulating these processes. This cluster of shape
recognition processes would be organized according to the principle of structurally associative
memory, so that e.g. the circle process and the ellipse process would be closer to each other than
to the square process. This organization would permit the regulating process to execute
systematic analogical search for a given shape: if in a given situation the circle process were seen
to be fairly successful, but the square process not at all successful, then the next step would be to
try out those processes near to the circle process.

3.3.1. Precursors of the Dual Network Model


After hinting at the dual network model in The Structure of Intelligence, and presenting it fully
in The Evolving Mind, I came across two other models of mind which mirror many of its aspects.
First of all, I learned that many cognitive scientists are interested in analyzing thought as a
network of interconnected "schema" (Arbib and Hesse, 1986). This term is not always well
defined -- often a "schema" is nothing more than a process, an algorithm. But Arbib equates
"schema" with Charles S. Peirce's "habit," bringing it very close to the concept of pattern. The
global architecture of this network of schema is not discussed, but the connection is there.

Also, I encountered the paper "Outline for a Theory of Intelligence" by James S. Albus (1991),
Chief of the Robot Systems Division of the National Institute of Standards and Technology. I
was pleased to find therein a model of mind strikingly similar to the dual network, complete with
diagrams such as Figure 6. Albus's focus is rather different from mine -- he is concerned with the
differential equations of control theory rather than the algorithmic structure of reasoning and
memory processes. But the connection between the fractal structure of memory and the
hierarchical structure of control, which is perhaps the most essential component in the dual
network, is virtually implicit in his theory.

Putting the schema theory developed by cognitive scientists together with the global structure
identified by Albus through his robotics work, one comes rather close to a crude version of the
dual network model. This is not how the dual network model was conceived, but it is a rather
satisfying connection. For the dual network structure is, after all, a rather straightforward idea.
What is less obvious, and what has not emerged from cognitive science or engineering, is the
dynamics of the dual network. The way the dual network unifies memory reorganization with
genetic optimization has not previously been discussed; nor has the dynamics of barrier
formation and its relationship with consciousness, language and perception (to be explored in
Chapter Six).


The dual network model, as just presented, dismisses the problem of predicting the future
rather cursorily. But this is not entirely justified. Prediction of the behavior of a complex system
is an incredibly difficult task, and one which lies at the very foundation of intelligence. The
dual network model has no problem incorporating this sort of prediction, but something should
be said about how its prediction processes work, rather than just about how they are structured.

One way to predict the future of a system, given certain assumptions about its present and
about the laws governing its behavior, is to simply simulate the system. But this is inefficient,
for a very simple physicalistic reason. Unlike most contemporary digital computers, the brain
works in parallel -- there are a hundred billion neurons working at once, plus an unknown
multitude of chemical reactions interacting with and underlying this neural behavior. And each
neuron is a fairly complicated biochemical system, a far cry from the on-off switch in a digital
computer. But when one simulates a system, one goes one step at a time . To a certain extent,
this wastes the massive parallelism of the brain.


So, the question is, is simulation the best a mind can do, or are there short-cuts? This question
ties in with some pressing problems of modern mathematics and theoretical computer science.
One of the biggest trends in modern practical computer science is the development of parallel-
processing computers, and it is of great interest to know when these computers can outperform
conventional serial computers, and by what margin.

3.4.1. Discrete Logarithms (*)

For a simple mathematical example, let us look to the theory of finite fields. A finite field is a
way of doing arithmetic on a bounded set of integers. For instance, suppose one takes the field of
size 13 (the size must be a prime or a prime raised to some power). Then, in this field the largest
number is 12. One has, for example, 12 + 1 = 0, 10 + 5 = 2, 3 x 5 = 2, and 8 x 3 = 11. One can do
division in a finite field as well, although the results are often counterintuitive -- for instance,
12/8 = 8, and 2/3 = 5 (to see why, just multiply both sides by the denominator).
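These identities can be verified mechanically. A minimal sketch, computing division through the modular inverse given by Fermat's little theorem (a^(p-2) mod p, valid because 13 is prime):

```python
# Checking arithmetic in the field of size 13. Division uses the inverse
# given by Fermat's little theorem, a^(p-2) mod p, valid since 13 is prime.
p = 13
add = lambda a, b: (a + b) % p
mul = lambda a, b: (a * b) % p
div = lambda a, b: mul(a, pow(b, p - 2, p))

assert add(12, 1) == 0
assert add(10, 5) == 2
assert mul(3, 5) == 2
assert mul(8, 3) == 11
assert div(12, 8) == 8    # since 8 * 8 = 64 = 12 (mod 13)
assert div(2, 3) == 5     # since 3 * 5 = 15 = 2  (mod 13)
print("all identities check out")
```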

In finite field theory there is something called the "discrete logarithm" of a number, written
dlog_b(n). The discrete logarithm is defined just like the ordinary logarithm, as the inverse of
exponentiation. But in a finite field, exponentiation must be defined in terms of the "wrap-
around" arithmetic illustrated in the previous paragraph. For instance, in the field of size 7, 3^4 =
4. Thus one has dlog_3(4) = 4. But how could one compute the log base 3 of 4, without knowing
what it was? The powers of 3 can wrap around the value 7 again and again -- they could wrap
around many times before hitting on the correct value, 4.

The problem of finding the discrete logarithm of a number is theoretically easy, in the sense
that there are only finitely many possibilities. In our simple example, all one has to do is take 3
to higher and higher powers, until all possibilities are covered. But in practice, if the size of the
field is not 7 but some much larger number, this finite number of possibilities can become
prohibitively large.
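The brute-force procedure just described, in the field of size 7, can be sketched as:

```python
# Brute-force discrete logarithm: take b to higher and higher powers
# modulo p until n appears, exactly as the text describes. For large
# fields this loop becomes prohibitively long.
def dlog(b, n, p):
    acc = 1
    for k in range(1, p):          # at most p - 1 distinct powers
        acc = (acc * b) % p
        if acc == n:
            return k
    return None                    # n is not a power of b mod p

print(dlog(3, 4, 7))   # -> 4, since 3^4 = 81 = 4 (mod 7)
```

For a field of cryptographic size the same loop would take astronomically many steps, which is the whole point of the example.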

So, what if one defines the dynamical system n_k = dlog_b(n_(k-1))? Suppose one is given n_1; then
how can one predict n_1000? So far as we know today, there is no better way than to proceed in
order: first get n_2, then n_3, then n_4, and so on up to n_999 and n_1000. Working on n_3 before
one knows n_2 is essentially useless, because a slight change in the answer for n_2 can totally
change the answer for n_3. The only way to do all 1000 steps in parallel, it seems, would be to
first compute a table of all possible powers that one might possibly need to know in the course of
calculation. But this would require an immense number of processors; at least the square of the
size of the field.

This example is, incidentally, of more than academic interest. Many cryptosystems in current
use are reliant on discrete logarithms. If one could devise a quick method for computing them,
one could crack all manner of codes; and the coding theorists would have to come up with
something better.

3.4.2. Chaos and Prediction

More physicalistic dynamical systems appear to have the same behavior. The classic example
is the "logistic" iteration xk = cxk-1(1-xk-1), where c=4 or c assumes certain values between 3.8


and 4, and the x_k are discrete approximations of real numbers. This equation models the
dynamics of certain biological populations, and it also approximates the equations of fluid
dynamics under certain conditions.

It seems very, very likely that there is no way to compute x_n from x_1 on an ordinary serial
computer, except to proceed one step at a time. Even if one adds a dozen or a thousand or a
million processors, the same conclusion seems to hold. Only if one adds a number of processors
roughly proportional to 2^n can one obtain a significant advantage from parallelism.
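The sensitivity in question is easy to exhibit: two trajectories of the logistic map started a billionth apart become macroscopically different after a few dozen steps. (The starting point and step count below are arbitrary choices.)

```python
# The logistic map x_k = 4 * x_(k-1) * (1 - x_(k-1)): two trajectories
# starting a billionth apart diverge after a few dozen iterations, which
# is why shortcutting the step-by-step computation seems hopeless.
def logistic(x, steps):
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

a = logistic(0.3, 50)
b = logistic(0.3 + 1e-9, 50)
print(abs(a - b))    # a macroscopic gap despite the tiny initial difference
```

Each iteration roughly doubles the separation between the two trajectories, so after fifty steps the initial billionth has been amplified beyond recognition.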

In general, all systems of equations called chaotic possess similar properties. These include
equations modeling the weather, the flow of blood through the body, the motions of planets in
solar systems, and the flow of electricity in the brain. The mathematics of these systems is still in
a phase of rapid development. But the intuitive picture is clear. To figure out what the weather
will be ninety days from now, one must run an incredibly accurate day-by-day simulation -- even
with highly parallel processing, there is no viable alternate strategy.

3.4.3. Chaos, Prediction and Intelligence

A mind is the structure of an intelligent system, and intelligence relies on prediction, memory
and optimization. Given the assumption that some past patterns will persist, a mind must always
explore several different hypotheses as to which ones will persist. It must explore several
different possible futures, by a process of predictive extrapolation. Therefore, intelligence
requires the prediction of the future behavior of partially unpredictable systems.

If these systems were as chaotic as x_k = 4 x_(k-1) (1 - x_(k-1)), all hope would be lost. But the weather
system is a better example. It is chaotic in its particular details -- there is no practical way, today
in 1992, to determine the temperature on July 4 1999 in Las Vegas. But there are certain
persistent patterns that allow one to predict its behavior in a qualitative way. After all, the
temperature on July 4 1999 in Las Vegas will probably be around 95-110 Fahrenheit. One can
make probabilistic, approximate predictions -- one can recognize patterns in the past and
hope/assume that they will continue.

Our definition of intelligence conceals the presupposition that most of the prediction which
the mind has to do is analogous to this trivial weather prediction example. No step-by-step
simulation is required, only inductive/analogical reasoning, supported by memory search.
However, the fact remains that sometimes the mind will run across obstinate situations --
prediction problems that are not effectively tackled using intuitive memory or using parallel-
processing shortcuts. In these cases, the mind has no choice but to resort to direct simulation (on
some level of abstraction).

The brain is a massively parallel processor. But when it runs a direct simulation of some
process, it is acting like a serial processor. In computerese, it is running a virtual serial
machine. The idea that the parallel brain runs virtual serial machines is not a new one -- in
Consciousness Explained Daniel Dennett proposes that consciousness is a virtual serial machine
run on the parallel processor of the brain. As will be seen in Chapter Six, although I cannot
accept Dennett's reductionist analysis of consciousness, I find a great deal of merit in this idea.



To proceed further with my formal theory of intelligence, I must now introduce some slightly
technical definitions. The concept of a structured transformation system will be absolutely
essential to the theory of language and belief to be given in later chapters. But before I can say
what a structured transformation system is, I must define a plain old transformation system.

In words, a transformation system consists of a set I of initials, combined with a set T of
transformation rules. The initials are the "given information"; the transformation rules are
methods for combining and altering the initials into new statements. The deductive system itself,
I will call D(I,T).

For instance, in elementary algebra one has transformation rules such as

X = Y implies X+Z = Y+Z, and XZ = YZ

(X + Y) + Z = X + (Y+Z)

X - X = 0



If one is given the initial

2q - r = 1

one can use these transformation rules to obtain

q = (1 + r)/2.

The latter formula has the same content as the initial, but its form is different.

If one had a table of numbers, say

r     q
1     1
2     3/2
3     2
4     5/2
5     3
...
99    50

then the "q=(1+r)/2" would be a slightly more intense pattern in one's table than "2q+r=1." For
the work involved in computing the table from "2q+r=1" is a little greater -- one must solve for q
each time r is plugged in, or else transform the equation into "q=(1+r)/2."

Thus, although in a sense transformation systems add no content to their initials, they are
capable of producing new patterns. For a list of length 100, as given above, both are clearly
patterns. But what if the list were of length 4? Then perhaps "2q - r = 1" would not be a pattern:
the trouble involved in using it might be judged to exceed the difficulty of using the list itself.
But perhaps q = (1+r)/2 would still be a pattern. It all depends on who's doing the judging of
complexities -- but for any judge there is likely to be some list length for which one formula is a
pattern and the other is not.

This is, of course, a trivial example. A better example is Kepler's observation that planets
move in ellipses. This is a nice compact statement, which can be logically derived from Newton's
Three Laws of Motion. But the derivation is fairly lengthy and time-consuming. So if one has a
brief list of data regarding planetary position, it is quite possible that Kepler's observation will be
a significant pattern, but Newton's Three Laws will not. What is involved here is the complexity
of producing x from the process y. If this complexity is too great, then no matter how simple
the process y, y will not be a pattern in x.

3.5.1. Transformation Systems (*)

In this section I will give a brief formal treatment of "transformation systems." Let W be any
set; let A be a subset of W, called the set of "expressions"; and let I = {I1, I2, ..., In} be a
subset of W, called the set of initials. Let W* denote the set {W, WxW, WxWxW, ...}. And let T =
{F1, F2, ..., Fm} be a set of transformations; that is, a set of functions each of which maps some
elements of W* into elements of A. For instance, if W were a set of propositions, one might have
F1(x,y) = "x and y", and F2(x) = "not x".

Let us now define the set D(I,T) of all elements of W which are derivable from the
assumptions I via the transformations T. First of all, it is clear that I should be a subset of D(I,T).
Let us call the elements of I the depth-zero elements of D(I,T). Next, what about elements of the
form x = Fi(A1, ..., Am), for some i, where each Ak = Ij for some j? Obviously, these elements are
simple transformations of the assumptions; they should be elements of D(I,T) as well. Let us call
these the depth-one elements of D(I,T). Similarly, one may define an element x of W to be a
depth-n element of D(I,T) if x = Fi(A1, ..., Am), for some i, where each of the Ak is a depth-p
element of D(I,T), for some p < n. Finally, D(I,T) may then be defined as the set of all x which are
depth-n elements of D(I,T) for some n.
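
This depth-based construction can be sketched in code. The following Python fragment uses the propositional example from the text, F1(x,y) = "x and y" and F2(x) = "not x"; the string encoding of propositions is an illustrative assumption:

```python
# Sketch of the depth-based construction of D(I,T): depth-zero elements are
# the initials, and each round of transformation adds the next depth level.
from itertools import product

def F1(x, y):
    return f"({x} and {y})"

def F2(x):
    return f"(not {x})"

TRANSFORMATIONS = [(F1, 2), (F2, 1)]   # each rule paired with its arity

def derive(initials, max_depth):
    """Return all elements derivable from the initials in at most
    max_depth applications of the transformations."""
    derived = set(initials)
    for _ in range(max_depth):
        new_elements = set()
        for F, arity in TRANSFORMATIONS:
            for args in product(derived, repeat=arity):
                new_elements.add(F(*args))
        derived |= new_elements
    return derived

print(sorted(derive({"p", "q"}, 1)))   # includes "(not p)" and "(p and q)"
```

Note that the derived set grows explosively with depth, which is one reason a real reasoner needs "blueprints" rather than blind enumeration.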


For example, if the T are rules of logic and the I are some propositions about the world, then
D(I,T) is the set of all propositions which are logically equivalent to some subset of I. In this case
deduction is a matter of finding the logical consequences of I, which are presumably a small
subset of the total set W of all propositions. This is the general form of deduction. Boolean logic
consists of a specific choice of T; and predicate calculus consists of an addition onto the set T
provided by Boolean logic.

It is worth noting that, in this approach to deduction, truth is inessential. In formal logic it is
conventional to assume that one's assumptions are "true" and one's transformations are "truth-
preserving." However, this is just an interpretation foisted on the deductive system after the fact.

3.5.2. Analogical Structure

The set (I,T) constructed above might be called a transformation system. It may be likened
to a workshop. The initials I are the materials at hand, and the transformations T are the tools.
D(I,T) is the set of all things that can be built, using the tools, from the materials.

What is lacking? First of all, blueprints. In order to apply a transformation system to a real
problem, one must have some idea of which transformations should be applied, and in which
order.

But if an intelligence is going to apply a transformation system, it will need to apply it in a
variety of different contexts. It will not know exactly which contexts are going to arise in the future.
It cannot retain a stack of blueprints for every possible contingency. What it needs is not merely
a stack of blueprints, but a mechanism for generating blueprints to fit situations.

But, of course, it already has such a mechanism -- its innate intelligence, its ability to induce,
to reason by analogy, to search through its associative memory. What intelligence needs is a
transformation system structured in such a way that ordinary mental processes can serve as its
blueprint-generating machine.

In SI this sort of transformation system is called a "useful deductive system." Here, however, I
am thinking more generally, and I will use the phrase structured transformation system
instead. A structured transformation system is a transformation system with the property that, if a
mind wants to make a "blueprint" telling it how to construct something from the initials using
the transformations, it can often approximately do so by reasoning analogically with respect to
the blueprints from other construction projects.

Another way to put it is: a structured transformation system, or STS, is a transformation
system with the property that the proximity between x and y in an ideal structurally associative
memory is correlated with the similarity between the blueprint sets corresponding to x and y. A
transformation system is structured if the analogically reasoning mind can use it, in practice, to
construct things to order. This construction need not be infallible -- it is required only that it
work approximately, much of the time.

(*) A Formal Definition


One formal definition goes as follows. Let x and y be two elements of D(I,T), and let GI,T(x)
and GI,T(y) denote the sets of all proofs in the system (I,T) of x and y respectively. Let U equal the
minimum over all functions v of the sum a|v| + B, where B is the average, over all pairs (x,y) such
that x and y are both in D(I,T), of the correlation coefficient between

d#[St(x union v) - St(v), St(y union v) - St(v)]

and

d*[GI,T(x), GI,T(y)].

Then (I,T) is structured to degree U.

Here d#(A,B) is the structural complexity of the symmetric difference of A and B. And d* is a
metric on the space of "sets of blueprints," so that d*[GI,T(x), GI,T(y)] denotes the distance
between the set of proofs of x and the set of proofs of y.

If the function v were omitted, then the degree of structuredness of U would be a measure of
how true it is that structurally similar constructions have similar blueprint sets. But the inclusion
of the function v broadens the definition. It need not be the case that similar x and y have similar
blueprint sets. If x and y display similar emergent patterns in conjunction with some entity v,
and x and y have similar blueprint sets, then this counts as structuredness too.

3.5.3. Transformation, Prediction and Deduction

What do STS's have to do with prediction? To make this connection, it suffices to interpret the
depth index of an element of D(I,T) as a time index. In other words, one may assume that to
apply each transformation in T takes some integer number of "time steps," and consider the
construction of an element in D(I,T) as a process of actual temporal construction. This is a
natural extension of the "materials, tools and blueprints" metaphor introduced above.

A simulation of some process, then, begins with an initial condition (an element of I) and
proceeds to apply dynamical rules (elements of T), one after the other. In the case of a simple
iteration like x(k) = c x(k-1) [1 - x(k-1)], the initial condition is an approximation of a real
number, and there is only one transformation involved, namely the function f(x) = cx(1-x) or
some approximation thereof. But in more complex simulations there may be a variety of
different transformations involved.

For instance, a numerical iteration of the form x(k) = f(k, x(k-1)) rather than x(k) = f(x(k-1))
requires a
different iteration at each time step. This is precisely the kind of iteration used to generate
fractals by the iterated function system method (Barnsley, 1988). In this context, oddly enough, a
random or chaotic choice of k leads to a more intricately structured trajectory than an orderly
choice of k.
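
The distinction between the two kinds of iteration can be sketched in Python (the particular time-dependent rule below is an illustrative assumption of mine, not Barnsley's iterated function system scheme):

```python
# Autonomous iteration: one fixed transformation applied at every step,
# as in the logistic map x(k) = c * x(k-1) * (1 - x(k-1)).
def iterate_logistic(x0, c, steps):
    x = x0
    for _ in range(steps):
        x = c * x * (1 - x)     # the single transformation in T
    return x

# Non-autonomous iteration: x(k) = f(k, x(k-1)), a different rule at each step.
def iterate_timed(x0, f, steps):
    x = x0
    for k in range(1, steps + 1):
        x = f(k, x)             # the rule depends on the time index k
    return x

# x = 0.5 is a fixed point of the logistic map when c = 2
print(iterate_logistic(0.5, 2.0, 10))             # stays at 0.5
# illustrative time-dependent rule f(k, x) = x / k
print(iterate_timed(0.5, lambda k, x: x / k, 3))
```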

So, the process of simulating a dynamical system and the process of making a logical
deduction are, on the broadest level, the same. They both involve transformation systems. But
what about the structured part? What would it mean for a family of simulations to be executed
according to a structured transformation system?

It would mean, quite simply, that the class of dynamical rule sequences that lead up to a
situation is correlated with the structure of the situation. With logical deduction, one often
knows what one wants to prove, and has to find out how to prove it -- so it is useful to know
what worked to prove similar results. But with simulation, it is exactly the reverse. One often
wants to know what the steps in one's transformation sequence will lead to, because one would
like to avoid running the whole transformation sequence through, one step at a time. So it is
useful to know what has resulted from running through similar transformation sequences. The
same correlation is useful for simulation as for deduction -- but for a different reason.

Actually, this is an overstatement. Simulation makes some use of reasoning from similarity of
results to similarity of transformation sequences -- because one may be able to guess what the
results of a certain transformation sequence will be, and then one will want to know what similar
transformation sequences have led to, in order to assess the plausibility of one's guess. And
deduction makes some use of reasoning from similarity of transformation sequences to similarity
of results -- one may have an idea for a "proof strategy," and use analogical reasoning to make a
guess at whether this strategy will lead to anything interesting. There is a distinction between the
two processes, but it is not precisely drawn.

In conclusion, I propose that most psychological simulation and deduction is done by
structured transformation systems. Some short simulations and deductions may be done without
the aid of structure -- but this is the exception that proves the rule. Long chains of deductive
transformations cannot randomly produce useful results. And long chains of dynamical
iterations, if unmonitored by "common sense", are likely to produce errors -- this is true even of
digital computer simulations, which are much more meticulous than any program the human
brain has ever been known to run.

Psychologically, structured transformation systems are only effective if run in parallel.
Running one transformation after another is very slow. Some simulations, and some logical
deductions, will require this. But the mind will do its utmost to avoid it. One demonstration of
this is the extreme difficulty of doing long mathematical proofs in one's head. Even the greatest
mathematicians used pencil and paper, to record the details of the last five steps while they filled
up their minds with the details of the next five.

Chapter Four


I have already talked a little about deduction and its role in the mind. In this chapter, however,
I will develop this theme much more fully. The relation between psychology and logic is
important, not only because of the central role of deductive logic in human thought, but also
because it is a microcosm of the relation between language and thought in general. Logic is an
example of a linguistic system, and it reveals certain phenomena that are obscured by the sheer
complexity of other linguistic systems.


Today, as John MacNamara has put it, "logicians and psychologists generally behave like the
men and women in an orthodox synagogue. Each group knows about the other, but it is proper
form that each should ignore the other" (1986, p.1). But such was not always the case. Until
somewhere toward the end of the nineteenth century, the two fields of logic and psychology were
closely tied together. What changed things was, on the one hand, the emergence of experimental
psychology; and, on the other hand, the rediscovery and development of elementary symbolic
logic by Boole, deMorgan and others.

The early experimental psychologists purposely avoided explaining intelligence in terms of
logic. Mental phenomena were analyzed in terms of images, associations, sensations, and so
forth. And on the other hand -- notwithstanding the psychological pretensions of Leibniz's early
logical investigations and Boole's Laws of Thought -- the early logicians moved further and
further each decade toward considering logical operations as distinct from psychological
operations. It was increasingly realized on both sides that the formulas of propositional logic
have little connection with emotional, intuitive, ordinary everyday thought.

Of course, no one denies that there is some relation between psychology and logic. After all,
logical reasoning takes place within the mind. The question is whether mathematical logic is a
very special kind of mental process, or whether, on the other hand, it is closely connected with
everyday thought processes. And, beginning around a century ago, both logicians and
psychologists have overwhelmingly voted for the former answer.

The almost complete dissociation of logic and psychology which one finds today may be
partly understood as a reaction against the nineteenth-century doctrines of psychologism and
logism. Both of these doctrines represent extreme views: logism states that psychology is a
subset of logic; and psychologism states that logic is a subset of psychology.

Boole's attitude was explicitly logist -- he optimistically suggested that the algebraic equations
of his logic corresponded to the structure of human thought. Leibniz, who anticipated many of
Boole's discoveries by approximately two centuries, was ambitious beyond the point of logism as
I have defined it here: he felt that elementary symbolic logic would ultimately explain not only
the mind but the physical world. And logism was also not unknown among psychologists -- it
was common, for example, among members of the early Wurzburg school of Denkpsychologie.
These theorists felt that human judgements generally followed the forms of rudimentary
mathematical logic.

But although logism played a significant part in history, the role of psychologism was by far
the greater. Perhaps the most extreme psychologism was that of John Stuart Mill (1843), who in
his System of Logic argued that


Logic is not a Science distinct from, and coordinate with, Psychology. So far as it is a Science at
all, it is a part or branch of Psychology.... Its theoretic grounds are wholly borrowed from
Psychology.

Mill understood the axioms of logic as "generalizations from experience." For instance, he gave
the following psychological "demonstration" of the Law of Excluded Middle (which states that
for any p, either p or not-p is always true):

The law of Excluded Middle, then, is simply a generalization of the universal experience that
some mental states are destructive of other states. It formulates a certain absolutely constant law,
that the appearance of any positive mode of consciousness cannot occur without excluding a
correlative negative mode; and that the negative mode cannot occur without excluding the
correlative positive mode.... Hence it follows that if consciousness is not in one of the two modes
it must be in the other. (bk. 2, chap. 7, sec. 5)

Even if one accepted psychologism as a general principle, it is hard to see how one could take
"demonstrations" of this nature seriously. Of course each "mode of consciousness" or state of
mind excludes certain others, but there is no intuitively experienced exact opposite to each state
of mind. The concept of logical negation is not a "generalization" of but rather a specialization
and falsification of the common psychic experience which Mill describes. The leap from
exclusion to exact opposition is far from obvious and was a major step in the development of
mathematical logic.

As we will see a little later, Nietzsche (1888/1968) also attempted to trace the rules of logic to
their psychological roots. But Nietzsche took a totally different approach: he viewed logic as a
special system devised by man for certain purposes, rather than as something wholly deducible
from inherent properties of mind. Mill was convinced that logic must follow automatically from
"simpler" aspects of mentality, and this belief led him into psychological absurdities.

The early mathematical logicians, particularly Gottlob Frege, attacked Mill with a vengeance.
For Frege (1884/1952) the key point was the question: what makes a sentence true? Mill, as an
empiricist, believed that all knowledge must be derived from sensory experience. But Frege
countered that "this account makes everything subjective, and if we follow it through to the end,
does away with truth" (1959, p. vii). He proposed that truth must be given a non-psychological
definition, one independent of the dynamics of any particular mind. This Fregean conception of
truth received its fullest expression in Tarski's (1935) and Montague's (1974) work on formal
semantics, to be discussed in Chapter Five.

To someone acquainted with formal logic only in its recent manifestations, the very concept of
psychologism is likely to seem absurd. But the truth is that, before the work of Boole, Frege,
Peano, Russell and so forth transformed logic into an intensely mathematical discipline, the
operations of logic did have direct psychological relevance. Aristotle's syllogisms made good
psychological sense (although we now know that much useful human reasoning relies on
inferences which Aristotle deemed incorrect). The simple propositional logic of Leibniz and
Boole could be illustrated by means of psychological examples. But the whole development of
modern mathematical logic was based on the introduction of patently non-psychological axioms
and operations. Today few logicians give psychology a second thought, but for Frege it was a
major conceptual battle to free mathematical logic from psychologism.

In sum, psychologists ignored those few voices which insisted on associating everyday mental
processes with mathematical logic. And, on the other hand, logicians actively rebelled against the
idea that the rules of mathematical logic must relate to rules of mental process. Psychology
benefited from avoiding logism, and logic gained greatly from repudiating psychologism.

4.1.1. The Rebirth of Logism

But, of course, that wasn't the end of the story. Although contemporary psychology and logic
have few direct relations with one another, in the century since Frege there has arisen a brand
new discipline, one that attempts to bring psychology and logic closer together than they ever
have been before. I am speaking, of course, about artificial intelligence.

Early AI theorists -- in the sixties and early seventies -- brought back logism with a
vengeance. The techniques of early AI were little more than applied Boolean logic and tree
search, with a pinch or two of predicate calculus, probability theory and other mathematical
tricks thrown in for good measure. But every few years someone optimistically predicted that an
intelligent computer was just around the corner. At this stage AI theorists basically ignored
psychology -- they felt that deductive logic, and deductive logic alone, was sufficient for
understanding mental process.

But by the eighties, AI was humbled by experience. Despite some incredible successes,
nothing anywhere near a "thinking machine" has been produced. No longer are AI theorists too
proud to look to psychology or even philosophy for assistance. Computer science still relies
heavily on formal logic -- not only Boolean logic but more recent innovations such as model
theory and non-well-founded sets (Aczel, 1988) -- and AI is no exception. But more and more AI
theorists are wondering now if modern logic is adequate for their needs. Many, dissatisfied with
logism, are seeking to modify and augment mathematical logic in ways that bring it closer to
human reasoning processes. In essence, they are augmenting their vehement logism with small
quantities of the psychologism which Frege so abhorred.

4.1.2. The Rebirth of Psychologism

This return to a limited psychologism is at the root of a host of recent developments in several
different areas of theoretical AI. Perhaps the best example is nonmonotonic logic, which has
received a surprising amount of attention in recent years. But let us dwell, instead, on an area of
research with more direct relevance to the present book: automated theorem proving.

Automatic theorem proving -- the science of programming computers to prove mathematical
theorems -- was once thought of as a stronghold of pure deductive logic. It seemed so simple:
just apply the rules of mathematical logic to the axioms, and you generate theorems. But now
many researchers in automated theorem proving have realized that this is only a very small part
of what mathematicians do when they prove theorems. Even in this ethereal realm of reasoning,
tailor-made for logical deduction, nondeductive, alogical processes are of equal importance.


For example, after many years of productive research on automated theorem proving, Alan
Bundy (1991) has come to the conclusion that

Logic is not enough to understand reasoning. It provides only a low-level, step by step
understanding, whereas a high-level, strategic understanding is also required. (p. 178)

Bundy proposes that one can program a computer to demonstrate high-level understanding of
mathematical proofs, by supplying it with the ability to manipulate entities called proof plans.

A proof plan is defined as a common structure that underlies and helps to generate many
different mathematical proofs. Proof plans are not formulated on the basis of mathematical logic
alone; they are rather

refined to improve their expectancy, generality, prescriptiveness, simplicity, efficiency and
parsimony while retaining their correctness. Scientific judgement is used to find a balance
between these sometimes opposing criteria. (p.197)

In other words, proof plans, which control and are directed by deductive theorem-proving, are
constructed and refined by illogical or alogical means.

Bundy's research programme -- to create a formal, computational theory of proof plans -- is
about as blatant as psychologism gets. In fact, Bundy admits that he has ceased to think of himself
as a researcher in automated theorem proving, and come to conceive of himself as a sort of
abstract psychologist:

For many years I have regarded myself as a researcher in automatic theorem proving. However,
by analyzing the methodology I have pursued in practice, I now realize that my real motivation is
the building of a science of reasoning.... Our science of reasoning is normative, empirical and
reflective. In these respects it resembles other human sciences like linguistics and Logic. Indeed
it includes parts of Logic as a sub-science. (p. 197)

How similar this is, on the surface at least, to Mill's "Logic is ... a part or branch of Psychology"!
But the difference, on a deeper level, is quite large. Bundy takes what I would call a Nietzschean
rather than a Millean approach. He is not deriving the laws of logic from deeper psychological
laws, but rather studying how the powerful, specialized reasoning tool that we call "deductive
logic" fits into the general pattern of human reasoning.


Bundy defends what I would call a "limited Boolean logism." He maintains that Boolean logic
and related deductive methods are an important part of mental process, but that they are
supplemented by and continually affected by other mental processes. At first sight, this
perspective seems completely unproblematic. We think logically when we need to, alogically
when we need to; and sometimes the two modes of cognition will interact. Very sensible.


But, as everyone who has taken a semester of university logic is well aware, things are not so
simple. Even limited Boolean logism has its troubles. I am speaking about the simple conceptual
conundrums of Boolean logic, such as Hempel's paradox of confirmation and the paradoxes of
implication. These elementary "paradoxes," though so simple that one could explain them to a
child, are obstacles that stand in the way of even the most unambitious Boolean logism. They
cast doubt as to whether Boolean logic can ever be of any psychological relevance whatsoever.

4.2.1. Boolean Logic and Modern Logic

One might well wonder: why all this emphasis on Boolean logic? After all, from the logician's
point of view, Boolean logic -- the logic of "and", "or" and "not" -- is more than a bit out-of-date.
It does not even include quantification, which was invented by Peirce before the turn of the
century. Computer circuits are based entirely on Boolean logic; however, modern mathematical
logic has progressed as far beyond Leibniz, Boole and deMorgan as modern biology has
progressed beyond Cuvier, von Baer and Darwin.

But still, it is not as though modern logical systems have shed Boolean logic. In one way or
another, they are invariably based on Boolean ideas. Mathematically, nearly all logical systems
are "Boolean algebras" -- in addition to possessing other, subtler structures. And, until very
recently, one would have been hard put to name a logistic model of human reasoning that did not
depend on Boolean logic in a very direct way. I have already mentioned two exceptions,
nonmonotonic logic and proof plans, but these are recent innovations and still in very early
stages of development.

So the paradoxes of Boolean logic are paradoxes of modern mathematical logic in general.
They are the most powerful weapon in the arsenal of the contemporary anti-logist. Therefore, the
most sensible way to begin our quest to synthesize psychology and logic is to dispense with these
"paradoxes."

Paradoxes of this nature cannot be "solved." They are too simple for that, too devastatingly
fundamental. So my aim here is not to "solve" them, but rather to demonstrate that they are
largely irrelevant to the project of limited Boolean logism -- if this project is carried out in the
proper way. This demonstration is less logical than psychological. I will assume that the mind
works by pattern recognition and multilevel optimization, and show that in this context Boolean
logic can control mental processes without succumbing to the troubles predicted by the
paradoxes.

4.2.2. The Paradoxes of Boolean Logic

Before going any further, let us be more precise about exactly what these "obstacles" are. I
will deal with four classic "paradoxes" of Boolean logic:

1. The first paradox of implication. According to the standard definition of implication one
has "a --> (b --> a)" for all a and b. Every true statement is implied by anything whatsoever. For
instance, the statement that the moon is made of green cheese implies the statement that one plus
one equals two. The statement that Lassie is a dog implies the statement that Ione Skye is an
actress. This "paradox" follows naturally from the elegant classical definition of "a --> b" as
"either b, or else not a". But it renders the concept of implication inadequate for many purposes.

2. The second paradox of implication. For all a and c, one has "not-c --> (c --> a)". That is,
if c is false, then c implies anything whatsoever. From the statement that George Bush has red
hair, it follows that psychokinesis is real.

3. Contradiction sensitivity. In the second paradox of implication, set c equal to the
conjunction of some proposition and its opposite. Then one has the theorem that, if "A and not-
A" is true for any A, everything else is also true. This means that Boolean logic is incapable of
dealing with sets of data that contain even one contradiction. For instance, assume that "I love
my mother" and "I do not love my mother" are both true. Then one may prove that 2+2=5. For
surely "I love my mother" implies "I love my mother or 2+2=5" (in general, "a --> (a or b)").
But, just as surely, "I do not love my mother" and "I love my mother or 2+2=5", taken together,
imply "2+2=5" (in general, [a and (not-a or b)] --> b). Boolean logic is a model of reasoning in
which ambivalence about one's feelings for one's mother leads naturally to the conclusion that
2+2=5.

4. Hempel's confirmation paradox. According to Boolean logic, "all ravens are black" is
equivalent to "all nonblack entities are nonravens". That is, schematically, "(raven --> black) -->
(not-black --> not-raven)". This is a straightforward consequence of the standard definition of
implication. But is it not the case that, if A and B are equivalent hypotheses, evidence in favor of
B is evidence in favor of A? It follows that every observation of something which is not black
and also not a raven is evidence that ravens are black. This is patently absurd.
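
All four of these classical facts can be verified mechanically from the standard definition of implication. A small Python check (the encoding of the propositions as Boolean variables is mine; the facts verified are exactly those stated above):

```python
# Truth-table verification of the four "paradoxes" of Boolean logic.
from itertools import product

def implies(p, q):
    """Classical implication: "a --> b" means "either b, or else not a"."""
    return (not p) or q

B = [True, False]

# 1. First paradox: a --> (b --> a) holds for all a, b.
assert all(implies(a, implies(b, a)) for a, b in product(B, repeat=2))

# 2. Second paradox: not-c --> (c --> a) holds for all c, a.
assert all(implies(not c, implies(c, a)) for c, a in product(B, repeat=2))

# 3. Contradiction sensitivity: [a and (not-a or b)] --> b holds for all a, b.
assert all(implies(a and ((not a) or b), b) for a, b in product(B, repeat=2))

# 4. Hempel: (raven --> black) is equivalent to (not-black --> not-raven).
assert all(implies(r, b) == implies(not b, not r) for r, b in product(B, repeat=2))
```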

4.2.3. The Need for New Fundamental Notions

The standard method for dealing with these paradoxes has been to acknowledge them, then dismiss
them as irrelevant. In recent years, however, this evasive tactic has grown less common. There
have been several attempts to modify standard Boolean-based formal logic in such a way as to
avoid these difficulties: relevant logics (Read, 1988), paraconsistent logics (daCosta, 1984), and
so forth.

Some of this work is of very high quality. But in a deeper conceptual sense, none of it is really
satisfactory. It is, unfortunately, not concrete enough to satisfy even the most logistically inclined
psychologist. There is a tremendous difference between a convoluted, abstract system jury-
rigged specifically to avoid certain formal problems, and a system with a simple intuitive logic
behind it.

An interesting commentary on this issue is provided by the following dialogue, reported by
Gian-Carlo Rota (1985). The great mathematician Stanislaw Ulam was preaching to Rota about
the importance of subjectivity and context in understanding meaning. Rota begged to differ (at
least partly in jest):


"But if what you say is right, what becomes of objectivity, an idea that is so definitively
formulated by mathematical logic and the theory of sets, on which you yourself have worked for
many years of your youth?"

Ulam answered with "visible emotion":

"Really? What makes you think that mathematical logic corresponds to the way we think? You
are suffering from what the French call a déformation professionnelle. ..."

"Do you then propose that we give up mathematical logic?" said I, in fake amazement.

"Quite the opposite. Logic formalizes only very few of the processes by which we actually
think. The time has come to enrich formal logic by adding to it some other fundamental
notions. ... Do not lose your faith," concluded Stan. "A mighty fortress is mathematics. It will
rise to the challenge. It always has."

Ulam speaks of enriching formal logic "by adding to it some other fundamental notions."
More specifically, I suggest that we must enrich formal logic by adding to it the fundamental
notions of pattern and multilevel control, as discussed above. The remainder of this chapter is
devoted to explaining how, if one views logic in the context of pattern and multilevel control, all
four of the "paradoxes" listed above are either resolved or avoided.

This explanation clears the path for a certain form of limited Boolean logism -- a Boolean
logism that assigns at least a co-starring role to pattern and multilevel control. And indeed, in the
chapters to follow I will develop such a form of limited Boolean logism, by extending the
analysis of logic given in this chapter to more complex psychological systems: language and
belief systems.


Let us begin with the first paradox of implication. How is it that a true statement is implied by
anything whatsoever?

This is not our intuitive notion of consequence. Suppose one mental process has a dozen
subsidiary mental processes, supplies them all with statement A, and asks each of them to tell it
what follows from A. What if one of these subsidiary processes responds by outputting true
statements at random? Justified, according to Boolean logic -- but useless! The process should
not survive. What the controlling process needs to know is what one can use statement A for --
to know what follows from statement A in the sense that statement A is an integral part of the
derivation.

This is a new interpretation of "implies." In this view, "A implies B" does not mean simply "-
A + B"; it means that A is an integral part of a natural reasoning process leading towards B. It
means that A is helpful in arriving at B. Intuitively, it means that, when one sees that someone
has arrived at the conclusion B, it is plausible to assume that they arrived at A first and proceeded
to B from there. If one looks at implication this way -- structurally, algorithmically,
informationally -- then the paradoxes are gone.

In other words, according to the informational definition, A significantly implies B if it is
sensible to use A to get B. The mathematical properties of this definition have yet to be
thoroughly explored. However, it is clear that a true statement is no longer significantly implied
by everything: the first paradox of implication is gone.

And the second paradox of implication has also disappeared. A false statement no longer
implies everything, because the generic proof of B from "A and not-A" makes no essential use of
A; A could be replaced by anything whatsoever.

4.3.1. Informational Implication (*)

In common argument, when one says that one thing implies another, one means that, by a
series of logical reasonings, one can obtain the second thing from the first. But one does not
mean to include series of logical reasonings which make only inessential use of the first thing.
One means that, using the first thing in some substantial way, one may obtain the second through
logical reasoning. The question is, then, what does use mean?

If one considers only formulas involving --> (implication) and - (negation), it is possible to
say something interesting about this in a purely formal way. Let B_1, ..., B_n be a proof of B in the
deductive system T union {A}, where T is some theory. Then, one might define A to be used in
deriving B_i if either

1) B_i is identical with A, or

2) B_i is obtained, through an application of one of the rules of inference, from B_j's with j < i,
and A is used for deriving at least one of these B_j's.
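The two clauses above amount to a recursive trace through the proof's derivation structure. The following sketch is illustrative only; the encoding of a proof as a list of formulas plus a map from each step to the steps it was inferred from is my own assumption, not notation from the text.

```python
def uses(i, proof, premises_of, a):
    """Return True if step i of the proof makes use of assumption a.

    proof       -- list of formulas B_1..B_n (0-indexed here)
    premises_of -- maps a step index to the indices of the earlier
                   steps it was inferred from (absent for axioms)
    a           -- the assumption whose use is being traced
    """
    if proof[i] == a:                              # clause (1): B_i is A
        return True
    return any(uses(j, proof, premises_of, a)      # clause (2): inherited use
               for j in premises_of.get(i, ()))

# Tiny example: prove C from {A, A -> B, B -> C} by two modus ponens steps.
proof = ["A", "A -> B", "B", "B -> C", "C"]
premises = {2: (0, 1), 4: (2, 3)}   # B from A and A->B; C from B and B->C
print(uses(4, proof, premises, "A"))   # A is used in deriving C
print(uses(3, proof, premises, "A"))   # the axiom B -> C makes no use of A
```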

But this simplistic approach becomes hopelessly confused when disjunction or conjunction
enters the picture. And even in this artificially simple case, it has certain conceptual
shortcomings. What if there is a virtually identical proof of B which makes no use of A? Then is
it not reasonable to say that the supposed "use" of A is largely, though not entirely, spurious?

It is not inconceivable that a reasonable approximation of the concept of use might be captured
by some complex manipulation of connectives. However, I contend that what use really has to
do with is structure. Talking about structure is not so cut-and-dried as talking about logical form
-- one always has a lot of loose parameters. But it makes much more intuitive sense.

Let G_{I,T,v}(B) denote the set of all valid proofs of B, relative to some fixed "deductive system"
(I,T), of complexity less than v. An element of G_{I,T,v}(B) is a sequence of steps B_0, B_1, ..., B_n, where
B_n = B, and for k > 0 each B_k follows from B_{k-1} by one of the transformation rules T. Where Z is an
element of G_{I,T,v}(B), let L(Z) = |B|/|Z|. This is a measure of how much it simplifies B to prove it
via Z.

Where G_{I,T,v}(B) = {Z_1, ..., Z_N}, p is a positive integer, and Y is some fixed reference proof, let

A = L(Z_1)*[1/I(Z_1|Y)]^p + L(Z_2)*[1/I(Z_2|Y)]^p + ... + L(Z_N)*[1/I(Z_N|Y)]^p

B = [1/I(Z_1|Y)]^p + [1/I(Z_2|Y)]^p + ... + [1/I(Z_N|Y)]^p

Q_{p,v} = A/B

Note that, since I(Z_i|Y) is always a positive integer, as p tends to infinity, Q_{p,v} tends toward the
value L(Z), where Z is the element of G_{I,T,v}(B) that minimizes I(Z|Y). The smaller p is, the
more fairly the value L(Z) corresponding to every element of G_{I,T,v}(B) is counted. The larger p is,
the more attention is focused on those proofs that are informationally close to Y. The idea is that
those proofs which are closer to Y should count much more than those which are not.
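The source text of this formula is garbled, so the sketch below reconstructs the weight on each proof Z as (1/I(Z|Y))**p -- an assumption of mine, chosen so that large p concentrates attention on proofs informationally close to Y, as the surrounding remarks require. All numerical values are invented.

```python
def Q(proofs, p):
    """Weighted average of the simplification ratios L(Z) over candidate
    proofs Z of B, each weighted by (1/I(Z|Y))**p.

    proofs -- list of (L, I) pairs: L(Z) = |B|/|Z|, and I(Z|Y) is the
    (positive integer) informational distance from the reference proof Y.
    """
    num = sum(L * (1.0 / I) ** p for L, I in proofs)
    den = sum((1.0 / I) ** p for L, I in proofs)
    return num / den

# The proof with I(Z|Y) = 1 is the one closest to Y; as p grows, its
# L value comes to dominate the average.
proofs = [(0.8, 1), (0.3, 50), (0.5, 9)]
print(round(Q(proofs, 1), 3))
print(round(Q(proofs, 10), 3))   # -> 0.8
```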

Definition: Let | | be a complexity measure (i.e., a nonnegative-real-valued function). Let
(I,T) be a deductive system, let p be a positive integer, and let 0 < c < 1. Then, relative to | |, (I,T), p
and c, we will say A significantly implies B to degree K, and write

A -->_K B

if K = cL + (1-c)M is the largest of all numbers such that, for some v, there exists an element Y of
G_{I,T,v}(B) so that

1) A = B_0 (in the sequence of deductions described by Y),

2) L = L(Y) = |B|/|Y|,

3) M = 1/Q_{p,|Y|}.

According to this definition, A significantly implies B to a high degree if and only if A is an
integral part of a "natural" proof of B. The "naturalness" of the proof Y is guaranteed by clause
(3), which says that by modifying Y a little bit, it is not so easy to get a simpler proof. Roughly,
clause (3) says that Y is an "approximate local maximum" of simplicity, in proof space.
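The definition can be read operationally: over candidate proofs Y of B that begin from A, compute L (how much Y simplifies B) and M (how hard Y is to improve on locally), and take the best combined score. A minimal sketch with invented scores; the pair encoding is my own, not the text's.

```python
def degree_of_implication(candidates, c=0.5):
    """K = c*L + (1-c)*M, maximized over candidate proofs Y of B from A.

    candidates -- list of (L, M) pairs, where L = |B|/|Y| measures how
    much the proof simplifies B, and M = 1/Q_{p,|Y|} measures how hard
    the proof is to improve on locally. Values here are illustrative.
    """
    return max(c * L + (1 - c) * M for L, M in candidates)

# A proof that both simplifies B strongly (high L) and sits at a local
# optimum of simplicity (high M) yields a high degree of implication.
print(round(degree_of_implication([(0.9, 0.7), (0.4, 0.95)]), 6))  # -> 0.8
```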

This is the kind of implication that is useful in building up a belief system. For, under
ordinary implication, the fact that A --> B_i, for i = 1, 2, ..., N, where the B_i are all true, can never
be a reason for assuming A: by contradiction sensitivity, a false statement implies everything.
But things are not so simple under significant implication. If a
statement A significantly implies a number of true statements, that means that by appending the
statement A to one's assumption set I, one can obtain quality proofs of a number of true
statements. If these true statements also happen to be useful, then from a practical point of view
it may be advisable to append A to I. Deductively such a move is not justified, but inductively it
is justified. This fits in with the general analysis of deduction given in SI, according to which
deduction is useful only insofar as induction justifies it.


Having dealt with implication, let us now turn to the paradox of contradiction sensitivity.
According to the reasoning given above, if one uses propositional or predicate calculus to define
the transformation system T, one easily arrives at the following conclusion: if any two of the
propositions in I contradict each other, then D(I,T) is the entire set of all propositions. From one
contradiction, everything is derivable.

This property appears not to reflect actual human reasoning. A person may contradict herself
regarding abortion rights or the honesty of her husband or the ultimate meaning of life. And yet,
when she thinks about theoretical physics or parking her car, she may reason deductively to one
particular conclusion, finding any contradictory conclusion ridiculous.

In his Ph.D. dissertation, da Costa (1984) conceived the idea of a paraconsistent logic, one in
which a single contradiction in I does not imply everything. Others have extended this idea in
various ways. More recently, Avram (1990) has constructed a paraconsistent logic which
incorporates the idea of "relevance logic." Propositions are divided into classes, and the inference
from A to A+B is allowed only when A and B are in the same class. The idea is very simple:
according to Avram, although we do use the "contradiction-sensitive" deductive system of
standard mathematical logic, we carefully distinguish deductions in one sphere from deductions
in another, so that in practice we never reason "A implies A or B" unless A and B are in the
same "sphere" or "category."

For instance, one might have one class for statements about physics, one for statements about
women, et cetera. The formation of A or B is allowed only if A and B belong to the same class.
A contradiction regarding one of these classes can therefore destroy only reasoning within that
class. So if one contradicted oneself when thinking about one's relations with one's wife, then
this might give one the ability to deduce any statement whatsoever about domestic relations --
but not about physics or car parking or philosophy.
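A minimal sketch of this class-based restriction, with propositions and class assignments invented for illustration:

```python
# Relevance classes; the assignment of propositions to classes is the
# arbitrary part that the next paragraph objects to.
CLASS_OF = {
    "the Einstein equation has a unique solution": "physics",
    "entropy never decreases in a closed system": "physics",
    "this car fits in that parking spot": "driving",
}

def may_form_disjunction(a, b):
    """Permit the inference from a to 'a or b' only when both
    propositions fall in the same relevance class."""
    cls = CLASS_OF.get(a)
    return cls is not None and cls == CLASS_OF.get(b)

print(may_form_disjunction(
    "the Einstein equation has a unique solution",
    "entropy never decreases in a closed system"))   # same sphere: allowed
print(may_form_disjunction(
    "the Einstein equation has a unique solution",
    "this car fits in that parking spot"))           # different: blocked
```

A contradiction arising inside the "physics" class can then corrupt only disjunctions formed within that class.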

The problem with this approach is its arbitrariness: why not one class for particle physics, one
for gravitation, one for solid-state physics, one for brunettes, one for blondes, one for
redheads, and so on? Why not, following Lakoff's (1987) famous analysis of aboriginal classification
systems, one category for women, fire and dangerous things?

Of course, it is true that we rarely make statements like "either the Einstein equation has a
unique solution under these initial-boundary conditions or that pretty redhead doesn't want
anything more to do with me." But still, partitioning is too rigid -- it's not quite right. It yields an
elegant formal system; but in any categorization there will be borderline cases, and it is
unacceptable simply to ignore them.

The "partitioning" approach is not the only way of defining relevance formally. But it seems to
be the only definition with any psychological meaning. Read (1988), for instance, disavows
partitioning. But he has nothing of any practical use to put in its place. He mentions the classical
notion of variable sharing -- A and B are mutually relevant if they have variables in common.
But he admits that this concept is inadequate: for instance, "A" and "-A + B" will in general
share variables, but one wishes to forbid their combination in a single expression. He concludes
by defining entailment in such a way that

[T]he test of whether two propositions are logically relevant is whether either entails the other.
Hence, relevance cannot be picked out prior ... to establishing validity or entailment....

But the obvious problem is, this is not really a definition of relevance:

It may of course be objected that this suggested explication of relevance is entirely circular
and unilluminating, since it amounts to saying no more than that two propositions are logically
relevant if either entails the other....

Read's account of relevance is blatantly circular. Although it may not be unilluminating from
the formal-logical point of view, it is of no psychological value.

4.4.1. Contradiction and the Structure of Mind

There is an alternate approach: to define relevance not by a partition into classes, but rather
in terms of the theory of structure. It is hypothesized that a mind does not tend to form the
disjunction A or B unless the size

%[St(A union v) - St(v)] - [St(B union w) - St(w)]%

is small for some (v,w), i.e. unless A and B are in some way closely related. In terms of the
structurally associative memory model, an entity A will generally be stored near those entities to
which it is closely related, and it will tend to interact mainly with these entities.
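The structural criterion itself is not directly computable, since St( ) is defined in terms of pattern. As a very loose stand-in, one might score the relatedness of two entities by normalized compression distance, using zlib as a crude surrogate for algorithmic complexity. The sentences below are invented, and this is an analogy to the criterion, not an implementation of it.

```python
import zlib

def csize(s):
    """Compressed size: a rough surrogate for complexity."""
    return len(zlib.compress(s.encode()))

def ncd(a, b):
    """Normalized compression distance: near 0 when a and b share much
    structure, near 1 when they share almost none."""
    ca, cb, cab = csize(a), csize(b), csize(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

a = "the raven perched on the fence is black"
b = "the raven on the neighbouring fence is also black"
c = "solid-state physics concerns crystalline band structure"

# Related sentences compress well together, so their distance is smaller;
# a mind using such a measure would form "a or b" but balk at "a or c".
print(ncd(a, b) < ncd(a, c))
```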

As to the possibility that, by chance, two completely unrelated entities will be combined in
some formula, say A or B, it is admitted that this could conceivably pose a danger to thought
processes. But the overall structure of mind dictates that a part of the mind which succumbed to
self-contradiction, and the resulting inefficiency, would soon be ignored and dismantled.

According to the model of mind outlined above, each mental process supervises a number --
say a dozen -- of others. Suppose these dozen are reasoning deductively, and one of them falls
prey to an internal self-contradiction, and begins giving out random statements. Then how
efficient will that self-contradicting process be? It will be the least efficient of all, and it will
shortly be eliminated and replaced. Mind does not work by absolute guarantees, but rather by
probabilities, safeguards, redundancy and natural selection.
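The selection dynamic just described can be caricatured in a few lines: a supervisor scores each of a dozen subsidiary reasoners by how often their outputs prove useful, and a reasoner that has collapsed into emitting random statements scores at chance level. All rates and counts below are invented for illustration.

```python
import random
random.seed(0)  # fixed seed so the toy run is reproducible

def usefulness(reasoner, trials=1000):
    """Fraction of a reasoner's outputs that turn out to be useful."""
    return sum(reasoner() for _ in range(trials)) / trials

sound = lambda: random.random() < 0.8          # mostly useful conclusions
contradictory = lambda: random.random() < 0.5  # random output: chance level

# Eleven sound subprocesses and one that has fallen into contradiction.
processes = [sound] * 11 + [contradictory]
scores = [usefulness(p) for p in processes]
worst = scores.index(min(scores))
print(worst)  # index 11: the self-contradicting process is the one culled
```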

4.4.2. Contradiction and Implication

We have given one way of explaining why contradiction sensitivity need not be a problem
for actual minds. But, as an afterthought, it is worth briefly noting that one may also approach the
problem from the point of view of relevant implication. The step from "A and not-A" to B
involves the step "A --> A or B". What does our definition of significant implication say
about this? A moment's reflection reveals that, as noted above, clause (3) kicks in here: A is
entirely dispensable in this proof of B; A could just as well be replaced by C, D, E or any other
proposition. The type of implication involved in contradiction sensitivity is not significant to a
very high degree.

Finally, what of Hempel's confirmation paradox? Why, although "all ravens are black" is
equivalent to "all non-black entities are non-ravens," is an observation of a blue chair a lousy
piece of evidence for "all ravens are black"?

My resolution is simple, and not conceptually original. Recall the "infon" notation introduced
in Section 2. Just because s |-- i //x to degree d, it is not necessarily the case that s |-- j //x to
degree d for every j equivalent to i under the rules of Boolean logic. This is, basically, all that
needs to be said. Case closed, end of story. Boolean logic is a tool. Only in certain cases does the
mind find it useful.

That the Boolean equivalence of i and j does not imply the equality of d(s,i,x) and d(s,j,x) is
apparent from the definition of degree given above. The degree to which (s,k,x) holds was
defined in terms of the intensity with which the elements of k are patterns in s, where complexity
is defined by s. Just because i and j are Booleanly equivalent, this does not imply that they will
have equal algorithmic information content, equal structure, equal complexity with respect to
some observer s. Setting things up in terms of pattern, one obtains a framework for studying
reasoning in which Hempel's paradox does not exist.

4.5.1. A More Psychological View

In case this seems too glib, let us explore the matter from a more psychological perspective.
Assume that "All ravens are black" happens to hold with degree d, in my experience, from my
perspective. Then to what degree does "All non-black entities are non-ravens" hold in my
experience, from my perspective?

"All ravens are black" is an aid in understanding the nature of the world. It is an aid in
identifying ravens. It is a significant pattern in my world that those things which are typically
referred to with the label "raven," are typically possessors of the color black. When storing in my
memory a set of experiences with ravens, I do not have to store with each experience the fact that
the raven in question was black -- I just have to store, once, the statement that all ravens are
black, and then connect this in my memory to the various experiences with ravens.

Now, what about "All non-black entities are non-ravens"? What good does it do me to
recognize this? How does it simplify my store of memories? It does not, hardly at all. When I
call up a non-black entity from my memory, I will not need to be reminded that it is not a raven.
Why would I have thought that it was a raven in the first place? "Raven-ness" is not one of the
questions which it is generally useful or interesting to ask about entities, whereas on the other
hand "color" is one of the questions which it is often interesting to ask about physical objects
such as birds.
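The bookkeeping in the preceding two paragraphs can be made concrete with invented numbers: assume a memory pays one unit to store a colour field per remembered entity and a flat cost to store a rule, so that only the rule about ravens lets any field be dropped.

```python
RAVENS = 100
COLOR_FIELD_COST = 1     # units saved per raven memory that omits colour
RULE_COST = 20           # flat cost of storing a rule once

# "All ravens are black": store the rule once, then drop the colour
# field from each raven memory it covers.
savings_rule = RAVENS * COLOR_FIELD_COST - RULE_COST

# "All non-black entities are non-ravens": no memory was storing a
# "raven?" field on non-black entities, so there is nothing to drop.
savings_contrapositive = 0 - RULE_COST

print(savings_rule)            # net units saved by the rule
print(savings_contrapositive)  # the contrapositive only costs
```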

So, the real question with Hempel's paradox is: what determines the degree assigned to a given
proposition s |-- i //x? It is not purely the logical form of the proposition, but rather the degree to
which the proposition is useful to x, i.e. the emergence between the proposition and the other
entities which neighbor it in the memory of x. Degree is determined by psychological dynamics,
rather than Boolean logic. Formally, one may say: the logic of memory organization is what
determines the subjective complexity measure associated with x.

It is not always necessary to worry about where the degrees associated with propositions come
from. But when one is confronted with a paradox regarding degrees, then it is necessary to worry
about it. The real moral of Hempel's paradox, as I see it, is that one should study confirmation in
terms of the structure and dynamics of the mind doing the confirming. Studying confirmation
otherwise, "in the abstract," borders on meaningless.

