

I call this iteration the basic deterministic dynamic. It will serve as a "demonstration equation"
for talking about the properties of more complicated cognitive dynamics.

The idea underlying this equation is encapsulated in the following simple maxim: in a
cognitive system, time is the process of structure becoming substance. In other words, the
entities which make up the system now all act on one another, and thus produce a new collection
of entities which includes all the products of the interactions of entities currently existent. For
lack of a better term, I call this exhaustive collection of products the "Raw Potentiality" of the
system. Then, the system one moment later consists of the patterns in this collection, this Raw
Potentiality.

8.2.1. A General Self-Generating Pattern Dynamic (*)

For every type of self-generating system, there is a corresponding type of self-generating
pattern dynamic. The basic deterministic dynamic is founded on the type of self-generating
system that is so totally "well-mixed" that everything interacts with everything else at each
time step. But in general, this is only the simplest kind of self-generating system: a
self-generating system may use any stochastically computable rule to transform the Raw
Potentiality of time t into the reality of time t+1.

Furthermore, the basic deterministic dynamic assumes infinite pattern recognition skill; it is
anti-Gödelian. In general, a self-generating system may use its Raw Potentiality in an incomplete
fashion. It need not select all possible patterns in the Raw Potentiality; it may pick and choose
which ones to retain, in a state-dependent way.

Formally, this means that one must consider iterations of the following form:

Systemt+1 = F [ Zt [St^( G[ R[Systemt] ])] ] (****)

where F and G are any stochastically computable functions, and Zt = Z[Systemt] is a "filtering
operator" which selects certain elements of St^( G[ R[Systemt] ]), based on the elements of Systemt.

Note that the function F cannot make any reference to Systemt; it must act on the level of
structure alone. This is why the function Zt is necessary. The particular system state Systemt can

Get any book for free on: www.Abika.com

affect the selection of which patterns to retain, but not the way these patterns are transformed. If
this distinction were destroyed, if F and Zt were allowed to blur together into a more general Ft =
F[Systemt], then the fundamental structure-dependence of the iteration would be significantly
weakened. One could even define Ft as a constant function on all values of St^( G[ R[Systemt] ]),
mapping into a future state depending only on Systemt. Thus, in essence, one would have (**)
back again.

Equation (****), like the basic deterministic dynamic (***), is merely (**) with a special form
of the transition operator T. T is now assumed to be some sequence of operations, one of which
is a possibly filtered application of the relative structure operator St^. This is indeed a bizarre
type of dynamic -- instead of acting on real numbers or vectors, it acts on collections of
hyperrelations. However, it may still be studied using the basic concepts of dynamical systems
theory -- fixed points, limit cycles, attractors and so forth.

To see the profound utility of the filtering operator Zt, note that it may be defined specifically
to ensure that only those elements of St^(G[R[Systemt]]) which are actually computed by
subsystems of Systemt are passed through to F and Systemt+1. In other words, one may set

Zt(X) = Z[Systemt](X) = X intersect R[Systemt]

Under this definition, (****) says loosely that Systemt+1 consists of the patterns which Systemt
has recognized in itself (and in the "compounds" formed by the interaction of its subsystems). It
may be rewritten as

Systemt+1 = F [ R[Systemt] intersect St^( G[ R[Systemt] ])] (*****)

This specialization brings abstract self-generating pattern dynamics down into the realm of
physical reality. For reasons that will be clear a little later, it is this equation that I will refer to as
the "cognitive equation" or "cognitive law of motion."
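For readers who prefer typeset notation, the general iteration, the filtering operator and the cognitive equation may be transcribed as follows. This is only a transcription under two assumptions: that St^ denotes the relative structure operator written with a hat, and that the t and t+1 following "System" are time subscripts.

```latex
\begin{align}
\mathrm{System}_{t+1} &= F\bigl[\, Z_t\bigl[\, \widehat{St}\bigl( G[\, R[\mathrm{System}_t] \,] \bigr) \bigr] \bigr] \tag{****} \\
Z_t(X) &= Z[\mathrm{System}_t](X) = X \cap R[\mathrm{System}_t] \notag \\
\mathrm{System}_{t+1} &= F\bigl[\, R[\mathrm{System}_t] \cap \widehat{St}\bigl( G[\, R[\mathrm{System}_t] \,] \bigr) \bigr] \tag{*****}
\end{align}
```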

8.2.2. Summary

Self-generating pattern dynamics are dynamical iterations on collections of processes, and are
thus rather different from the numerical iterations of classical dynamical systems theory and
modern "chaos theory." However, it would be silly to think that one could understand mental
systems by the exact same methods used to analyze physical systems.

The basic modeling ideas of graph-theoretic structure and iterative dynamics are applicable to
both the mental and the physical worlds. But whereas in the physical domain one is concerned
mainly with numerical vectors, in the mental realm one is concerned more centrally with
processes. The two views are not logically contradictory: vectors may be modeled as processes,
and processes may be modeled as vectors. However, there is a huge conceptual difference
between the two approaches.

In non-technical language, what a "self-generating pattern dynamic" boils down to is the
following sequence of steps:

1) Take a collection of processes, and let each process act on all the other processes, in
whatever combinations it likes. Some of these "interactions" may result in nothing; others may
result in the creation of new processes. The totality of processes created in this way is called the
Raw Potentiality generated by the original collection of processes.

2) Transform these processes in some standard way. For instance, perhaps one wants to model
a situation in which each element of the Raw Potentiality has only a certain percentage chance
of being formed. Then the "transformation" of the Raw Potentiality takes the form of a selection
process: a small part of the Raw Potentiality is selected to be retained, and the rest is discarded.

3) Next, determine all the patterns in the collection of processes generated by Step 2. Recall
that patterns are themselves processes, so that what one has after this step is simply another
collection of processes.

4) "Filter out" some of the processes in the collection produced by Step 3. This filtering may
be system-dependent -- i.e., the original processes present in Step 1 may have a say in which
Step 3-generated pattern-processes are retained here. For instance, as will be suggested below, it
may often be desirable to retain only those patterns that are actually recognized by processes in
Step 1.

5) Transform the collection of processes produced by Step 4 in some standard way,
analogously to Step 2.

6) Take the set of processes produced by Step 5, and feed it back into Step 1, thus beginning
the whole process all over again.
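The six steps above can be sketched as a single function with pluggable operators. This is a minimal illustration rather than a definitive implementation: the function and parameter names are hypothetical, Step 1 is narrowed to ordered pairs of distinct processes, and the toy operators (integers as processes, multiplication as interaction, divisors greater than 1 as "patterns") are stand-ins chosen only so that the code runs. They are not the notion of pattern used in the text.

```python
from itertools import permutations


def iterate_once(system, act, transform_g, patterns_in, keep, transform_f):
    """Run Steps 1-5 once; feeding the result back in is Step 6."""
    # Step 1: each process acts on every other process (ordered pairs only).
    raw_potentiality = set()
    for actor, target in permutations(system, 2):
        product = act(actor, target)
        if product is not None:  # some interactions result in nothing
            raw_potentiality.add(product)
    # Step 2: transform the Raw Potentiality in some standard way.
    transformed = transform_g(raw_potentiality)
    # Step 3: determine the patterns in the transformed collection.
    patterns = patterns_in(transformed)
    # Step 4: system-dependent filtering of those patterns.
    retained = {p for p in patterns if keep(system, p)}
    # Step 5: a second standard transformation.
    return transform_f(retained)


# Toy stand-ins (assumptions for illustration only).
def multiply(a, b):
    return a * b


def divisors_of_all(collection):
    # "Patterns" here are simply the divisors greater than 1 of each member.
    return {d for n in collection for d in range(2, n + 1) if n % d == 0}


def identity(collection):
    return set(collection)


def keep_all(system, pattern):
    return True


def keep_known(system, pattern):
    # A state-dependent filter: retain only what the current system contains.
    return pattern in system


grown = iterate_once({2, 3}, multiply, identity, divisors_of_all, keep_all, identity)
stable = iterate_once({2, 3}, multiply, identity, divisors_of_all, keep_known, identity)
print(grown, stable)
```

With the unfiltered toy operators the collection grows from one pass to the next, whereas the state-dependent filter holds this particular collection fixed. The contrast is only meant to show where the filtering of Step 4 enters the loop.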

This is a very general sequence of steps, and its actual behavior will depend quite sensitively
on the nature of the processes introduced in Step 1 on the first go-around, as well as on the nature
of the transformation and filtering operations. Modern science and mathematics have rather little
to say about this type of complex process dynamics. The general ideas of dynamical systems
theory are applicable, but the more specific and powerful tools are not. If one wishes to
understand the mind, however, this is the type of iteration which one must master.

More specifically, in order to model cognitive systems, a specific instance of the filtering
operation is particularly useful: one filters out all but those patterns that are actually recognized
by the components of the system. In other words, one takes the intersection of the products of
the system and the patterns in the system. The self-generating pattern dynamic induced by this
particular filtering operation is what I call the "cognitive equation."

Informally and in brief, one may describe the cognitive equation as follows:

1) Let all processes that are "connected" to one another act on one another.

2) Take all patterns that were recognized in other processes during Step (1), let these patterns
be the new set of processes, and return to Step (1).

An attractor for this dynamic is then a set of processes with the property that each element of
the set is a) produced by the set of processes, and b) a pattern in the set of entities produced by the set
of processes. In the following sections I will argue that complex mental systems are attractors for
the cognitive equation.


According to chaos theory, the way to study a dynamical iteration is to look for its attractors.
What type of collection of processes would be an attractor for a self-generating pattern dynamic?

To begin with, let us restrict attention to the basic deterministic dynamic (***). According to
this iteration, come time t+1, the entities existent at time t are replaced by the patterns in the Raw
Potentiality generated by these entities. But this does not imply that all the entities from time t
completely vanish. That would be absurd -- the system would be a totally unpredictable chaos. It
is quite possible for some of the current entities to survive into the next moment.

If a certain entity survives, this means that, as well as being an element of the current system
Systemt, it is also a regularity in the Raw Potentiality of Systemt, i.e. an element of R[Systemt].
While at first glance this might seem like a difficult sort of thing to contrive, slightly more
careful consideration reveals that this is not the case at all.

As a simple example, consider two entities f and g, defined informally by

f(x) = the result of executing the command "Repeat x two times"

g(x) = the result of executing the command "Repeat x three times"

Then, when f acts on g, one obtains the "compound"

f(g) = the result of executing the command "Repeat x three times" the result of executing the
command "Repeat x three times"

And when g acts on f, one obtains the "compound"

g(f) = the result of executing the command "Repeat x two times" the result of executing the
command "Repeat x two times" the result of executing the command "Repeat x two times"

Now, obviously the pair (f,g) is a pattern in f(g), since it is easier to store f and g, and then apply
f to g, than it is to store f(g). And, in the same way, the pair (g,f) is a pattern in g(f). So f and g,
in a sense, perpetuate one another. According to the basic deterministic dynamic, if f and g are
both present in Systemt, then they will both be present in Systemt+1.
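The f and g example can be checked with a few lines of Python. The sketch below assumes that each command is stored as plain text and that the length of a string is an acceptable crude stand-in for the cost of storing it. Both are simplifying assumptions, and the helper names are hypothetical.

```python
# Commands are stored as text; string length stands in for storage cost.
F_TEXT = "Repeat x two times"
G_TEXT = "Repeat x three times"


def repeat(text, times):
    """Return `text` written out `times` times, separated by spaces."""
    return " ".join([text] * times)


def f(x):
    return repeat(x, 2)


def g(x):
    return repeat(x, 3)


def is_pattern(actor_text, argument_text, product):
    """The pair counts as a pattern in `product` if storing the two
    commands is cheaper than storing the product itself."""
    return len(actor_text) + len(argument_text) < len(product)


f_of_g = f(G_TEXT)  # the compound formed when f acts on g
g_of_f = g(F_TEXT)  # the compound formed when g acts on f

print(is_pattern(F_TEXT, G_TEXT, f_of_g))
print(is_pattern(G_TEXT, F_TEXT, g_of_f))
```

Under this length-based measure both checks succeed, which matches the claim that f and g perpetuate one another. The cost of the "apply" operation itself is ignored here.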

One may rephrase this example a little more formally by defining f(x) = x x, g(x) = x x x. In
set-theoretic terms, if one makes the default assumption that all variables are universally
quantified, this means that f has the form {x,{x,x x}} while g has the form {x,{x,x x x}}. So,
when f acts on g, we have the ugly-looking construction

{ {x,{x,x x x}}, { {x,{x,x x x}}, {x,{x,x x x}} {x,{x,x x x}} } };

and when g acts on f, we have the equally unsightly

{ {x,{x,x x}}, { {x,{x,x x}}, {x,{x,x x}} {x,{x,x x}} {x,{x,x x}} } }.

It is easy to see that, given this formalization, the conclusions given in the text hold.

Note that this indefinite survival is fundamentally a synergetic effect between f and g. Suppose
that, at time t, one had a system consisting of only two entities, f and h, where

h = "cosmogonicallousockhamsteakomodopefiendoplamicreticulumpenproletariatti"

Then the effect of h acting on f would, by default, be

h(f) = empty set

And the effect of f acting on h would be

f(h) = "cosmogonicallousockhamsteakomodopefiendoplamicreticulumpenproletariatti
cosmogonicallousockhamsteakomodopefiendoplamicreticulumpenproletariatti"


Now, (f,h) is certainly a pattern in f(h), so that, according to the basic deterministic dynamic, f
will be a member of Systemt+1. But h will not be a member of Systemt+1 -- it is not a pattern in
anything in R[Systemt]. So there is no guarantee that f will be continued to Systemt+2.

What is special about f and g is that they assist one another in producing entities in which they
are patterns. But, clearly, the set {f,g} is not unique in possessing this property. In general, one
may define a structural conspiracy as any collection of entities G so that every element of G is
a pattern in the Raw Potentiality of G. It is obvious from the basic deterministic dynamic that
one successful strategy for survival over time is to be part of a structural conspiracy.
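The definition of a structural conspiracy can be made concrete for the toy entities discussed above. The following sketch assumes that an entity survives when it acts on some other entity and the resulting pair is cheaper to store than the product, with string length as the cost. The inert string H_TEXT is a hypothetical stand-in for h, and all function names are illustrative.

```python
# Text of each "repeater" command mapped to its repetition count.
REPEATERS = {"Repeat x two times": 2, "Repeat x three times": 3}

F_TEXT = "Repeat x two times"
G_TEXT = "Repeat x three times"
H_TEXT = "an inert string that performs no action"  # hypothetical stand-in for h


def act(actor, argument):
    """Let `actor` act on `argument`; inert text produces nothing."""
    times = REPEATERS.get(actor)
    if times is None:
        return None
    return " ".join([argument] * times)


def survivors(system):
    """Entities that are patterns in some product of the system."""
    kept = set()
    for actor in system:
        for argument in system:
            if actor == argument:
                continue
            product = act(actor, argument)
            # (actor, argument) is a pattern if the pair is cheaper to store.
            if product is not None and len(actor) + len(argument) < len(product):
                kept.add(actor)
    return kept


def is_structural_conspiracy(system):
    """True if every element of the system survives one step."""
    return set(system) <= survivors(system)


print(is_structural_conspiracy({F_TEXT, G_TEXT}))
print(survivors({F_TEXT, H_TEXT}))
print(survivors({F_TEXT}))
```

In this toy setting the pair {f, g} preserves itself, while a system containing f and the inert string retains only f, and f alone then has nothing left to act upon.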

Extending this idea to general deterministic equations of the form (****), a structural
conspiracy may be redefined as any collection P which is preserved by the dynamic involved,
i.e. by the mathematical operations R, G, St^ and F applied in sequence.

And finally, extending the concept to stochastic equations of form (****), a structural
conspiracy may be defined as a collection P which has a nonzero probability of being preserved
by the dynamic. The value of this probability might be called the "solidity" of the conspiracy.
Stochastic dynamics are interesting in that they have the potential to break down even solid
structural conspiracies.
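The "solidity" of a conspiracy under a stochastic dynamic can be estimated by repeated simulation. The sketch below is only a Monte Carlo illustration under stated assumptions: the toy dynamic (integers as entities, products formed with probability keep_prob, divisors greater than 1 as "patterns") and all names are hypothetical and are not taken from the text.

```python
import random
from itertools import combinations


def noisy_step(system, rng, keep_prob):
    """Toy stochastic dynamic: each pairwise product is formed with
    probability keep_prob; the next state is the set of divisors (> 1)
    of the products that were formed."""
    raw = {
        a * b
        for a, b in combinations(sorted(system), 2)
        if rng.random() < keep_prob
    }
    return {d for n in raw for d in range(2, n + 1) if n % d == 0}


def estimate_solidity(collection, keep_prob, trials=2000, seed=0):
    """Fraction of trials in which the collection is preserved by one step."""
    rng = random.Random(seed)
    collection = set(collection)
    preserved = sum(
        collection <= noisy_step(collection, rng, keep_prob)
        for _ in range(trials)
    )
    return preserved / trials


print(estimate_solidity({2, 3}, keep_prob=0.5))
```

For the collection {2, 3} there is a single product, so the estimate should land near keep_prob. Richer collections would need a less trivial dynamic to be informative.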

One phrase which I use in my own thinking about self-generating pattern dynamics is "passing
through." For an entity, a pattern, to survive the iteration of the fundamental equation, it must
remain intact as a pattern after the process of universal interdefinition, universal interaction has
taken place. The formation of the Raw Potentiality is a sort of holistic melding of all entities with
all other entities. But all that survives from this cosmic muddle, at each instant, is the relative
structure. If an entity survives this process of melding and separation, then it has passed through
the whole and come out intact. Its integral relationship with the rest of the system is confirmed.

8.3.1. Conspiracy and Dynamics

What I have called a structural conspiracy is, in essence, a fixed point. It is therefore the
simplest kind of attractor which a self-generating pattern dynamic can have. One may also
conceive of self-generating-pattern-dynamic limit cycles -- collections P so that the presence of
P in Systemt implies the presence of P in Systemt+k, for some specific integer k>1.
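Fixed points and limit cycles of an iteration on collections can be detected by recording each state as it appears. The sketch below assumes a deterministic step function on finite sets. The rotation used as an example is a hypothetical toy and not a pattern dynamic; it only illustrates what it means for a collection to recur after k steps.

```python
def find_cycle(initial, step, max_steps=1000):
    """Iterate `step` from `initial` and return (transient, period),
    or None if no state repeats within max_steps."""
    seen = {}  # state -> time at which it first appeared
    state = frozenset(initial)
    for t in range(max_steps):
        if state in seen:
            return seen[state], t - seen[state]
        seen[state] = t
        state = frozenset(step(state))
    return None


def rotate(state):
    """Toy step: replace each element n by (n + 1) mod 3."""
    return {(n + 1) % 3 for n in state}


print(find_cycle({0}, rotate))        # a cycle: the state recurs after k steps
print(find_cycle({0, 1, 2}, rotate))  # a fixed point has period 1
```

A period of 1 corresponds to a fixed point, the analogue of a structural conspiracy, and a period k greater than 1 corresponds to a limit cycle in the sense just described.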

Nietzsche's fanciful theory of the "eternal recurrence" may be interpreted as the postulation of
a universe-wide limit-cycle. His idea was that the system, with all its variation over time, is
inevitably repetitive, so that every moment which one experiences is guaranteed to occur again at
some point in the future.

And, pursuing the same line of thought a little farther, one may also consider the concept of a
self-generating-pattern-dynamical strange attractor. In this context, one may define a
"strange attractor" as a group P of entities which are "collectively fixed" under a certain dynamic
iteration, even though the iteration does not cycle through the elements of P in any periodic way.
Strange attractors may be approximated by limit cycles with very long and complicated
periodic paths.

In ordinary dynamical systems theory, strange attractors often possess the property of
unpredictability. That is, neither in theory nor in practice is there any way to tell which attractor
elements will pop up at which future times. Unpredictable strange attractors are called chaotic
attractors. But on the other hand, some strange attractors are statistically predictable, as in
Freeman's "strange attractor with wings" model of the sense of smell. Here chaos coexists with a
modicum of overlying order.

It is to be expected that self-generating pattern dynamical systems possess chaotic attractors,
as well as more orderly strange attractors. Furthermore, in ordinary dynamics, strange attractors
often contain fixed points; and so, in self-generating pattern dynamics, it seems likely that
strange structural conspiracies will contain ordinary structural conspiracies (although these
ordinary structural conspiracies may well be so unstable as to be irrelevant in practice).
However, there is at the present time no mathematical theory of direct use in exploring the
properties of self-generating pattern dynamical systems or any other kind of nontrivial
self-generating system. The tools for exploring these models simply do not exist; we must make them
up as we go along.

Fixed points are simple enough that one can locate them by simple calculation, or trained
intuition. But in classical dynamical systems theory, most strange attractors have been found
numerically, by computer simulation or data analysis. Only rarely has it been possible to verify
the presence of a strange attractor by formal mathematical means; and even in these cases, the
existence of the attractor was determined by computational means first. So it is to be expected
that the procedure for self-generating dynamics will be the same. By running simulations of
various self-generating systems, such as self-generating pattern dynamics, we will happen upon
significant strange attractors ... and follow them where they may lead.

8.3.2. Immunological Pattern Dynamics

The immune system, as argued at the end of Chapter Seven, is a self-generating
component-system. The cognitive equation leads us to the very intuitive notion that, even so, it is not quite a
cognitive system.

Insofar as the immune system is a self-maintaining network, the survival of an antibody type
is keyed to the ability of the type to recognize some other antibody type. If A recognizes B, then
this is to be viewed as B creating instances of A (indirectly, via the whole molecular system of
communication and reproduction). So the antibody types that survive are those which are
produced by other antibody types: the immune network is a self-generating component-system.

The next crucial observation is that the recognition involved here is a pattern-based operation.
From the fact that one specific antibody type recognizes another, it follows only that there is
a significant amount of pattern emergent between the two antibody types; it does not follow that
the one antibody type is a pattern in the other. But the ensuing reproduction allows us to draw a
somewhat stronger conclusion. Consider: if type A attacks type B, thus stimulating the
production of more type A -- then what has happened? The original amounts of A and B, taken
together, have served as a process for generating a greater amount of A. Is this process a pattern
in the new A population? Only if one accepts that the type B destroyed was of "less complexity"
than the type A generated. For instance, if two A's were generated for each one B destroyed, then
this would seem clear. Thus, the conclusion: in at least some instances, antibody types can be
patterns in other antibody types. But this cannot be considered the rule. Therefore, the immune
system is not quite a fully cognitive system; it is a borderline case.

Or, to put it another way: the cognitive equation is an idealization, which may not be
completely accurate for any biologically-based system. But it models some systems better than
others. It models the immune system far better than the human heart or a piece of tree bark --
because the immune system has many "thought-like" properties. But, or so I will argue, it models
the brain even more adeptly.


I have said that mind is a self-generating system, and I have introduced a particular form of
self-generating system called a "self-generating pattern dynamic." Obviously these two ideas are
not unrelated. In this section I will make their connection explicit, by arguing that mind is a
structural conspiracy -- an attractor for a self-generating pattern dynamic.

More specifically, I will argue that a dual network is a kind of structural conspiracy. The key
to relating self-generating pattern dynamics with the dual network is the filtering operator Zt.

8.4.1. The Dual Network as a Structural Conspiracy

It is not hard to see that, with this filtering operation, an associative memory is almost a
structural conspiracy. For nearly everything in an associative memory is a pattern emergent
among other things in that associative memory. As in the case of multilevel control, there may be
a few odd men out -- "basic facts" being stored which are not patterns in anything. What is
required in order to make the whole memory network a structural conspiracy is that these "basic
facts" be generatable as a result of some element in memory acting on some other element.
These elements must exist by virtue of being patterns in other things -- but, as a side-effect, they
must be able to generate "basic facts" as well.

Next, is the perceptual-motor hierarchy a structural conspiracy? Again, not necessarily. A
process on level L may be generally expected to be a pattern in the products obtained by letting
processes on level L-1 act on processes from level L-2. After all, this is their purpose: to
recognize patterns in these products, and to create a pattern of success among these products.
But what about the bottom levels, which deal with immediate sense-data? If these are present in
Systemt, what is to guarantee they will continue into Systemt+1? And if these do not continue,
then under the force of self-generating pattern dynamics, the whole network will come crashing
down.

The only solution is that the lower level processes must not only be patterns in sense data, they
must also be patterns in products formed by higher-level processes. In other words, we can only
see what we can make. This is not a novel idea; it is merely a reformulation of the central insight
of the Gestalt psychologists.

Technically, one way to achieve this would be for there to exist processes (say on level 3)
which invert the actions taken by their subordinates (say on level 2), thus giving back the
contents of level 1. This inversion, though, has to be part of a process which is itself a pattern in
level 2 (relative to some other mental process). None of this is inconceivable, but none of it is
obvious either. It is, ultimately, a testable prediction regarding the nature of the mind, produced
by equation (*****).

The bottom line is, it is quite possible to conceive of dual networks which are not structural
conspiracies. But on the other hand, it is not much more difficult, on a purely abstract level, to
envision dual networks which are. Equation (*****) goes beyond the dual network theory of
mind, but in a harmonious way. The prediction to which it leads is sufficiently dramatic to
deserve a name: the "producibility hypothesis." To within a high degree of approximation,
every mental process X which is not a pattern in some other mental process, can be
produced by applying some mental process Y to some mental process Z, where Y and Z are
patterns in some other mental process.

This is a remarkable kind of "closure," a very strong sense in which the mind is a world all its
own. It is actually very similar to what Varela (1978) called "autopoiesis" -- the only substantive
difference is that Varela believes autopoietic systems to be inherently non-computational in
nature. So far, psychology has had very little to say about this sort of self-organization and
self-production. However, the advent of modern complex systems science promises to change this.

8.4.2. Physical Attractors and Process Attractors

All this is quite unorthodox and ambitious. Let me therefore pause to put it into a more
physicalistic perspective. The brain, like other extremely complex systems, is unpredictable on
the level of detail but roughly predictable on the level of structure. This means that the dynamics
of its physical variables display a strange attractor with a complex structure of "wings" or
"compartments." Each compartment represents a certain collection of states which give rise to
the same, or similar, patterns. Structural predictability means that each compartment has wider
doorways to some compartments than to others.

The complex compartment-structure of the strange attractor of the physical dynamics of the
brain determines the macroscopic dynamics of the brain. There would seem to be no way of
determining this compartment-structure based on numerical dynamical systems theory. Therefore
one must "leap up a level" and look at the dynamics of mental processes, perhaps represented by
interacting, inter-creating neural maps. The dynamics of these processes, it is suggested, possess
their own strange attractors called "structural conspiracies," representing collections of processes
which are closed under the operations of pattern-recognition and interaction. Process-level
dynamics results in a compartmentalized attractor of states of the network of mental processes.

Each state of the network of mental processes represents a large number of possible
underlying physical states. Therefore process-level attractors take the form of coarser
structures, superimposed on physical-level attractors. If physical-level attractors are drawn in
ball-point pen, process-level attractors are drawn in magic marker. On the physical level, a
structural conspiracy represents a whole complex of compartments. But only the most densely
connected regions of the compartment-network of the physical-level attractor can correspond to
structural conspiracies.

Admittedly, this perspective on the mind is somewhat speculative, in the sense that it is not
closely tied to the current body of empirical data. However, it is in all branches of science
essential to look ahead of the data, in order to understand what sort of data is really worth
collecting. The ideas given here suggest that, if we wish to understand mind and brain, the most
important task ahead is to collect information regarding the compartment-structure of the strange
attractor of the brain, both on the physical level and the process level; and above all to
understand the complex relation between the strange attractors on these two different levels.


I have proposed that the mind is an attractor for the cognitive equation. But this does not rule
out the possibility that some particular subsets of the mind may also be attractors for the
cognitive equation, in themselves. In particular, I suggest that linguistic systems tend to be
structural conspiracies.

This idea sheds new light on the very difficult psychological problem of language
acquisition. For in the context of the cognitive equation, language acquisition may be

Get any book for free on: www.Abika.com

understood as a process of iterative convergence toward an attractor. This perspective does
not solve all the micro-level puzzles of language acquisition theory -- no general, abstract theory
can do that. But it does give a new overarching framework for approaching the question of "how
language could possibly be learned."

8.5.1. The Bootstrapping Problem

The crucial puzzle of language acquisition theory is the "bootstrapping problem." What this
catch phrase means is: if all parts of language are defined in terms of other parts of language,
then where is the mind to start the learning process?

Consider the tremendous gap between the input and the output of the language learning
process. What a child is presented with are sentences heard in context. Gradually, the child's
mind learns to detect components and properties of these sentences: such things as individual
words, word order, individual word meanings, intonation, stress, syllabic structure of words,
general meanings of sentences, pragmatic cues to interpretation, etc. All this is just a matter of
correlating things that occur together, and dividing things into natural groupings: difficult but
straightforward pattern recognition.

But what the child's mind eventually arrives at is so much more than this. It arrives at an
implicit understanding of grammatical categories and the rules for their syntactic interrelation.
So the problem is, how can a child determine the relative order of noun and verb without first
knowing what "nouns" and "verbs" are? But on the other hand, how can she learn to distinguish
nouns and verbs except by using cues from word order? Nouns do not have a unique position, a
unique intonation contour, a unique modifier or affix -- there is no way to distinguish them from
verbs based on non-syntactic pattern recognition.

The formal model of language given in Chapter Five makes the bootstrapping problem appear
even more severe. First of all, in the definition of "syntactic system," each word is defined as a
fuzzy set of functions acting on other words. How then are words to be learned, if each word
involves functions acting on other words? With what word could learning possibly start? Yes,
some very simple words can be partially represented as functions with null argument; but most
words need other words as arguments if they are to make any sense at all.

And, on a higher level of complexity, I have argued that syntax makes no sense without
semantics to guide it. No mind can use syntax to communicate unless it has a good
understanding of semantics; otherwise, among other problems, the paradoxes of Boolean logic
will emerge to louse things up. But on the other hand, semantics, in the pattern-theoretic view,
involves determining the set of all patterns associated with a given word or sentence. And the
bulk of these patterns involve words and more complex syntactic structures like phrases and
clauses: this is the systematicity of language.

No syntax without semantics, no semantics without syntax. One cannot recognize correlations
among syntactic patterns until one knows syntax to a fair degree. But until one has recognized
these correlations, one does not know semantics, and one consequently cannot use syntax for any
purpose. But how can one learn syntax at all, if one cannot use it for any purpose?

Chomsky-inspired parameter-setting theories circumvent this chicken-and-egg problem in a
way which is either clever, obvious or absurd, depending on your point of view. They assume that
the brain has a genetically-programmed "language center," which contains an abstract version of
grammar called Universal Grammar or UG.

UG is understood to contain certain "switches" -- such as a switch which determines whether nouns
come before or after verbs, a switch which determines whether plurals are formed by prefixes or
by suffixes, and so on. The class of possible human syntaxes is the class of possible switch
settings for UG; and language learning is a process of determining how to set the switches for the
particular linguistic environment into which one has been born.
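The switch-setting picture can be illustrated with a small enumeration. This is a deliberately simplified sketch: the two switch names are hypothetical examples modeled on the ones mentioned above, and "learning" is reduced to discarding the settings that contradict observed evidence. It is not a model of any actual parameter-setting theory.

```python
from itertools import product

# Hypothetical switches, named after the examples in the text.
SWITCHES = ("noun_before_verb", "plural_by_suffix")


def all_settings():
    """Every possible assignment of True/False to the switches."""
    return [
        dict(zip(SWITCHES, values))
        for values in product([True, False], repeat=len(SWITCHES))
    ]


def consistent_settings(observations):
    """Keep only the settings compatible with each (switch, value) observation."""
    candidates = all_settings()
    for name, value in observations:
        candidates = [s for s in candidates if s[name] == value]
    return candidates


evidence = [("noun_before_verb", True), ("plural_by_suffix", False)]
print(consistent_settings(evidence))
```

Each observation narrows the set of candidate syntaxes, and in this sketch nothing resembling a grammar is ever constructed, only selected. That is the sense in which syntaxes are "not actually learned" on this view.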

The parameter-setting approach simplifies the bootstrapping problem by maintaining that
syntaxes are not actually learned; they are merely selected from a pre-arranged array of
possibilities. It leaves only the much more manageable problem of semantic bootstrapping -- of
explaining how semantic knowledge is acquired by induction, and then combined with UG to
derive an appropriate syntax. Some theorists, however, consider the whole parameter-setting
approach to be a monumental cop-out. They stubbornly maintain that all linguistic knowledge
must be induced from experience. In other words, to use my earlier example, first the child gets
a vague idea of the concept of "noun" and "verb"; then, based on this vague idea, she arrives at a
vague idea of the relative positioning of nouns and verbs. This inkling about positioning leads to a
slightly strengthened idea of "noun" and "verb" -- and so forth.

In general, according to this view, the child begins with very simple grammatical rules,
specific "substitution frames" with slots that are labeled with abstract object types; say "NOUN
VERB" or "NOUN go to NOUN" or "NOUN is very ADJECTIVE". Then, once these simple
frames are mastered, the child induces patterns among these substitution frames. "NOUN eats
NOUN," "NOUN kills NOUN," "NOUN tickles NOUN," etc., are generalized into NOUN
VERB NOUN. Next, more complex sentence structures are built up from simple substitution
frames, by induced transformational rules.
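
As a rough illustration of the generalization step just described, the following Python sketch merges substitution frames that are identical except at one position. It is a minimal sketch under stated assumptions, not a model taken from the acquisition literature: the function name generalize, the min_support threshold, and the convention that upper-case tokens are abstract slots are all hypothetical, and the label for the new slot ("VERB") is simply supplied by the caller.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Frame = Tuple[str, ...]

def generalize(frames: Set[Frame], slot_label: str = "VERB",
               min_support: int = 3) -> Set[Frame]:
    """Merge frames that differ only in one concrete word.

    Assumption: upper-case tokens (e.g. "NOUN") are abstract slots,
    lower-case tokens are concrete words.
    """
    # Map (left context, right context) -> concrete words seen in between.
    groups: Dict[Tuple[Frame, Frame], Set[str]] = defaultdict(set)
    for frame in frames:
        for pos, token in enumerate(frame):
            if token.isupper():
                continue  # already an abstract slot, nothing to generalize
            groups[(frame[:pos], frame[pos + 1:])].add(token)

    generalized: Set[Frame] = set()
    for (left, right), fillers in groups.items():
        # Only generalize when enough distinct words share the same context.
        if len(fillers) >= min_support:
            generalized.add(left + (slot_label,) + right)
    return generalized

frames = {
    ("NOUN", "eats", "NOUN"),
    ("NOUN", "kills", "NOUN"),
    ("NOUN", "tickles", "NOUN"),
    ("NOUN", "go", "to", "NOUN"),
}
print(generalize(frames))
```

With the four frames above, only the three transitive frames share a context often enough to be merged, so the single induced frame is NOUN VERB NOUN.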

In the inductivist perspective, bootstrapping is understood as a difficult but not insurmountable
problem. It is assumed that the 10^10 to 10^12 neurons of the human brain are up to the task.
Parameter-setting theorists have a more pessimistic opinion of human intelligence. But the
trouble with the whole debate is that neither side has a good overall concept of what kind of
learning is taking place.

In other words: if it's inductive learning, what kind of structure does the induction process
have? Or if it's parameter setting, what is the logic of the process by which these "parameters"
are learned -- how can this mechanistic model be squared with the messiness of human biology
and psychology? In short, what is the structure of linguistic intelligence? My goal in this
section is to suggest that the cognitive equation may provide some hints toward the resolution of
this conceptual difficulty.

8.5.2. Process-Network Theories of Language Learning

The dual network model suggests that language learning must be explicable on the level of
self-organizing, self-generating process dynamics. This is something of a radical idea, but on
the other hand, it can also be related to some of the "mainstream" research in language
acquisition theory. And, I will argue, it provides an elegant way of getting around the
bootstrapping problem.

Constraint Satisfaction Models

Perhaps the most impressive among all parameter-setting theories is Pinker's (1987) constraint
satisfaction model. Initially Pinker wanted to model language learning using a connectionist
architecture a la Rumelhart and McClelland (1986). But this proved impossible; and indeed, all
subsequent attempts to apply simple "neural networks" to symbolic learning problems have been
equally fruitless.

So instead, Pinker borrowed from artificial intelligence the idea of a self-adjusting constraint
satisfaction network. The idea is that language acquisition results from the joint action of a
group of constraint satisfaction networks: one for assigning words to categories, one for
determining grammatical structures, one for understanding and forming intonations, etc.

Consider, for instance, the network concerned with grammatical structures. Each node of this
network consists of a rule prototype, a potential grammatical rule, which has its own opinion
regarding the role of each word in the sentence. The dynamics of the network is competitive. If
the sentence is "The dog bit the man," then one rule might categorize "The dog" as subject and
"bit the man" as verb phrase; another might categorize "The dog bit" as subject and "the man" as
verb phrase. But if a certain rule prototype disagrees with the majority of its competitors
regarding the categorization of a word, then its "weight" is decreased, and its opinion is counted
less in the future.

The behavior of the network gets interesting when rules agree regarding some categorizations
and disagree regarding others. The weights of rules may fluctuate up and down wildly before
settling on an "equilibrium" level. But eventually, if the rule network is sufficiently coherent, an
"attractor" state will be reached.
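
The competitive dynamic described above can be sketched in a few lines of Python. This is only a toy illustration, not Pinker's actual model: the function update_weights, the multiplicative penalty of 0.5, the renormalization of the weights, and the third rule prototype added to break the tie are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List

def update_weights(opinions: List[List[str]], weights: List[float],
                   penalty: float = 0.5) -> List[float]:
    """One round of competition among rule prototypes.

    opinions[r][w] is the category that rule r assigns to word w.
    A rule that disagrees with the weighted majority on a word has its
    weight multiplied by `penalty` (an assumed update rule).
    """
    new_weights = list(weights)
    n_words = len(opinions[0])
    for w in range(n_words):
        # Weighted vote over the categories proposed for this word.
        tally: Dict[str, float] = defaultdict(float)
        for r, opinion in enumerate(opinions):
            tally[opinion[w]] += weights[r]
        majority = max(tally, key=tally.get)
        for r, opinion in enumerate(opinions):
            if opinion[w] != majority:
                new_weights[r] *= penalty
    total = sum(new_weights)
    return [x / total for x in new_weights]  # keep weights summing to 1

# "The dog bit the man": one opinion per word, per rule prototype.
opinions = [
    ["SUBJ", "SUBJ", "VP", "VP", "VP"],    # "The dog" | "bit the man"
    ["SUBJ", "SUBJ", "SUBJ", "VP", "VP"],  # "The dog bit" | "the man"
    ["SUBJ", "SUBJ", "VP", "VP", "VP"],    # hypothetical third rule agreeing with the first
]
weights = [1 / 3, 1 / 3, 1 / 3]
for _ in range(20):
    weights = update_weights(opinions, weights)
print(weights)
```

In this tiny example the dissenting rule is penalized on every round, so its weight shrinks toward zero and the network settles on the first parsing. Richer rule sets, in which rules agree on some words and disagree on others, are where the fluctuations mentioned above would arise.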

If there were no initial knowledge, then this competitive process would be worthless. No
stable equilibrium would ever arise. But Pinker's idea is that the abstract rules supplied by UG,
combined with rudimentary rules learned by induction, are enough to ensure the convergence of
the network. This is a fancy and exciting version of the "parameter-setting" idea: parameters are
not being directly set, but rather UG abstractions are being used to guide the convergence of a
self-organizing process.

Competition Models

An interesting counterpoint to Pinker's network model is provided by the evolutionary
approach of Bates and MacWhinney (1987). They present cross-linguistic data suggesting that
language learning is not a simple process of parameter-setting. Children learning different
languages will often differ in their early assumptions about grammar, as well as their ultimate
syntactic rule structures. Furthermore, the passage from early grammar to mature grammar may
be an oscillatory one, involving the apparent competition of conflicting tendencies. And
different children may, depending on their particular abilities, learn different aspects of the same
language at different times: one child may produce long sentences full of grammatical errors at
an early stage, while another child may first produce flawless short sentences, only then moving
on to long ones.

These observations disprove only the crudest of parameter-setting theories; they do not
contradict complex parameter-setting theories such as Pinker's constraint satisfaction network,
which integrates UG with inductive rule learning in a self-organizational setting. But they do
suggest that even this kind of sophisticated parameter-setting is not quite sophisticated enough.
The single-level iteration of a constraint satisfaction network is a far cry from the flexible
multilevel iterations of the brain.

What Bates and MacWhinney propose is a sort of "two-level network" -- one level for forms
and another for functions. Form nodes may be connected to function nodes; for example, the
form of preverbal positioning in English is correlated with the function of expressing the actor
role. But there may also be intra-level connections: form nodes may be connected to other form
nodes, and function nodes to other function nodes.

In their view, mappings of a single form onto a single function are quite rare; much more
common is widely branching interconnection. For instance, they argue that

"subject" is neither a single symbol nor a unitary category. Rather, it is a coalition of many-to-
many mappings between the level of form (e.g. nominative case marking, preverbal position,
agreement with the verb in person and number) and the level of function (e.g. agent of a
transitive action, topic of an ongoing discourse, perspective of the speaker)....

Notice that the entries at the level of form include both "obligatory" or "defining" devices such
as subject-verb agreement, and "optional" correlates like the tendency for subjects to be marked
with definite articles. This is precisely what we mean when we argue that there is no sharp line
between obligatory rules and probabilistic tendencies.

Learning is then a process of modifying the weights of connections. Connections that lead to
unsatisfactory results have their weights decreased, and when there is a conflict between two
different nodes, the one whose connection is weighted highest will tend to prevail.

Summary

Bates and MacWhinney, like Pinker, view language learning as largely a process of adjusting
the connections between various "processes" or "nodes." While this is not currently known
to be the correct approach to language acquisition, I submit that it is by far the most plausible
framework yet proposed. For Neural Darwinism teaches us that the brain is a network of
interconnected processes, and that learning consists largely of the adjustment of the connections
between these processes. The process-network view of language acquisition fits quite neatly into
what we know about the brain and mind.

And the question "UG or not UG," when seen in this light, becomes rather less essential. What
is most important is the process dynamics of language learning. Only once this dynamics is
understood can we understand just how much initial information is required to yield the
construction of effective linguistic neural maps. Perhaps the inductivists are right, and abstract
cognitive abilities are sufficient; or perhaps Chomsky was correct about the necessity of pre-
arranged grammatical forms. But one's opinion on this issue cannot serve as the basis for a
theory of language acquisition. The process-network view relegates the innate-vs.-acquired
debate to the status of a side issue.

8.5.3. The Cognitive Equation and Language Learning

So, language learning is largely a process of adjusting the weights between different
processes. But how are these processes arrived at in the first place? Some of them, perhaps, are
supplied genetically. But many, probably most, are learned inductively, by pattern recognition.
This gives rise to the question of whether a language is perhaps a structural conspiracy.

The above discussion of "bootstrapping" suggests that this may indeed be the case. Parts of
speech like "nouns" and "verbs" are patterns among sentences; but they are only producible by
processes involving word order. On the other hand, rules of word ordering are patterns among
sentences, but they are only producible by processes involving parts of speech.

Bootstrapping states precisely that, once one knows most of the rules of syntax, it's not hard to
induce the rest. Suppose one assumes that the processes bearing the rules of language all

1) possess modest pattern-recognition capacities, and

2) are programmed to recognize patterns in sentences.

Given this, it follows from the bootstrapping problem that any portion of a mind's linguistic
system is capable of producing the rest, according to the dynamics of the cognitive equation. In
other words, it follows that language is an attractor, a structural conspiracy.

And if one accepts this conclusion, then the next natural step is to view language learning as a
process of convergence to this attractor. This is merely a new way of conceptualizing the point
of view implicit in the work of Pinker, Bates, MacWhinney, and other process-network-oriented
acquisition theorists. These theorists have focused on the dynamics of already-existing networks
of linguistic rules; but as Pinker explicitly states, this focus is for the sake of simplicity only (after
all, rule-bearing processes must come from somewhere). The cognitive equation shifts the focus
from connection adjustment to process creation, but it does not alter the underlying process-
network philosophy.

The learning process starts with an initial collection of syntactic rules -- either simple
substitution rules picked up from experience, or randomly chosen specific cases of abstract UG
rules, or a combination of the two. Then each rule-bearing process recognizes patterns -- among
incoming and outgoing sentences and its companion processes.

This recognition process results in the production and comprehension of sentences, via its
interaction with outside perceptual and motor processes, and the associative memory network
(recall the intimate connection between syntax and semantics, discussed in Chapter Five). But
internally, it also leads to the creation of new processes ... which aid in the production and
comprehension of sentences, and in the creation of processes.

And this process is repeated until eventually nothing new is generated any more -- then an
attractor has been reached. Language, a self-sustaining mental system, has been learned.
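
The idea of iterating until nothing new is generated can be illustrated with a deliberately crude Python sketch. It is not the cognitive equation itself: it assumes a monotone toy system in which items are only ever added, and the PRODUCES table, the item names, and the functions step and converge are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy "production rules": an item can be produced once all of
# its prerequisites are present in the system. Names are illustrative only.
PRODUCES: Dict[str, FrozenSet[str]] = {
    "parts_of_speech": frozenset({"word_order"}),
    "word_order": frozenset({"parts_of_speech"}),
    "noun_verb_noun_frame": frozenset({"parts_of_speech", "word_order"}),
}

def step(system: Set[str]) -> Set[str]:
    """One iteration: keep what exists, add whatever it can produce."""
    produced = {item for item, needs in PRODUCES.items() if needs <= system}
    return system | produced

def converge(system: Set[str], max_steps: int = 100) -> Tuple[Set[str], int]:
    """Iterate `step` until a fixed point; return it and the step count."""
    current = set(system)
    for n in range(max_steps):
        nxt = step(current)
        if nxt == current:  # nothing new was generated
            return current, n
        current = nxt
    raise RuntimeError("no fixed point reached within max_steps")

print(converge({"word_order"}))  # a partial system regenerates the rest
print(converge(set()))           # no initial knowledge: nothing ever arises
```

Starting from either of the two mutually dependent items, the iteration recovers the whole toy system, whereas the empty system stays empty. This mirrors, in a very simplified way, the claim that any sufficient portion of a structural conspiracy can produce the rest.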

Chapter Nine


I believe, so that I may understand

-- Saint Augustine

Believing is the primal beginning

even in every sense impression....

-- Friedrich Nietzsche

Are belief systems attractors? There is something quite intuitive about the idea. Before one
settles on a fixed system of beliefs, one's opinions regarding a certain issue may wander all over
the spectrum, following no apparent pattern. But once one arrives at a belief system regarding
that topic, one's opinions thereon are unlikely to vary from a narrow range.

But of course, if one is to declare that belief systems are attractors, one must specify: attractors
of what dynamical system? To say "attractors of brain dynamics" is obvious but inadequate: the
brain presents us with a system of billions or trillions of coupled nonlinear equations, which
current methods are incapable of analyzing even on a qualitative level. If belief systems are to be
usefully viewed as attractors, the relevant dynamical iteration must exist on a higher level than
that of individual neurons.

In the preceding chapters I have argued that, in order to make headway toward a real
understanding of the mind, one must shift up from the neural level and consider the structure and
dynamics of interacting mental processes or neural maps (Edelman, 1988). Specifically, I have
proposed an equation for the evolution of mental processes, and I have suggested that
psychological systems may be viewed as subsets of the dual network which are strange
attractors of this equation. Now, in this chapter, I will begin the difficult task of relating these
formal ideas to real-world psychology -- to discuss the sense in which particular human belief
systems may be seen as subsystems of the dual network, and attractors of the cognitive equation.

After giving a simple formalization of the concept of "belief," I will consider the dynamics of
belief systems as displayed in the history of science, with an emphasis on Lakatos's structural
analysis of research programmes. Then I will turn to a completely different type of belief system:
the conspiracy theory of a paranoid personality. By contrasting these different sorts of belief
systems in the context of the dual network and the cognitive equation, a new understanding of
the nature of rationality will be proposed. It will be concluded that irrationality is a kind of
abstract dissociation -- a welcome conclusion in the light of recent work relating dissociation
with various types of mental illness (van der Kolk et al., 1991).

Personalities and their associated belief systems are notoriously vague and complicated. It
might seem futile to attempt to describe such phenomena with precise equations. But the Church-
Turing Thesis implies that one can model anything in terms of computational formulas -- if one
only chooses the right sort of formulas. My claim is that the "cognitive law of motion," applied
in the context of the dual network model, is adequate for describing the dynamics of mentality.
The theory of belief systems given in this chapter and the next is a partial substantiation of this
claim.


In this section I will give abstract, formal definitions for the concepts of "belief" and "belief
system." Though perhaps somewhat tedious, these definitions serve to tie in the idea of "belief"
with the formal vocabulary introduced in Chapters Two and Three; and they provide a solid
conceptual foundation for the more practical considerations of the following sections.

The basic idea is that a belief is a mental process which, in some regard, gives some other
mental process the "benefit of the doubt." Recall that, in Chapter Two, I defined an infon as a
fuzzy set of patterns. Suppose that a certain process X will place the process s in the associative
memory just as if s displayed infon i -- without even checking to see whether s really does
display i. Then I will say that X embodies the belief that s displays infon i. X gives s the benefit
of the doubt regarding i.

The mental utility of this sort of benefit-giving is obvious: the less processing spent on s, the
more available for other tasks. Mental resources are limited and must be efficiently budgeted.
But it is equally clear that minds must be very careful where to suspend their doubts.

Next, a test of a belief may be defined as a process with the potential to create an infon which,
if it were verified to be present, would decrease the intensity of the belief. In other words, a test
of a belief X regarding s has the potential to create an infon j which would cause X to give s less
benefit of the doubt. Some beliefs are more testable than others; and some very valuable beliefs
are surprisingly difficult to test.

Finally, a belief system is a group of beliefs which mutually support one another, in the sense
that an increased degree of belief in one of the member beliefs will generally lead to increased
degrees of belief in most of the other member beliefs. The systematicity of belief makes testing
particularly difficult, because in judging the effect of infon j on belief X, one must consider the
indirect effects of j on X, via the effects of j on the other elements of the belief system. But,
unfortunately for hard-line rationalists, systematicity appears to be necessary for intelligence. It's
a messy world out there!

9.1.1. Formal Definition of Belief (*)

A belief, as I understand it, is a proposition of the form

" s |-- i with degree d"

or, in more felicitous notation,

(s,i;d).

In words, it is a proposition of the form "the collection of patterns labeled i is present in the
entity s with intensity d." To say that the individual x holds the belief (s,i;d), I will write

"s |-- i //x with degree d",

or, more compactly,

(s,i,x;d).

Mentally, such a proposition will be represented as a collection of processes which, when
presented with the entity s, will place s in the associative memory exactly as they would place
an entity which they had verified to contain patterns i with intensity d. A belief about s is a
process which is willing to give s the benefit of the doubt in certain regards. This definition is
simple and natural. It does not hint at the full psychological significance of belief; but for the
moment, it will serve us well.

Next, what does it mean to test a belief? I will say that an infon j is a test of a belief (s,i,x)
relative to the observer y, with certainty level e, to degree NM, where

N = the degree to which the observer y believes that the determination of the degree in d(s,j,x)
will cause a decrease in d(s,i,x).

M = the amount of effort which the observer y believes will be required to determine the
degree that s |-- j holds to within certainty e

I believe that this formal definition, awkward as it is, captures what one means when one
makes a statement like "That would be a test of Jane's belief in so and so." It is not an objective
definition, and it is not particularly profound, but neither is it vacuous: it serves its purpose well.

Factor N alone says that j is a test of i if y believes that determining whether j holds will affect
x's degree of belief that i holds. This is the essence of test. But it is not adequate in itself, because
j is not a useful test of i unless it is actually possible to determine the degree to which j holds.
This is the purpose of the factor M: it measures the practicality of executing the test j.

To see the need for M, consider the theory, well known among philosophers, that there is
some spot on the Earth's surface which has the property that anyone who stands there will see the
devil. The only test of this is to stand on every single spot on the earth's surface, which is either
impossible or impractically difficult, depending on the nature of space and time.

Or consider Galileo's belief that what one sees by pointing a telescope toward space is actually
"out there". Since at that time there was no other source of detailed information as to what was
"out there," there was no way to test this belief. Now we have sent men and probes into space,
and we have measured the properties of heavenly bodies with radio telescopy and other methods;
all these tests have supported Galileo's belief. But it is not hard to see why most of Galileo's
contemporaries thought his belief unreasonable.

The role of the "observer" y is simple enough. If one posits an outside, "impartial" observer
with access to all possible futures, then one can have an objective definition of test, which
measures the degree to which the presence of a certain infon really will alter the strength of a
belief. On the other hand, one may also consider the most "partial" observer of all: the belief-
holder. It is interesting to observe that, when a certain human belief system appears to be
strongly resistant to test, the belief-holders will generally acknowledge this fact just as readily as
outside observers.

9.1.2. Systematic Belief (*)

The formal definition of "belief system" is a little bit technical, but the basic idea is very
simple: a belief system is a collection of beliefs which are mutually supporting in that a test for
any one of them is a test for many of the others. It is permitted that evidence in favor of some of
the beliefs may be evidence against some of the others -- that what increases the intensity of
belief in A may decrease the intensity of belief in B, where both A and B are in the system. But
this must not be the rule -- the positive reinforcement must, on balance, outweigh the negative.

To be precise, consider a set of beliefs {A1,...,An}. Let cij = cij(K;y) denote the amount of
increase in the degree to which Aj holds that, in the belief of y, will result from an increase by an
amount of K in the degree to which Ai holds. Decrease is to be interpreted as negative increase,
so that if y believes that a decrease in the degree to which Aj holds will result from an increase
in the degree to which Ai holds by amount K, then cij(K;y) will be negative. As with tests, unless
otherwise specified it should be assumed that y=x.

Then the coherence C({A1,...,An}) of the set {A1,...,An} may be defined as the sum over all i, j
and K of the cij. And the compatibility of a belief B with a set of beliefs {A1,...,An} may be
defined as C({A1,...,An,B}) - C({A1,...,An}).


The coherence of a set of beliefs is the degree to which the various member beliefs support
each other, on the average, in the course of the mental process of the entity containing the
beliefs. It is not the degree to which the various member beliefs "logically support" each other --
it depends on no system of evaluation besides that of the holder of the beliefs. If I think two
beliefs contradict each other, but in your mind they strongly reinforce each other, then according
to the above definition the two beliefs may still be a strongly coherent belief system relative to
your mind. It follows that the "same" set of beliefs may form a different dynamical system in two
different minds.

Additionally, it is not necessary that two beliefs in the same mind always stand in the same
relation to each other there. If A1 contradicts A2 half the time, but supports A2 half the time with
about equal intensity, then the result will be a c12 near zero.

If none of the cij are negative, then the belief system is "consistent": none of the beliefs work
against each other. Obviously, consistency implies coherence, though not a high degree of
coherence; but coherence does not imply consistency. If some of its component beliefs contradict
each other, but others support each other, then the coherence of a set of beliefs can still be high --
as long as the total amount of support exceeds the total amount of contradiction.

If a set of beliefs has negative coherence it might be said to be "incoherent." Clearly, an
incoherent set of beliefs does not deserve the title "belief system." Let us define a belief system
as a set of beliefs which has positive coherence.

The compatibility of a belief B with a belief system measures the expected amount by which
the addition of B to the belief system would change the coherence of the belief system. If this
change would be positive, then B has positive compatibility; and if this change would be
negative, then B has negative compatibility -- it might be said to be incompatible.
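
To make these definitions concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are not part of the definitions above: the coefficients cij are given as a plain matrix for a single fixed value of K (so the sum over K is dropped), the diagonal entries cii are ignored, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]

def coherence(c: Matrix) -> int:
    """Sum of c[i][j] over all ordered pairs with i != j (single K assumed)."""
    n = len(c)
    return sum(c[i][j] for i in range(n) for j in range(n) if i != j)

def is_consistent(c: Matrix) -> bool:
    """True if no belief works against any other (no negative c[i][j])."""
    n = len(c)
    return all(c[i][j] >= 0 for i in range(n) for j in range(n) if i != j)

def is_belief_system(c: Matrix) -> bool:
    """A belief system is a set of beliefs with positive coherence."""
    return coherence(c) > 0

def compatibility(c_extended: Matrix) -> int:
    """C({A1,...,An,B}) - C({A1,...,An}), where B is the last row/column."""
    original = [row[:-1] for row in c_extended[:-1]]
    return coherence(c_extended) - coherence(original)

# Rows and columns: A1, A2, B. Made-up numbers for illustration only.
c = [
    [0, 2, -1],
    [1, 0, 3],
    [-1, 1, 0],
]
print(coherence(c), is_consistent(c), is_belief_system(c), compatibility(c))
```

In this made-up example the three beliefs are not consistent, since A1 and B work against each other, yet the set is still coherent because total support exceeds total contradiction. Adding B raises the coherence of {A1, A2}, so B has positive compatibility.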

Finally, it must be noted that a given human mind may contain two mutually incompatible
belief systems. This possibility reflects the fundamentally "dissociated" (McKellar, 1979) nature
of human mentality, whereby the mind can "split" into partially autonomous mental sub-
networks. The computation of the coefficients cij may be done with respect to any system one
desires -- be it a person's mind, a society, or one component of a person's mind.

9.1.3. Belief and Logic

How does a mind determine how much one belief supports another? In formal terms, how
does it determine the "correlation" function cij between belief i and belief j? Should an analysis
of belief merely accept these "intercorrelations" a priori, as given products of the believing mind
in question? Or is there some preferred "rational" method of computing the effect of a change in
the intensity of one belief on the intensity of another?

To see how very difficult these questions are, assume for the sake of argument that all beliefs
are propositions in Boolean logic. Consider a significantly cross-referential belief system S --
one in which most beliefs refer to a number of other beliefs. Then, as William Poundstone (1989)
has pointed out, the problem of determining whether a new belief is logically consistent with the
belief system S is at least as hard as the well-known problem of "Boolean Satisfiability," or SAT.

Not only is there no known algorithm for solving SAT effectively within a reasonable amount
of time; it has been proved that SAT is NP-complete, which means (very roughly speaking) that
if there is such an algorithm, then there is also a reasonably rapid and effective algorithm for
solving any other problem in the class NP. And the class NP includes virtually every difficult
computational problem ever confronted in a practical situation.

So the problem of determining the consistency of a belief with a significantly cross-referential
belief system is about as difficult as any computational problem yet confronted in any real
situation. To get a vague idea of how hard this is, consider the fact that, using the best algorithms
known, and a computer the size of the known universe with processing elements the size of
protons, each working for the entire estimated lifetime of the universe, as fast as the laws of
physics allow, it would not be possible to determine the logical consistency of a belief with a
significantly cross-referential belief system containing six hundred beliefs.
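
The source of the difficulty can be seen in a brute-force consistency check, sketched below in Python. This is an illustration only, not one of the "best algorithms known": it assumes each belief is a Boolean function of a shared list of truth values, and the name is_consistent and the example beliefs are hypothetical. The point is that the number of truth assignments to be examined doubles with every additional variable.

```python
from itertools import product
from typing import Callable, List, Sequence

Belief = Callable[[Sequence[bool]], bool]

def is_consistent(beliefs: List[Belief], n_vars: int) -> bool:
    """Return True if some truth assignment satisfies every belief.

    Exhaustive search: examines up to 2**n_vars assignments.
    """
    for assignment in product([False, True], repeat=n_vars):
        if all(belief(assignment) for belief in beliefs):
            return True
    return False

# A tiny cross-referential system over three propositions v[0], v[1], v[2].
system: List[Belief] = [
    lambda v: v[0] or v[1],
    lambda v: (not v[0]) or v[2],
]

print(is_consistent(system + [lambda v: not v[1]], 3))
print(is_consistent(system + [lambda v: (not v[1]) and (not v[2])], 3))

# Assuming one Boolean variable per belief, six hundred beliefs give
# 2**600 assignments, a number with 181 decimal digits.
print(len(str(2 ** 600)))
```

The first new belief can be added without contradiction, while the second cannot. For three variables the search examines at most eight assignments; under the one-variable-per-belief assumption, six hundred beliefs would require on the order of 10^180.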

It must be emphasized that the problem of making a good guess as to whether or not a belief
is logically consistent with a given belief system is an entirely different matter. What is so
astoundingly difficult is getting the exact right answer every time. If one allows oneself a certain
proportion of errors, one may well be able to arrive at an answer with reasonable rapidity.
Obviously, the rapidity increases with the proportion of error permitted; the rate of this
increase, however, is a difficult mathematical question.

So when a mind determines the functions cij relating its beliefs, it may take logical consistency
into account, but it seems extremely unlikely that it can do so with perfect accuracy, for three
reasons: 1) based on experience, the human mind does not appear to be terribly logically
consistent; 2) the brain is not an exact mechanism like a computer, and it almost certainly works
according to rough probabilistic approximation methods; 3) the problem of determining logical
consistency is NP-complete and it is hence very unlikely that it has a rapid, accurate solution for
any but the smallest belief systems.

Hence it is unreasonable to require that a system of beliefs be "rational" in structure, at least if
rationality is defined in terms of propositional logic. And the structural modifications to
propositional logic suggested in Chapter Four only serve to make the problem of determining the
cij even more difficult. In order to compute anything using the structural definition of
implication, one has to compute the algorithmic information contained in various sequences,
which is impossible in general and difficult in most particular cases.

From these considerations one may conclude that the determination of the functions cij -- of
the structure of a belief system -- is so difficult that the mind must confront it with a rough,
approximate method. In particular, I propose that the mind confronts it with a combination of
deduction, induction and analogy: that it does indeed seek to enforce logical consistency, but
lacking an effective general means of doing so, it looks for inconsistency wherever experience
tells it inconsistency is most likely to lurk.


No mind consists of fragmentary beliefs, supported or refuted by testing on an individual
basis. In reality, belief is almost always systematic. To illustrate this, let us consider some
philosophically interesting examples from the history of science.

In his famous book, The Structure of Scientific Revolutions, Thomas Kuhn (1962) proposed
that science evolves according to a highly discontinuous process consisting of 1) long periods of
"normal science," in which the prevailing scientific belief system remains unchanged, and new
beliefs are accepted or rejected largely on the basis of their compatibility with this belief system,
and 2) rare, sudden "paradigm changes," in which the old belief system is replaced with a new
one.

According to this analysis, the historical tendency of scientists has been to conform to the
prevailing belief system until there suddenly emerges a common belief that the process of testing
has yielded results which cannot possibly be made compatible with the old system. This point of
revolution is called a "crisis." Classic examples of scientific revolution are the switch from
Newtonian mechanics to relativity and quantum theory, and the switch from Ptolemaic to
Copernican cosmology. This phenomenon is clearest in physics, but it is visible everywhere.

Kuhn never said much about how belief systems work; he placed the burden of explanation on
sociology. Imre Lakatos (1978) was much more specific. He hypothesized that science is
organized into belief systems called "research programmes," each of which consists of a "hard
core" of essential beliefs and a "periphery" of beliefs which serves as a medium between the hard
core and the context. According to this point of view, if A is a belief on the periphery of a
research programme, and a test is done which decreases its intensity significantly, then A is
replaced with an alternate belief A' which is, though incompatible with A and perhaps other
peripheral beliefs, still compatible with the hard core of the programme.

Admittedly, the distinction between "hard core" and "periphery" is much clearer in retrospect
than at the time a theory is being developed. In reality, the presence of a troublesome piece of
data often leads to much debate as to what is peripheral and what is essential. Nonetheless,
Lakatosian analysis can be quite penetrating.

For instance, consider the Ptolemaic research programme, the analysis of the motions of
heavenly bodies in terms of circular paths. One could argue that the "hard core" here contains the
belief that the circle is the basic unit of heavenly motion, and the belief that the earth is the center
of the universe; whereas initially the periphery contained, among other things, the belief that the
heavenly bodies revolve around the earth in circular paths.

When testing refuted the latter belief, it was rejected and replaced with another belief that was
also compatible with the hard core: the belief that the heavenly bodies move in "epicycles,"
circles around circles around the earth. And when testing refuted this, it was rejected and
replaced with the belief that the heavenly bodies move in circles around circles around circles
around the earth -- and so on, several more times. Data was accommodated, but the hard core was
not touched.

Consider next the Copernican theory, that the planets revolve in circles around the sun. This
retains part but not all of the hard core of the Ptolemaic belief system, and it generates a new
periphery. In Copernicus's time, it was not clear why, if the earth moved, everything on its
surface didn't fly off. There were certain vague theories in this regard, but not until around the
time of Newton was there a convincing explanation. These vague, dilemma-ridden theories
epitomize Lakatos's concept of periphery.

Philosophers of science have a number of different explanations of the transition from
Ptolemaic to Copernican cosmology. It was not that the Copernican belief system explained the
data much better than its predecessor; in fact, it has been argued that, when the two are restricted
to the same number of parameters, their explanatory power is approximately equal (Feyerabend,
1970). It was not that there was a sociological "crisis" in the scientific community; there was
merely a conceptual crisis, which is visible only in retrospect. Extant documents reveal no
awareness of crisis.

Was it that the Copernican theory was "simpler"? True, a single circle for each planet seems
far simpler than a hierarchy of circles within circles within circles within circles.... But the
complexity of the Ptolemaic epicycles is rivalled by the complexity of contemporaneous
explanations as to how the earth can move yet the objects on its surface not be blown away. As
Feyerabend has rightly concluded, there is no single explanation for this change of belief system;
however, detailed historical analysis can yield insight into the complex processes involved.

9.2.1. Belief Generation

Lakatos's ideas can easily be integrated into the above-given model of belief systems. The first
step is a simple one: belief in an element of the hard core strongly encourages belief in the other
theories of the system, and belief in a theory of the system almost never discourages belief in an
element of the hard core. There are many ways to formalize this intuition; for example, given an
integer p and a number a, one might define the hard core of a belief system {A1,...,An} as the set
of Ai for which the p'th-power average over all j of cij exceeds a. This says that the hard core is
composed of those beliefs which many other beliefs depend on.
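
As a minimal sketch of this threshold definition, the following Python fragment computes the p'th-power average of each row of a support matrix and keeps the rows that exceed a. It assumes that c[i][j] is a nonnegative number measuring how strongly belief Ai supports belief Aj, that p is a positive integer, and that the average runs over all j, including j = i. The function names and the sample matrix are hypothetical, chosen only for illustration.

```python
def power_mean(values, p):
    """p'th-power average of nonnegative numbers: ((1/n) * sum(v**p)) ** (1/p)."""
    n = len(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def hard_core(c, p, a):
    """Indices i whose row c[i] has a p'th-power average above the threshold a."""
    return [i for i, row in enumerate(c) if power_mean(row, p) > a]


# Hypothetical 4-belief system (assumption: c[i][j] = how strongly A_i supports A_j).
c = [
    [0.0, 0.9, 0.8, 0.9],  # A_0 strongly supports the others
    [0.1, 0.0, 0.2, 0.1],  # A_1 supports little
    [0.2, 0.1, 0.0, 0.3],  # A_2 supports little
    [0.8, 0.7, 0.9, 0.0],  # A_3 strongly supports the others
]

core = hard_core(c, p=2, a=0.5)
print(core)
```

Larger values of p weight the strongest dependencies more heavily, so a belief that a few others lean on very strongly can enter the hard core even if most beliefs ignore it.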

But unfortunately, this sort of characterization of the hard core is not entirely adequate. What
it fails to capture is the way the hard core of a research programme not only supports but actually
generates peripheral theories. For instance, the hard core of Newtonian mechanics -- the three
laws of motion, and the machinery of differential and integral calculus -- is astoundingly adept at
producing analyses of particular physical phenomena. One need merely make a few incorrect
simplifying assumptions -- say, neglect air resistance, assume the bottom of a river is flat,
assume the mass within the sun is evenly distributed, etc. -- and then one has a useful peripheral
theory. And when the peripheral theory is refuted, this merely indicates that another "plausible"
incorrect assumption is needed.

There is an old story about a farmer who hires an applied mathematician to help him optimize
his productivity. The mathematician begins "First, let us assume a spherical cow...," and the
farmer fires him. The farmer thinks the mathematician is off his rocker, but all the mathematician
is doing is applying a common peripheral element of his belief system. This peripheral element,
though absurd in the context of the farmer's belief system, is often quite effective when
interpreted in terms of the belief system of modern science. The peripheral theory seems
ridiculous "in itself", but it was invented by the hard core for a certain purpose and it serves this
purpose well.

For a different kind of example, recall what Newtonian mechanics tells us about the solar
system: a single planet orbiting the sun, assuming that both are spherical with uniform density,
should move in an ellipse. But in fact, the orbit of Mercury deviates from ellipticity by
approximately 43 seconds of arc every century.

This fact can be accommodated within the framework of Newtonian mechanics, for instance by
changing the plausible simplifying assumption of uniform spherical mass distribution -- a step
which leads to all sorts of interesting, peripheral mathematical theories. In fact, when all known
data is taken into account, Newtonian mechanics does predict a precession, just a smaller
precession than is observed. So it is easy to suppose that, with more accurate data, the exact
amount of precession could be predicted.

But eventually, General Relativity came along and predicted the exact amount of the
precession of Mercury's orbit "from first principles," assuming a uniform, spherical sun. Now the
precession of Mercury's orbit is seen as a result of the way mass curves space -- a notion entirely
foreign to Newtonian physics. But that's another story. The point, for now, is that the hard core
of a theory can suggest or create peripheral theories as well as supporting them.

And indeed, it is hard to see how a belief system could survive sustained experimental attack
unless some of its component beliefs came equipped with significant generative power. If a
belief system is to defend itself when one of its beliefs is attacked, it must be able to generate
compatible new beliefs to take the place of the old. These generative elements will be helpful to
the system over the long term only if they are unlikely to be refuted -- and an element is least
likely to be refuted if it is strongly supported by other elements of the system. Therefore, systems
with generative hard cores are the "hardiest" systems; the most likely to preserve themselves in
the face of experimental onslaught.

The idea of a "generative hard core" may be formalized in many different ways; however, the
most natural course is to avail ourselves of the theory of self-generating component systems
developed in Chapters Seven and Eight. In other words, I am suggesting that a scientific belief
system, like a linguistic system, is a self-generating structured transformation system.
Belief systems combine these two important system-theoretic structures to form something new,
something with dazzling synergetic properties not contained in either structure on its own.

Structured transformation systems unite deduction and analogy in a striking way, via the
connection between grammar and semantics which continuous compositionality enforces. Self-
generating systems provide an incredible power for unpredictable, self-organizing creativity.
Putting the two together, one obtains, at least in the best case, an adaptable, sturdy tool for
exploring the world: adaptable because of the STS part, and sturdy because of the self-
generation. This is exactly what the difficult task of science requires.

9.2.2. Conclusion

In the history of science one has a record of the dynamics of belief systems -- a record which,
to some extent, brings otherwise obscure mental processes out into the open. It is clear that, in
the history of science, belief has been tremendously systematic. Consistently, beliefs have been
discarded, maintained or created with an eye toward compatibility with the generative "hard
cores" of dominant belief systems. I suggest -- and this is hardly a radical contention -- that this
process is not specific to scientific belief, but is rather a general property of thought.

I have suggested that scientific belief systems are self-generating structured transformation
systems. In the following sections I will make this suggestion yet more specific: I will propose
that all belief systems are not only self-generating structured transformation systems but also
attractors for the cognitive equation.

But in fact, this is almost implicit in what I have said so far. For consider: beliefs in a system
support one another, by definition, but how does this support take place on the level of
psychological dynamics? By far the easiest way for beliefs to support one another is for them to
produce one another. But what do the processes in the dual network produce but patterns? Thus
a belief system emerges as a collection of mental processes which is closed under generation
and pattern recognition -- an attractor for the cognitive equation.

What Lakatos's model implies is that belief systems are attractors with a special kind of
structure: a two-level structure, with hard core separate from periphery. But if one replaces the
rigid hard core vs. periphery dichotomy with a gradation of importance, from most central to
most peripheral, then one obtains nothing besides a dual network structure for belief systems.
The hard core consists of the highest-level processes, the outermost periphery of the lowest-level.
Processes are grouped hierarchically for effective production and application; and heterarchically
for effective associative reference.

In this way, a belief system emerges as a sort of "mini mind," complete in itself both
structurally and dynamically. And one arrives at an enchanting conceptual paradox: only by
attaining the ability to survive separately from the rest of the mind, can a belief system make
itself of significant use to the rest of the mind. This conclusion will return in Chapter Twelve,
equipped with further bells and whistles.


I have discussed some of the most outstanding belief systems ever created by the human mind:
Newtonian mechanics, Galilean astronomy, general relativity. Let us now consider a less
admirable system of beliefs: the conspiracy theory of a woman, known to the author, suffering
from paranoid delusion. As I am a mathematician and not a clinical psychologist, I am not
pretending to offer a "diagnosis" of the woman possessing this belief system. My goal is merely
to broaden our conceptual horizons regarding the nature of psychodynamics, by giving a specific
example to back up the theoretical abstractions of the cognitive equation and the dual network.

9.3.1. Jane's Conspiratorial Belief System

"Jane" almost never eats because she believes that "all her food" has been poisoned. She has a
history of bulimia, and she has lost twenty-five pounds in the last month and a half; she is now
5'1'' and eighty-five pounds. She believes that any food she buys in a store or a restaurant, or
receives at the home of a friend, has been poisoned; and when asked who is doing the poisoning,
she generally either doesn't answer or says, accusingly, "You know!" She has recurrent leg
pains, which she ascribes to food poisoning.

Furthermore, she believes that the same people who are poisoning her food are following her
everywhere she goes, even across distances of thousands of miles. When asked how she can tell
that people are following her, she either says "I'm not stupid!" or explains that they give her
subtle hints such as wearing the same color clothing as her. When she sees someone wearing the
same color clothing as she is, she often assumes the person is a "follower," and sometimes
confronts the person angrily. She has recently had a number of serious problems with the
administration of the college which she attends, and she believes that this was due to the
influence of the same people who are poisoning her food and following her.

To give a partial list, she believes that this conspiracy involves: 1) a self-help group that she
joined several years ago, when attending a college in a different part of the country, for help with
her eating problems; 2) professors at this school, from which she was suspended, and which she
subsequently left; 3) one of her good friends from high school.

Her belief system is impressively resistant to test. If you suggest that perhaps food makes her
feel ill because her long-term and short-term eating problems have altered her digestive system
for the worse, she concludes that you must be either stupid or part of the conspiracy. If you
remind her that five years ago doctors warned her that her leg problem would get worse unless
she stopped running and otherwise putting extreme pressure on it, and suggest that perhaps her
leg would be better if she stopped working as a dancer, she concludes that you must be either
stupid or part of the conspiracy. If you suggest that her problems at school may have partly been
due to the fact that she was convinced that people were conspiring against her, and consequently
acted toward them in a hostile manner -- she concludes that you must be either stupid or part of
the conspiracy.

9.3.2. Jane and the Cognitive Equation

I have analyzed the structure of Jane's conspiracy theory; now how does this relate to the
"cognitive equation of motion" given in Chapter Eight? Recall that this equation, in its simplest
incarnation, says roughly the following:

1) Let all processes that are "connected" to one another act on one another.

2) Take all patterns that were recognized in other processes during Step (1), let these patterns
be the new set of processes, and return to Step (1).

An attractor for this dynamic is then a set of processes X with the property that each element
of the set is a) produced by the interaction of some elements of X, and b) a pattern in the set of
entities produced by the interactions of the elements of X.
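
The two steps and the attractor conditions can be sketched in Python as follows. This is only a toy under loudly stated assumptions: every pair of processes is treated as "connected", "interaction" is stood in for by multiplication modulo 7, and a "pattern" is stood in for by any product that is generated at least twice. None of these stand-ins come from the cognitive equation itself; the names step, meets_attractor_conditions, toy_act and toy_is_pattern are hypothetical.

```python
from collections import Counter
from itertools import product


def step(processes, act, is_pattern):
    """One pass of the two-step dynamic: interact, then keep the patterns."""
    # Step (1): every process acts on every process (all pairs assumed connected).
    raw = [act(f, g) for f, g in product(processes, repeat=2)]
    counts = Counter(raw)
    # Step (2): the recognized patterns become the new set of processes.
    return frozenset(e for e in counts if is_pattern(e, counts))


def meets_attractor_conditions(processes, act, is_pattern):
    """True if each element is a) produced by, and b) a pattern among, the products."""
    return frozenset(processes) <= step(processes, act, is_pattern)


# Toy stand-ins (assumptions for illustration only).
def toy_act(f, g):
    return (f * g) % 7          # "interaction" = multiplication mod 7


def toy_is_pattern(entity, counts):
    return counts[entity] >= 2  # "pattern" = a product that recurs


closed = frozenset({1, 2, 4})
open_set = frozenset({1, 2})
closed_next = step(closed, toy_act, toy_is_pattern)
open_next = step(open_set, toy_act, toy_is_pattern)
print(sorted(closed_next), sorted(open_next))
```

In this toy the set {1, 2, 4} regenerates itself exactly, while {1, 2} loses an element after one pass; the first is closed under generation and pattern recognition in the sense described above, and the second is not.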

In order to show that Jane's belief system is an attractor for this dynamic, it suffices to show
that each element of the belief system is a pattern among other elements of the system, and is
potentially producible by other elements of the system. Consider, for instance, the seven beliefs

C0: There is a group conspiring against me

C1: My food is poisoned by the conspiracy

C2: My friends and co-workers are part of the conspiracy

C3: My leg pain is caused by the conspiracy

C4: My food tastes bad

C5: My friends and co-workers are being unpleasant to me

C6: My leg is in extreme pain

In the following discussion, it will be implicitly assumed that each of these beliefs is stored
redundantly in the brain; that each one is contained in a number of different "neural maps" or
"mental processes." Thus, when it is said that C0, C1, C2 and C6 "combine to produce" C3, this
should be interpreted to mean that a certain percentage of the time, when these four
belief-processes come together, the belief-process C3 is the result.

Furthermore, it must be remembered that each of the brief statements listed above next to the
labels Ci is only a shorthand way of referring to what is in reality a diverse collection of ideas
and events. For instance, the statement "my co-workers are being unpleasant to me" is
shorthand for a conglomeration of memories of unpleasantness. Different processes
encapsulating C5 may focus on different specific memories.

Without further ado, then, let us begin at the beginning. Obviously, the belief C0 is a pattern
among the three beliefs which follow it. So, suppose that each of the mental processes
corresponding to C1, C2 and C3 is equipped with a generalization routine of the form "When
encountering enough other beliefs that contain a certain sufficiently large component in common
with me, create a process stating that this component often occurs." If this is the case, then C0
may also be created by the cooperative action of C1, C2 and C3, or some binary subset thereof.
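
A generalization routine of this kind might be sketched as below. The sketch assumes, very crudely, that a belief can be represented as a set of concept tokens and that a "component in common" is the intersection of those sets; the token names, the thresholds min_beliefs and min_size, and the function generalize are all hypothetical.

```python
from itertools import combinations


def generalize(beliefs, min_beliefs=2, min_size=1):
    """Return every component shared by some group of `min_beliefs` beliefs."""
    found = set()
    for group in combinations(beliefs.values(), min_beliefs):
        common = frozenset.intersection(*group)
        if len(common) >= min_size:
            # "Create a process stating that this component often occurs."
            found.add(common)
    return found


# Hypothetical token encodings of C1, C2 and C3.
beliefs = {
    "C1": frozenset({"food", "poisoned", "conspiracy"}),
    "C2": frozenset({"friends", "co-workers", "members", "conspiracy"}),
    "C3": frozenset({"leg", "pain", "conspiracy"}),
}

generalized = generalize(beliefs)
print(generalized)
```

With min_beliefs=2, any two of the three beliefs already yield the shared component, which corresponds to the remark that a binary subset of C1, C2 and C3 suffices to create C0.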

One might wonder why the process corresponding to, say, C1 should contain a generalization
routine of this type. The only answer is that such routines are of general utility in intelligent
systems, and that they add only negligible complexity to a process such as C1 which deals with
such formidable concepts as "food" and "conspiracy." In a self-organizing model of the mind,
one may not assume that recognitive capacity is contained in a single "generalization center"; it
must be achieved in a highly distributed way.

Production of Particular Conspiracies

Next, what about C1? Taking C0, C2, C3 and C4 as given, C1 is a fairly natural inference.
Suppose the process corresponding to C0 contains a probabilistic generalization routine of the
form "The greater the number of events that have been determined to be caused by conspiracy,
the more likely it is that event X is caused by conspiracy." Then when C0 combines with C2 and
C3, it will have located two events determined to be caused by conspiracy. And when this
compound encounters C4, the generalization capacity of C0 will be likely to lead to the creation
of a belief such as C1.
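
The probabilistic routine attributed to C0 can be illustrated with any function that increases with the number of events already blamed on the conspiracy. The sketch below picks one such function arbitrarily; the formula and the parameters prior and gain are assumptions made for illustration, not values or rules taken from the model.

```python
def conspiracy_likelihood(attributed_events, prior=0.1, gain=0.5):
    """Chance that a new event X is blamed on the conspiracy.

    Rises toward 1 as more events are already attributed to it.
    `prior` and `gain` are illustrative parameters, not values from the text.
    """
    k = len(attributed_events)
    return 1.0 - (1.0 - prior) * (1.0 - gain) ** k


# C0 combined with C2 and C3 has located two conspiracy-caused events.
already_attributed = ["C2: friends and co-workers", "C3: leg pain"]

p_before = conspiracy_likelihood([])
# Applied to C4 ("my food tastes bad"), this is the chance of forming C1.
p_c1 = conspiracy_likelihood(already_attributed)
print(p_before, p_c1)
```

Whatever the exact functional form, the point is that each belief already attributed to the conspiracy makes the next attribution easier, so C1 becomes a likely product once C2 and C3 are in place.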

So C1 is produced by the cooperative action of these four beliefs. In what sense is it a pattern
in the other beliefs? It is a pattern because it simplifies the long list of events that are
summarized in the simple statement "My food is being poisoned." This statement encapsulates a
large number of different instances of apparent food poisoning, each with its own list of plausible
explanations. Given that the concept of a conspiracy is already there, the attribution of the
poisoning to the conspiracy provides a tremendous simplification; instead of a list of hypotheses
regarding who did what, there is only the single explanation "They did it." Note that for

