José E. Burgos (2002) Behavioral Knowledge and Structural Complexity in McCulloch-Pitts Systems. Psycoloquy: 13(026) Behavioral Knowledge (1)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).
Psycoloquy 13(026): Behavioral Knowledge and Structural Complexity in McCulloch-Pitts Systems

Target Article on Behavioral Knowledge

José E. Burgos
University of Guadalajara
Centro de Estudios e Investigaciones en Comportamiento
Av. 12 de Diciembre
#204 Col. Chapalita
C.P. 45030
(A.P. 5-374)
Guadalajara, Jalisco,

413 Interamericana Blvd
WH1 PMB 30-189
Laredo, TX


I present a thought experiment in biobehavioral epistemology. A McCulloch-Pitts (MP) system faces the task of attaining some behavioral knowledge about another MP system, where "knowledge" means "classifying" and "behavior" means "input-output" relations. An analysis of two cases, where the observing system was structurally as complex as the observed system, revealed that the former could not achieve a complete, maximally fine-grained knowledge about the latter's behavior. I examine some hypothetical generalizations of this outcome.


behavioral knowledge, complexity, McCulloch, Pitts, epistemology
    The target article below was today published in PSYCOLOQUY, a
    refereed journal of Open Peer Commentary sponsored by the American
    Psychological Association. Qualified professional biobehavioral,
    neural or cognitive scientists are hereby invited to submit Open
    Peer Commentary on it.



1. How does the structural complexity of a system, relative to that of another system of the same kind, affect the extent to which the former can attain behavioral knowledge about the latter? This is the general question that motivates the present paper. It is a question that has a direct epistemological relevance for biobehavioral science (BBS). Indeed, BBS can be viewed as the (admittedly fuzzy) set of subdisciplines that are dedicated to the experimental, theoretical, analytic, and/or synthetic study of the ontogeny and/or phylogeny of the structure and/or function of nervous systems at the molecular, cellular, anatomical, and/or whole-organism behavioral levels. Like any other science, BBS raises the epistemological issue of the possibility and extent of knowledge about its subject matter (biobehavioral systems). A naturalistic approach to this issue emphasizes the fact that BBS largely consists in the occurrence of a certain kind of activity that is performed by a certain kind of individual (viz., biobehavioral scientists) who attempts to achieve some knowledge about a wide variety of biobehavioral systems, human as well as nonhuman. Under this characterization, which I adopt in the present paper, BBS is viewed as arising from the biology and behavior of certain biobehavioral systems that face the task of achieving some knowledge about other biobehavioral systems. The general issue with which I am concerned here is the possibility and extent of knowledge BY biobehavioral systems ABOUT biobehavioral systems. In this sense, the present paper can be seen as an exercise in biobehavioral epistemology of BBS.

2. Alas, real BBS, as an ongoing constellation of biobehavioral processes, consists of a web of infra- and supra-structures, factors, agents, products, and events that are intertwined in intricate relations. This situation is largely due to the fact that BBS arises from HUMAN biology and behavior, which are often considered to be among the most complex in nature. Such a complexity makes the issue practically unmanageable for systematic investigation, thus hindering an effective search for clear and precise answers to my initial question. My purpose in the present paper, then, is to apply the simplifying device of the THOUGHT EXPERIMENT, in order to simulate a kind of situation that is roughly analogous to the one in which I am interested here, but simple enough to allow for rigorous investigation.

3. The experiment, of course, will be quite unrealistic, for it will involve a great deal of simplification. However, "[t]hought experiments are not supposed to be realistic. They are supposed to clarify our thinking about reality" (Dawkins, 1982, p. 4). On this basis, my main objective is to address the issue in a sufficiently clear and precise manner as to allow for rigorous conclusions. Such conclusions thus would provide solid starting points for equally systematic further research, whose results could then motivate yet more systematic research, and so on, increasingly approaching real BBS. To be sure, approaching real BBS in such a piecemeal manner will take a considerable amount of time and effort. However, a loss in swiftness will be compensated by a gain in clarity, precision, and depth, a tradeoff that, I believe, will be favorable in the long run.

4. The centerpiece of the paper is the thought experiment in question. For this purpose, I shall use the McCulloch-Pitts (MP) theory (McCulloch & Pitts, 1943), based on considerations of formal simplicity, clarity, and precision, which make it ideal as a STARTING POINT for the kind of analysis I propose. This choice implies that MP systems qualify as biobehavioral systems, which, in turn, implies that neurocomputational, connectionist, subsymbolic, or neural-network concepts, methods, and theories are part of BBS. Whether or not artificial symbolic systems (e.g., Turing machines) also qualify as 'biobehavioral' systems is an issue I shall not address. However, there is nothing in the present kind of analysis that in principle prohibits an extension to symbolic systems.

5. In the next section, I review the MP theory briefly and define the 'behavior' of an MP system in terms of its input-output universe. In Section III, I stipulate how I shall use the term 'knowledge' in reference to the behavior of MP systems. In Section IV, I stipulate a comparative criterion for determining relative structural complexity in feedforward neural networks. In Section V, I describe the thought experiment. In the last section, I discuss three issues for further reflection.


6. The MP theory describes the structure and functioning of a neural kind of processing unit or element, which I will also call 'MP'. Figure 1 depicts a generic MP element. Informally, the element consists of a finite number 'S' of input units, sensors, receptors, or "peripheral afferents" (as McCulloch and Pitts called them). The sensors (small empty circles labeled as '1',...,'S') detect input signals (labeled as 'a(1)',..., 'a(S)') from the element's local environment and carry them over to a computing junction (large circle labeled as 'j'). This junction then performs certain computations (v and phi; see below) and returns an output signal a(j), which represents the element's activation state. In accordance with the All-or-None Law, the MP theory assumes that all activation states are binary, that is, sensors and processing elements can be in one and only one state at any moment t, either 1 (active) or 0 (inactive). Whether or not j is active at t depends on how many of its sensors are active, how strongly they are connected to the element, and the value of theta (the element's threshold). In the figure, connections are represented by small filled circles. The strength of a connection is typically given by a numerical value w that determines the weight of an input signal a(i) in contributing to the element's activation a(j). Connections in MP systems are one-directional; hence the notation 'w(i,j)' to denote the weight of the connection from i to j.

Figure 1: A generic McCulloch-Pitts (MP) element.

7. An element's functioning is described mathematically, in terms of a rule for computing its state at t, as a function of its input signals and weights. This rule is a threshold function that specifies the conditions under which an element should or should not be active. According to standard descriptions, the element is active if the linear combination of input signals and their corresponding weights reaches or exceeds a given threshold theta. Otherwise, the element is inactive. That is:

If v >= theta, then a(j,t) = 1, else a(j,t) = 0 (1).

where 'v' denotes the linear combination of input signals and their corresponding weights (in Figure 1, v is defined as the inner product of vector A of input activations and vector W of connection weights).
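
As an illustration of Equation 1, the following minimal sketch computes the activation of a single MP element. The function name `mp_activation` and the example weights and threshold are assumptions chosen for illustration; they do not come from the MP theory itself.

```python
from typing import Sequence


def mp_activation(inputs: Sequence[int], weights: Sequence[float], theta: float) -> int:
    """Return 1 if the weighted sum of binary inputs reaches theta, else 0 (Equation 1)."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input signal is required")
    # v is the inner product of the input vector A and the weight vector W.
    v = sum(a * w for a, w in zip(inputs, weights))
    return 1 if v >= theta else 0


# Hypothetical example: with weights (1, 1) and theta = 2, the element is
# active only when both of its sensors are active.
and_outputs = [mp_activation((a1, a2), (1.0, 1.0), 2.0) for a1 in (1, 0) for a2 in (1, 0)]
```

Here `and_outputs` lists the element's responses to the input patterns (1,1), (1,0), (0,1), and (0,0), in that order.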

8. By the BEHAVIOR of an MP system, I mean what an MP system COULD IN PRINCIPLE DO, were it exposed to all of the possible distinct input patterns it can detect sensorially. The behavior of an MP system thus is given by its INPUT-OUTPUT (I-O) UNIVERSE (or 'performance space'), which can be defined set-theoretically as the set of all I-O RELATIONS or MAPPINGS the system can implement or realize in principle. An I-O relation can be set-theoretically defined (by extension) as a set of ordered tuples of the form (a(1),...,a(S),a(S+H+1),...,a(S+H+O)), where a(1),...,a(S) is the vector of input signals being received by the system simultaneously at a time t, S is the number of 'specialized' sensors (see later), H is the number of hidden elements, O is the number of output elements, and a(S+H+1),...,a(S+H+O) is the vector of output activations (the element's 'response' to the input pattern), where each activation is computed according to Equation 1.

9. The size of an MP system's I-O universe depends on four factors. First, an MP system with S sensors can be stimulated in a maximum of 2^S different ways, each one representing a distinct input pattern given at t. Second, an MP system with O output elements can respond in 2^O distinct ways to any given input pattern given at t. Third, viewed in the time domain, an MP system may respond differentially to sets of sequences of all the possible input patterns, depending on the weights. Hence, the maximum possible size of the I-O universe of such a system would be 2^((2^S)*O) I-O relations. Fourth, MP systems with no hidden layers cannot realize linearly inseparable I-O mappings, a well-known limitation of these systems (Minsky & Papert, 1988). If linearly inseparable tasks are taken into account, then the size of the I-O universe of such systems becomes [2^((2^S)*O)] - n, where n is the number of linearly inseparable I-O mappings, which is known to increase exponentially with S*O (Windner, 1961).

10. As an example of the present definition of 'behavior-in-MP-systems' ('behavior', hereafter), consider an MP system with two sensors (S=2) directly connected to a single output element (O=1). Such a system can be stimulated in 2^S = 4 distinct ways, constituting possible input patterns that would be given to the element as temporal sequences of four trials. Also, the system can respond in 2^O = 2 distinct ways to any one of those patterns. Each trial and the system's response to it would be an I-O occurrence, which can be represented as a triplet consisting of a pair of activations of the element's sensors (a possible input pattern), followed by the system's response to it. Such an occurrence would thus be a member of an I-O relation, among a total of 2^((2^S)*O) = 16 logically possible relations.
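
The counts in this example can be checked with a short sketch. It rests on one counting assumption: each of the 2^S input patterns can independently receive any one of the 2^O responses, which gives (2^O)^(2^S) = 2^((2^S)*O) logically possible mappings. The function names are hypothetical.

```python
def input_patterns(S: int) -> int:
    """Number of distinct binary input patterns for S sensors."""
    return 2 ** S


def responses(O: int) -> int:
    """Number of distinct binary responses for O output elements."""
    return 2 ** O


def logically_possible_mappings(S: int, O: int) -> int:
    """Number of ways to assign one response to every input pattern."""
    # (2^O) choices for each of the 2^S patterns, i.e. 2^((2^S) * O).
    return responses(O) ** input_patterns(S)


# The two-sensor, one-output system discussed in the text.
counts = (input_patterns(2), responses(1), logically_possible_mappings(2, 1))
```

For S=2 and O=1, `counts` holds the 4 input patterns, 2 responses, and 16 mappings mentioned above.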

11. As is well known, sets of weights exist that will allow the system to realize most of those relations. For instance, the system can realize the mapping {(1,1,1),(1,0,0),(0,1,1),(0,0,1)}, known in propositional calculus as 'material implication'. Each triplet (enclosed in parentheses) represents an I-O occurrence. The first and second members of each triplet represent a possible input pattern, while the third member represents the element's response to that pattern. The system can also realize 13 others of its 16 logically possible mappings. It is also well known that only the mappings {(1,1,0),(1,0,1),(0,1,1),(0,0,0)} ('xor') and {(1,1,1),(1,0,0),(0,1,0),(0,0,1)} ('biconditional') are linearly inseparable and, hence, unrealizable by the system. Its I-O universe (its behavior) thus can be defined as the set of the 14 realizable I-O mappings (the 2^((2^S)*O) = 16 logically possible mappings, minus the n=2 linearly inseparable ones).
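
A brute-force sketch can make these counts concrete. It searches a small grid of weights and thresholds (the grid values are an assumption chosen for illustration) and collects every mapping that a two-sensor, one-output element realizes under Equation 1. A grid search can only confirm realizability; that 'xor' and 'biconditional' are unrealizable for any weights is the well-known result cited in the text, not something the search proves.

```python
from itertools import product

# Input patterns in the same order as the triplets in the text.
PATTERNS = [(1, 1), (1, 0), (0, 1), (0, 0)]
WEIGHTS = (-1.0, 0.0, 1.0)       # assumed search grid for w(1,j) and w(2,j)
THETAS = (-1.5, -0.5, 0.5, 1.5)  # assumed search grid for the threshold


def response_vector(w1: float, w2: float, theta: float) -> tuple:
    """Responses of the element to each pattern, computed with Equation 1."""
    return tuple(1 if a1 * w1 + a2 * w2 >= theta else 0 for a1, a2 in PATTERNS)


# Every mapping found for at least one (w1, w2, theta) combination.
realizable = {response_vector(w1, w2, th) for w1, w2, th in product(WEIGHTS, WEIGHTS, THETAS)}

# All 16 logically possible mappings, as vectors of four responses.
all_mappings = set(product((0, 1), repeat=len(PATTERNS)))
unrealized = sorted(all_mappings - realizable)
```

Under these assumptions, `realizable` contains 14 mappings, including material implication as the response vector (1, 0, 1, 1), and `unrealized` contains only the response vectors of 'xor' and 'biconditional'.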

12. The term 'behavior', of course, has multiple nonequivalent meanings (uses, definitions, or whatever), 'I-O universe' being only one of them. I have chosen this meaning because it is reasonably clear, adequate for MP systems, and compatible with at least some other meanings. My exclusion of other meanings here is only a simplifying, conceptual delimitation device, with no pretension whatsoever of exhaustiveness, sufficiency, or metaphysical supremacy. I thus regard other meanings as being equally legitimate. However, taking them into account would lead to scenarios and analyses that, in the interest of space, are better left for further work. Yet, my choice of the term 'behavior' is not arbitrary, for I want my proposal to have at least some conceptual relevance for contexts where other meanings of 'behavior' are adopted. Such relevance will obtain insofar as I-O relations constitute at least an aspect of such meanings. The present analysis thus will relate conceptually only to those contexts where different uses of 'behavior' are FUNDAMENTALLY COMPATIBLE with the present one.

13. The theoretical and practical importance of MP systems lies in their ability to CLASSIFY or DISCRIMINATE among (or RESPOND DIFFERENTIALLY to) different kinds of input patterns. This ability has been exploited in a number of engineering applications, where networks of MP elements (augmented with a learning rule; see Rosenblatt, 1958) face various kinds of tasks, such as image processing, pattern recognition, and control. But what about an MP system that faces the task of classifying the behavior of another MP system? This possibility leads to the core of the present proposal.


14. The proposal revolves around the following kind of scenario. Two MP systems are assumed to be in a situation where one is designated as the 'observing' or 'knowing' system, the other as the 'observed' system, and the former faces the task of attaining some knowledge about the latter's behavior, that is, some BEHAVIORAL knowledge [1]. The epistemological import of this scenario arises from the presence of the term 'knowledge'. Whether or not such a presence in this context is legitimate raises a host of philosophical problems I cannot address here. Instead, I will only make a few preliminary considerations in that regard and stipulate a definition that is suitable for my present purposes.

15. To begin with, a concern about epistemological issues is not foreign to neural-network theorizing. For example, in the last section of their seminal paper, McCulloch and Pitts stated: "Our knowledge of the world, including ourselves, is incomplete as to space and indefinite as to time. This ignorance, implicit in all our brains, is the counterpart of the abstraction which renders our knowledge useful. The role of our brains in determining the epistemic relations of our theories to our observations and of these to the facts is all too clear, for it is apparent that every idea and every sensation is realized by activity within the net, and by no such activity are the actual afferents fully determined. ... With determination of the net, the unknowable object of knowledge, the 'thing in itself', ceases to be unknowable." (1943, pp. 35-36).

16. Similarly, Pitts and McCulloch (1947) proposed neural mechanisms for pattern recognition, as an account of how our brains are capable of "knowing universals" (p. 46). McCulloch also expressed that his "interest always was to reduce epistemology to an experimental science", coined the term "experimental epistemology" (1961, p. 1), and talked about a "nervous theory of knowledge" (1964, p. 362). So, biobehavioral interpretations of epistemological issues were clearly a central concern of early connectionist theorizing efforts. Such interpretations remain in more recent work (e.g., P. M. Churchland, 1989; P. S. Churchland, 1986). Like other efforts, the present one involves a naturalistic biobehavioral approach to epistemology. All efforts so far, however, have focused on non-biobehavioral events, systems, and processes as objects of knowledge for biobehavioral systems. In contrast, the present effort focuses on the possibility and extent of knowledge about biobehavioral systems of a certain kind BY biobehavioral systems of the same kind.

17. To be sure, any true believer in a naturalistic reconstruction of epistemology (regardless of his/her particular orientation) is ULTIMATELY interested in the nature, extent, and possibility of HUMAN knowledge. Presumably, then, the final goal of any naturalistic epistemology programme is to build a HUMAN EPISTEMOLOGY. However, it is also clear that such an endeavor cannot be accomplished in a direct fashion, largely due to the extreme complexity of human biology and behavior, which poses formidable ethical, conceptual, methodological, and theoretical challenges. Consequently, a naturalistic human epistemology must inevitably result from a convoluted path of multiple lines of inquiry involving a host of simplifying assumptions, concepts, methods, and theories constituting relatively delimited conglomerates of approaches. In the present paper, I concentrate on the connectionist or neurocomputational approach. My purpose, however, is not to defend or attack that or any other particular programme. Rather, I will use it to derive interesting conclusions that could be generalized to other approaches and, to that extent, encourage a more inclusive discussion.

18. The indirect character of the construction of a naturalistic human epistemology forces us to use terms like 'knowledge', 'knowing' (I shall use the two forms interchangeably), and 'epistemology' in reference to NONHUMAN systems, insofar as that process involves considering them as objects of study (even if temporary ones). A substantial portion of such a construction thus involves, effectively, building a NONHUMAN epistemology. True believers in naturalistic epistemology hope that this epistemology (however far astray it may lead us from our original goal) will eventually lead to a human epistemology. In the meantime, we must allow a great deal of semantic flexibility, so that those terms can be used in reference to nonhuman systems, even artificial ones. Otherwise, the construction of a nonhuman epistemology, as a means towards a human epistemology, will be seriously hindered. Of course, the meanings of those terms will be very different when used in reference to human and nonhuman, perhaps even different kinds of nonhuman, systems. Consequently, a crucial part of the process of returning to human knowledge must involve progressive conceptual shifts that allow our uses of those terms to increasingly approximate meanings that are more consistent with those typically found in human epistemology.

19. In what sense, then, can we say that a biobehavioral system 'knows' anything? Exactly what can such a system 'know'? Can it know 'everything there is to know'? What does 'everything there is to know' mean? These are epistemological questions. Hence, the epistemological import of the present proposal. Answers to these questions, of course, depend on the meaning of the term 'knowledge', which, in turn, depends on the kind of biobehavioral system of interest. One difficulty is that 'knowledge' (like 'behavior' and 'life') is a strongly polysemic and, hence, largely nontechnical term. Consequently, to obtain a single, unified meaning that encompasses all (or even a fair fraction) of the meanings of 'knowledge' in an internally consistent manner is extremely difficult (if not impossible). Most likely, different theories will provide different definitions. I will thus adopt a meaning that is reasonably clear, adequate for MP systems, and compatible with other meanings. The meaning in question arises from the standard distinction between EXPLICIT knowledge and IMPLICIT (or 'performance') knowledge. This is a rather difficult distinction, since it forces the issue of consciousness, so I will use it here in its most general and intuitive form. Explicit knowledge is typically regarded as involving some performance in a certain task with self-awareness of it. Implicit knowledge is typically viewed as involving some performance without any self-awareness of it. The notion of implicit knowledge seems more appropriate for MP systems, so I shall adopt it here.

20. I construe knowledge in MP systems as PERFORMING IN A CERTAIN MANNER UNDER CERTAIN CONDITIONS. The conditions amount to the occurrence of instances of the different kinds of input patterns that constitute some prespecified domain. The performance amounts to CLASSIFYING such patterns. By 'classifying' I mean 'responding differentially to' or 'discriminating among' instances of some domain. More precisely, in the language of set theory, 'classifying' amounts to PARTITIONING, that is, dividing a domain into non-empty proper subsets whose pairwise intersections are empty and whose union is the domain. What I said in Paragraph 12 about the definition of 'behavior' also applies to the present definition of 'knowledge' as 'classifying'. The present proposal thus will be conceptually relevant to other epistemological contexts insofar as the present use of 'knowledge' is fundamentally compatible with others.

21. Under the present definition, then, an MP system will be regarded as being capable of knowing about a given domain if the system is capable of classifying at least some of the instances of that domain. Set-theoretically, I will regard an MP system as being capable of knowing about a given domain if the system can achieve at least one PARTITION of at least a proper subset of the domain in question. According to this definition, 'knowledge' in an MP system is conceptualized as a sort of I-O relation shown by that system (i.e., a partitioning relation). Under this definition, then, knowing in an MP system involves not only detecting certain input patterns SENSORIALLY, but also classifying at least some of them. Sensory detection of input patterns is necessary for knowing about them, so the latter will obviously be impossible without the former. However, sensory detection is not sufficient. Under the present definition, knowledge in an MP system involves sensory detection AND differential responding to whatever the system detects sensorially.

22. Since knowing in MP systems is defined as behaving in a certain way, the former concept inevitably arises from the latter. In fact, both concepts overlap considerably. After all, MP systems are CLASSIFIER systems, so definitions of 'behavior' and 'knowledge' that are reasonably adequate for such systems must be closely related. Under the present definitions, then, knowing in MP systems is behaving in a certain manner. The two concepts, however, are not identical. On the one hand, knowledge in an MP system can be set-theoretically defined as that subset consisting of all the mappings of its I-O universe that are partitions of the domain. On the other hand, the I-O universe of any MP system with no recurrent connections will always contain two mappings that qualify as behavior but not as knowledge. In propositional calculus, such mappings are known as 'tautology' and 'contradiction'. Consequently, knowledge is a PROPER subset of behavior, so the two sets are not identical. Under the present definitions, then, knowledge and behavior are not coextensive. Although all knowledge is behavior, not all behavior is knowledge.

23. For example, consider again an MP system with two sensors that are directly connected to a single processing element. The two mappings in question would be set-theoretically defined (by extension) as {(1,1,1),(1,0,1),(0,1,1),(0,0,1)} ('tautology') and {(1,1,0),(1,0,0),(0,1,0),(0,0,0)} ('contradiction'). Under the present definition, the other 12 I-O mappings that are also realizable by that system qualify as knowledge, since all of them involve a partition of the domain of interest (the set of possible input patterns). Under this definition, then, ALMOST (but not quite) everything an MP system does qualifies as 'knowledge'. This definition of 'knowledge-in-MP-systems' is admittedly encompassing. However, I prefer to be as unrestrictive as possible on what qualifies as knowing in an MP system (although restrictive enough to distinguish it from 'behaving'), in order to avoid outcomes that depend too much on restrictive definitions.

24. Knowledge about a certain domain can be more or less COMPLETE, and more or less COARSE (or DETAILED or REFINED). The completeness dimension is given by whether the system can in principle classify the entire domain (knowing completely) or only proper subsets of it (knowing partially). The coarseness dimension is given by whether the system can in principle respond differentially to instances that constitute singletons of the domain (maximally fine knowledge) or to instances that constitute more inclusive proper subsets (coarse knowledge). For instance, consider, once again, an MP system with two sensors directly connected to a single processing element. In this system, all of the 12 mappings that qualify as 'knowing' also qualify as 'knowing completely', for all of them involve a partition of the entire set of input patterns (the domain to be classified). The only ways in which this system could know only partially would be if it lacked sensory access to some of the input patterns and/or were somehow impeded from responding to some of them. In any case, none of those mappings qualifies as maximally fine knowledge, for the system can respond in only two different ways, while the domain consists of four distinct input patterns. The system in question, then, can only know coarsely.
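
The distinction between behavior and knowledge, and the coarseness of the two-sensor system's knowledge, can be sketched as follows. The sketch takes the 14 realizable mappings as given (the 16 logically possible ones minus 'xor' and 'biconditional') and treats a mapping as knowledge when it sorts the input patterns into at least two blocks. The helper names are hypothetical.

```python
from itertools import product

# Input patterns in the same order as the triplets in the text.
PATTERNS = [(1, 1), (1, 0), (0, 1), (0, 0)]
XOR = (0, 1, 1, 0)
BICONDITIONAL = (1, 0, 0, 1)

# The 14 realizable mappings, written as vectors of four responses.
realizable = [m for m in product((0, 1), repeat=4) if m not in (XOR, BICONDITIONAL)]


def blocks(mapping):
    """Group the input patterns by the response they receive."""
    grouped = {}
    for pattern, response in zip(PATTERNS, mapping):
        grouped.setdefault(response, set()).add(pattern)
    return list(grouped.values())


def is_knowledge(mapping) -> bool:
    # Two or more blocks means every block is a non-empty proper subset.
    return len(blocks(mapping)) >= 2


def is_maximally_fine(mapping) -> bool:
    # Maximally fine knowledge puts every input pattern in its own block.
    return all(len(block) == 1 for block in blocks(mapping))


knowledge = [m for m in realizable if is_knowledge(m)]
behavior_only = [m for m in realizable if not is_knowledge(m)]
any_fine = any(is_maximally_fine(m) for m in knowledge)
```

With these definitions, `knowledge` holds 12 mappings, `behavior_only` holds 'contradiction' and 'tautology', and `any_fine` is false because two responses cannot separate four patterns into singletons.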

25. Let us now apply the above definition of 'knowledge' to define a concept of 'behavioral knowledge' in MP systems. Under the present definitions, it is clear that behavioral knowledge is behavior, insofar as knowledge is behavior. However, the domain to be classified in behavioral knowledge is the behavior of another MP system. Behavioral knowledge thus is behaving with respect to behavior. Under this characterization, behavioral knowledge involves, on the one hand, a partial semantics of input patterns and, on the other, a distinction between two roles any MP system can play, namely, 'observed' and 'observing', where the concept of behavioral knowledge would be applied only to the role of 'observing'. The semantics arises from interpreting some of the input signals that an MP system receives as being the OUTPUT signals of another MP system. The distinction between the two roles arises from regarding any MP system that receives such signals as effectively 'observing' another system's behavior, which would play the role of 'observed' [2]. What I said about the definitions of 'behavior' (Paragraph 12) and 'knowledge' (end of Paragraph 20) also applies to the present distinction between 'observing' and 'observed'.

26. For an MP system to be able to attain SOME knowledge about the behavior of another MP system, the former must be capable not only of detecting the input AND output aspects of at least some of the I-O relations that constitute the latter's I-O universe, but also responding differentially to such relations. In this manner, knowing about behavior by an observing MP system involves I-O mappings. The input aspect of these mappings, in turn, is given by a set of I-O mappings that constitute the performance space of an observed MP system. Regarding completeness and coarseness, I shall consider an observing MP system as being capable of knowing completely about the behavior of another MP system if the former system can classify ALL of the mappings that constitute the observed system's I-O universe. If, for whatever reason, the observing system can classify only a proper subset of the observed system's I-O universe, then the former system will only be capable of knowing the latter system's behavior partially. The observing system will be incapable of knowing anything whatsoever (i.e., will remain completely ignorant) about the observed system's behavior if the former cannot classify any of the mappings that constitute the latter's I-O universe. The observing system will be capable of maximally fine behavioral knowledge about the observed system if the former is capable of classifying distinct I-O relations. A coarse behavioral knowledge is given by a classification of aggregates of the observed system's I-O relations.

27. The above definitions result in five possibilities regarding how completely and finely an observing MP system can know about the behavior of another MP system: No knowledge (no partition at all of the observed system's I-O universe), partial coarse knowledge (coarse partitions of a proper subset of that universe), partial maximally fine knowledge (maximally fine partition of a proper subset of the I-O universe), complete coarse knowledge (coarse partitions of the entire I-O universe), and complete maximally fine knowledge (maximally fine partition of the entire I-O universe).
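
The five possibilities can be expressed as a small decision procedure. This is only a sketch under stated assumptions: the observed system's I-O universe is represented as a set of opaque labels, the observing system's classification as a dictionary from classified relations to response labels, and a classification counts as a partition only if it yields at least two blocks. The function name `knowledge_kind` is hypothetical.

```python
def knowledge_kind(universe: set, labels: dict) -> str:
    """Name which of the five possibilities a classification falls under."""
    # Only relations that belong to the observed universe count as classified.
    classified = set(labels) & set(universe)
    groups = {}
    for relation in classified:
        groups.setdefault(labels[relation], set()).add(relation)
    # Fewer than two blocks: no partition, hence no knowledge.
    if len(groups) < 2:
        return "no knowledge"
    extent = "complete" if classified == set(universe) else "partial"
    fine = all(len(group) == 1 for group in groups.values())
    grain = "maximally fine" if fine else "coarse"
    return f"{extent} {grain} knowledge"


# Hypothetical universe of four I-O relations.
universe = {"r1", "r2", "r3", "r4"}
examples = [
    knowledge_kind(universe, {}),
    knowledge_kind(universe, {"r1": 0, "r2": 0, "r3": 1}),
    knowledge_kind(universe, {"r1": 0, "r2": 1}),
    knowledge_kind(universe, {"r1": 0, "r2": 0, "r3": 1, "r4": 1}),
    knowledge_kind(universe, {"r1": 0, "r2": 1, "r3": 2, "r4": 3}),
]
```

The five entries of `examples` correspond, in order, to the five possibilities listed above.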


28. As anticipated in my initial question, the general issue in which I am interested here is whether and exactly how the complexity of an observing system, relative to that of an observed system, affects the extent to which the former can attain behavioral knowledge about the latter. Such an issue places the present proposal in the general realm of complexity theory, specifically within neural-network complexity theory (e.g., Parberry, 1998), given my choice of MP systems. However, my intention here is not to theorize about neural-network complexity, let alone defend or criticize particular approaches within this area. I am only interested in formulating a criterion that is COMPREHENSIVE, INTERNALLY CONSISTENT, and sufficiently CLEAR AND PRECISE as to allow for unequivocal decisions about whether an MP system is more or less complex than, or as complex as another MP system. Therefore, I will not worry about the biological plausibility of the criterion (although such a concern will eventually arise as the present analysis is extended to other kinds of biobehavioral systems).

29. Since there is no universally accepted metric for neural-network complexity, and due to the considerable difficulties involved in building an adequate metric, I shall formulate a COMPARATIVE (as opposed to quantitative) criterion. Again, what I said about the definitions of 'behavior' (Paragraph 12) and 'knowledge' (end of Paragraph 20) also applies to the ensuing definition of 'relative structural complexity'. The present proposal thus will be conceptually relevant to other complexity contexts insofar as the ensuing use of 'complexity' is fundamentally compatible with other uses (viz., information, randomness, computation, entropy, edge of chaos, etc.).

30. I will focus on STRUCTURAL (as opposed to functional) complexity. It is generally agreed that the structural complexity of a feedforward neural network (like an MP system) depends on its four CONSTITUTIVE ('architectural' or 'anatomical') features, namely, its total number of specialized sensors (S), processing elements (E), hidden layers of processing elements (L), and connections (C) [3]. Let F be any one of these features, and F(A) and F(B) be the values of that feature for any two given MP systems A and B. Clearly, either F(A)=F(B), F(A)>F(B), or F(A)<F(B) will be the case for any F, A, and B, where each kind of relation can hold for a minimum of zero and a maximum of four features. On this basis, I stipulate that A is structurally as complex as B if F(A)=F(B) for all of the features, or if F(A)>F(B) holds for as many features as F(A)<F(B). I shall regard A as being structurally more complex than B whenever F(A)>F(B) holds for more features than F(A)<F(B), and as structurally less complex than B whenever the reverse holds.

31. Table 1 summarizes the outcomes of applying the criterion to all the possible frequency distributions of the three comparative relations F(A)=F(B), F(A)>F(B), and F(A)<F(B) across the four features (the labels D1, D2,... do not represent any theoretically meaningful order). Of the 15 possible distributions, three result in equal complexity (D1, D4, and D13), and the remaining 12 are evenly divided between A and B. Each distribution defines a class of infinitely many scenarios that are suitable for exploration. The present proposal, thus, is largely programmatic. As a starting point, I will examine two cases of D1.

Table 1: Frequency distributions for the three possible comparative relations between the structural features of two MP systems A and B.

    Dist.  F(A)=F(B)  F(A)>F(B)  F(A)<F(B)  More complex system
     D1        4          0          0             None
     D2        3          0          1               B
     D3        3          1          0               A
     D4        2          1          1             None
     D5        2          0          2               B
     D6        2          2          0               A
     D7        1          0          3               B
     D8        1          1          2               B
     D9        1          2          1               A
    D10        1          3          0               A
    D11        0          4          0               A
    D12        0          3          1               A
    D13        0          2          2             None
    D14        0          1          3               B
    D15        0          0          4               B
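
The counts in Table 1 can be checked by brute force. The sketch below enumerates every way of distributing four features over the three comparative relations and applies the criterion to each; the helper name `classify` is an assumption made for illustration.

```python
from collections import Counter


def classify(equal: int, greater: int, smaller: int) -> str:
    """Apply the comparative criterion to one frequency distribution."""
    if greater > smaller:
        return "A"
    if smaller > greater:
        return "B"
    return "None"


# Every (F(A)=F(B), F(A)>F(B), F(A)<F(B)) triple of counts that sums to four features.
distributions = [
    (equal, greater, 4 - equal - greater)
    for equal in range(4, -1, -1)
    for greater in range(0, 4 - equal + 1)
]

verdicts = Counter(classify(*d) for d in distributions)
print(len(distributions))  # 15
print(verdicts["None"], verdicts["A"], verdicts["B"])  # 3 6 6
```

The enumeration order differs from the D1-D15 labels, which, as noted, carry no theoretical meaning.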


32. In a first case, the observing systems and the observed systems are instances of the simplest MP system possible, that is, a single sensor directly connected to a single output element. I shall adopt two assumptions regarding sensors. First, every sensor is MAXIMALLY FUNCTIONAL (i.e., there are no inert sensors, as it were). I thus assume that any given MP system inhabits an environment that has some stimulation source for each one of the system's sensors. Second, I assume that sensors are MAXIMALLY SPECIALIZED, so that each sensor can detect one and only one kind of signal, the same kind across time, within any given scenario.

33. Any classifying task for MP systems has an input and an output aspect. To begin with the input aspect, any observing system in this case can detect one and only one kind of input signal. Under the assumption of maximal specialization of sensors, then, an observing system in the present kind of scenario will only be able to detect either the input or the output signals that constitute the observed system's I-O mappings. Consequently, no observing system in this case will be able to detect an I-O occurrence. This situation is depicted in Figure 2, where the observed system is labeled as 'MP[O1]', and two alternative observing systems are labeled as 'MP[K1]' (upper panel) and 'MP[K2]' (lower panel) [4]. The only difference between MP[K1] and MP[K2] is the type of signal each system can detect. MP[K1]'s sensor (open circle) is specialized in detecting a(1) (the same signal that MP[O1] can detect), while MP[K2]'s sensor (open square) is specialized in detecting a(2), which is MP[O1]'s output activation in response to a(1).

Figure 2: A case of D1 (see Table 1).

34. MP[K1] can detect a(1), but not a(2), while the converse is the case for MP[K2]. Both systems miss a critical aspect of MP[O1]'s behavior (either MP[O1]'s output, in the case of MP[K1], or MP[O1]'s input signal, in the case of MP[K2]). Neither of them thus will be able to detect any I-O occurrence from MP[O1]. Hence, they will not be able to achieve any partition of MP[O1]'s I-O universe or a proper subset of it. In terms of the definitions given in Section III, both observing systems will remain completely ignorant about the observed system's behavior. In order to attain even a partial behavioral knowledge about MP[O1], MP[K1] and MP[K2] would each have to have at least one more sensor.
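
The input-side argument can be restated as a set-inclusion check. This is a minimal sketch under the assumption that an I-O occurrence counts as detected only when both its input signal and its output signal are sensed; the dictionary layout and the function name `detects_io_occurrence` are illustrative.

```python
# Signal labels follow the text: a(1) is MP[O1]'s input, a(2) is its output.
OBSERVED_IO = {"input": "a(1)", "output": "a(2)"}

# Each observing system is reduced to the set of signals its sensors detect.
observers = {
    "MP[K1]": {"a(1)"},
    "MP[K2]": {"a(2)"},
}


def detects_io_occurrence(sensor_signals, observed=OBSERVED_IO):
    """True only if both members of an I-O occurrence are sensed (assumed reading)."""
    return set(observed.values()) <= set(sensor_signals)


for name, signals in observers.items():
    print(name, detects_io_occurrence(signals))

# A hypothetical observer with one more sensor of the appropriate kind.
print(detects_io_occurrence({"a(1)", "a(2)"}))
```

Under this reading, both single-sensor observers fail the check, and only the hypothetical two-sensor observer passes it.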

35. Now consider the output aspect of the task. Let MP[O1]'s I-O universe be the set

    {{(1,0),(0,0)}, {(1,0),(0,1)}, {(1,1),(0,0)}, {(1,1),(0,1)}}


which consists of a total of 2^(2^S) = 4 different I-O relations, all of them computationally realizable by (linearly separable for) MP[O1]. This is the domain to be classified by MP[K1] and MP[K2]. Each relation consists of two ordered pairs enclosed in inner curly brackets. Each pair is enclosed in parentheses and represents an I-O occurrence at a moment in time. The first member of a pair represents a possible input signal, while the second member represents a possible output activation given by MP[O1] in response to that signal, according to Equation 1. This universe arises from giving MP[O1] the same sequence of possible input signals repeatedly across time (note that the first member of each pair is the same from one I-O relation to the other), and then specifying MP[O1]'s possible responses for each sequence. Each relation involves two successive trials (one in which MP[O1]'s sensor is active and one in which it is inactive). Therefore, eight trials are required for either MP[K1] or MP[K2] to be completely exposed to that universe. Beyond these trials, repeated instances of the same mappings will occur.
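
The size of such a universe, and the number of relations in it that a single threshold element can realize, can be checked by enumeration. The sketch below makes two assumptions that should be read as such: signals are coded as 0 and 1, and the activation rule is taken to be a plain threshold rule (output 1 when the weighted input sum reaches the threshold), standing in for Equation 1. The function names and the small weight and threshold grid are illustrative; the grid is sufficient only for one or two sensors.

```python
from itertools import product


def io_universe(num_sensors):
    """All 2**(2**S) binary I-O relations, each a tuple of (input, output) pairs."""
    inputs = list(product((1, 0), repeat=num_sensors))
    return [
        tuple(zip(inputs, outputs))
        for outputs in product((0, 1), repeat=len(inputs))
    ]


def realizable(num_sensors):
    """Relations produced by one threshold element over a small parameter grid."""
    inputs = list(product((1, 0), repeat=num_sensors))
    found = set()
    for weights in product((-1, 0, 1), repeat=num_sensors):
        for theta in (-1.5, -0.5, 0.5, 1.5):
            outputs = tuple(
                int(sum(w * x for w, x in zip(weights, pattern)) >= theta)
                for pattern in inputs
            )
            found.add(tuple(zip(inputs, outputs)))
    return found


for relation in io_universe(1):
    print(relation)

print(len(io_universe(1)), len(realizable(1)))  # 4 4
print(len(io_universe(2)), len(realizable(2)))  # 16 14
```

For one sensor, every relation is realizable. For two sensors, the two relations left out are exclusive-or and its negation, which no single threshold element can produce.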

36. To qualify as a partition of MP[O1]'s I-O universe (or a proper subset of it), the observing systems' output signals must divide that universe (or a proper subset of it) into at least two nonoverlapping nonempty equivalence classes whose set-theoretic union exhausts the universe (or a proper subset of it). Since MP[K1] and MP[K2] have only one output element, each one can respond in only two different ways. Consequently, they would only be able to achieve a coarse behavioral knowledge about MP[O1], even if they had the input capacity to detect instances of MP[O1]'s I-O universe sensorially (i.e., two sensors instead of just one). A maximally fine knowledge would require MP[K1] and MP[K2] to respond in four different ways, for MP[O1]'s I-O universe consists of four distinct mappings. To achieve this task, MP[K1] and MP[K2] would need to have at least one more output element.
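
The output-side requirement is a counting argument: n binary output elements yield at most 2^n distinct responses, so a maximally fine partition of a universe needs at least as many responses as the universe has relations. A minimal sketch, with illustrative function names:

```python
def distinct_responses(num_output_elements: int) -> int:
    """Number of different output patterns available to binary output elements."""
    return 2 ** num_output_elements


def min_output_elements(num_relations: int) -> int:
    """Smallest n such that 2**n >= num_relations."""
    return max(num_relations - 1, 0).bit_length()


print(distinct_responses(1))   # 2
print(min_output_elements(4))  # 2
```

With one output element there are two responses, which allows only coarse partitions of a four-relation universe; two output elements give the four responses needed for a maximally fine one.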

37. As another example, consider Figure 3, which depicts another case of the same kind of scenario (D1 in Table 1), this time with an observed system (MP[O2]) that is more complex than MP[O1] by having one more sensor, but as complex as two other possible observing systems (MP[K3] and MP[K4]). Each system in this case can detect two and only two kinds of input signals. If, again, on the input side, we assume that MP[O2]'s sensors are fully functional and maximally specialized, then we must assume the presence of two external sources of stimulation that can be detected by MP[O2]. The sensors of MP[K3] and MP[K4] are also supposed to be fully functional and maximally specialized, which also implies the presence of two different sources of stimulation. Such sources can be either the two external stimulation sources (a(1) and a(2)), as in the case of MP[K3] (upper panel), or either one of them and MP[O2]'s output signal (a(3)), as in the case of MP[K4] (lower panel).

Figure 3: Another case of D1 (see Table 1), with more complex observed (MP[O2]) and observing (MP[K3] and MP[K4]) systems.

38. MP[K3] cannot detect MP[O2]'s output signal, while MP[K4] cannot detect one of the input signals that stimulate MP[O2]'s sensors. Once again, the two observing systems miss an aspect of the observed system's behavior, although they differ in one important respect. MP[K3] cannot achieve any behavioral knowledge whatsoever (i.e., will remain completely ignorant about MP[O2]'s behavior), for it cannot detect a(3), MP[O2]'s output signal (MP[O2] could as well not exist for MP[K3]). In contrast, MP[K4] can detect a(3), as well as one of the stimulation sources received by MP[O2], a(1). To this extent, MP[K4] can achieve some behavioral knowledge about MP[O2]. Such knowledge, however, can only be partial, since the domain to be classified by MP[K4] effectively becomes a set of only four distinct I-O relations (exactly those that would constitute the universe of a one-sensor MP element), out of the 14 possible that constitute MP[O2]'s actual I-O universe. The other 10 mappings are inaccessible to MP[K4], AS I-O MAPPINGS. To be sure, MP[K4] can still detect changes in a(3) that depend on changes in a(2). However, since MP[K4] cannot detect a(2), the former changes would not constitute any I-O occurrence at all, but mere stimulus-free, spontaneous responses, as it were. In order to be able to detect all possible I-O occurrences that constitute MP[O2]'s behavior, both observing systems would need to have at least one more sensor.
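
The split into 4 accessible and 10 inaccessible relations can be reproduced under one possible reading of 'accessible as an I-O mapping', namely that a(3) is fully determined by a(1) alone. The sketch below adopts that reading as an assumption, together with 0/1 signal coding and a plain threshold rule standing in for Equation 1; all names are illustrative.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # input patterns (a(1), a(2))


def realizable_two_sensor():
    """Output tuples over INPUTS that one threshold element can produce."""
    found = set()
    for w1, w2 in product((-1, 0, 1), repeat=2):
        for theta in (-1.5, -0.5, 0.5, 1.5):
            found.add(tuple(int(w1 * x1 + w2 * x2 >= theta) for x1, x2 in INPUTS))
    return found


def ignores_second_input(outputs):
    """True if the output a(3) never changes when only a(2) changes."""
    table = dict(zip(INPUTS, outputs))
    return all(table[(x1, 0)] == table[(x1, 1)] for x1 in (0, 1))


universe = realizable_two_sensor()
accessible = {f for f in universe if ignores_second_input(f)}

print(len(universe), len(accessible), len(universe - accessible))  # 14 4 10
```

The four relations that pass the check are the two constant responses, the copy of a(1), and its negation, which are exactly the relations of a one-sensor element.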

39. On the output side, MP[K4] can only respond in two different ways to any input pattern, while the subset of MP[O2]'s behavior that is empirically accessible to MP[K4] consists of four distinct mappings. Hence, MP[K4] can only achieve coarse partitions of (a proper subset of) MP[O2]'s behavior. MP[K4] would need to have at least one more output element, receiving a connection from each sensor, to achieve a maximally fine (although still partial) behavioral knowledge about MP[O2].

40. The outcome of the above analysis can be summarized in the following statement: Some MP systems cannot achieve a complete maximally fine behavioral knowledge about other MP systems of the same complexity (S1). S1 subsumes the outcome that was obtained from the analysis of the first case and MP[K3], which showed that some MP systems cannot achieve any behavioral knowledge whatsoever about other MP systems of the same complexity.


41. The main conclusion of the preceding analysis (as summarized by S1) is that sameness in relative structural complexity can hinder a system's ability to attain behavioral knowledge about another system of the same kind. This conclusion provides only a preliminary answer to my initial question. Clearly, such an answer is far from being comprehensive and final. Much further analysis thus is needed. As a discussion, I want to address only three issues that are raised by this answer. Of course, many more issues can be raised, so I do not pretend the ones I have chosen to provide a complete list, nor to discuss them exhaustively. My sole purpose here is to provoke further thought, so the present section is largely CONJECTURAL and TENTATIVE.

42. A first issue is the nature and role of definitions and concepts in science. This issue arises from the fact that I have defined the key terms in my analysis (viz., 'behavior', 'knowledge', and 'complexity') in ways that some readers may find questionable. I have already anticipated this issue, when I defined those terms, but a few general remarks are in order. Of course, as is well-known by professional philosophers, the issue involves a host of difficult and largely unresolved problems that I must bypass (e.g., Are concepts different from definitions? What are concepts? Are they mental particulars? Are they mind-independent universals? Exactly how are they related to definitions? And what is the proper form of definitions? What does 'defining a term' amount to? Is a definition of a term a mere conventional stipulation, a summary of typical uses of the term by some linguistic community, or a specification of essences? Or are there different kinds of definitions? If so, what kinds are there?).

43. Strictly speaking, I have defined the terms 'behavior-in-MP-systems', 'knowledge-in-MP-systems', and 'relative-structural-complexity-in-MP-systems', NOT the terms 'behavior', 'knowledge', and 'complexity' per se. My use of the latter terms was only an abbreviation device to make the discourse a bit more succinct. Such use thus should NOT be interpreted as establishing a synonymical relation between 'behavior-in-MP-systems' and 'behavior', 'knowledge-in-MP-systems' and 'knowledge', or 'relative-structural-complexity-in-MP-systems' and 'complexity'. I do not pretend the definitions I have adopted here to exhaust all (or even a fair fraction) of the meanings of 'behavior', 'knowledge', and 'complexity'. My only purpose with those definitions, I insist, was to achieve a reasonable balance between simplifying conceptual delimitation and some degree of conceptual compatibility with other definitions, uses, or meanings (between precision and generality). Alternative definitions may be different because of different definitions of 'behavior', 'knowledge', and/or 'complexity' in reference to MP systems, or because of references to other kinds of systems. In any case, I have excluded other definitions here because they lead to different kinds of scenarios whose proper treatment, in the interest of space, is better left for further analyses.

44. Again, the key point here is that the present analysis will be conceptually relevant to others insofar as they involve fundamentally compatible uses of those terms. Whether any definition is inherently more or less legitimate than another, I see as a sterile issue. Clearly, no definition can hope to adequately capture even a substantial fraction (let alone all) of the available (much less possible) meanings, definitions, or uses of those or any other terms. Also, any attempt to justify any definition as being inherently more legitimate than others will, sooner or later, lead to a dead end. To define is to delimit, and that is exactly what I have done here. To me, any discussion on whether the boundaries I have drawn (or anyone else draws, for that matter) are more or less legitimate than others is philosophically and scientifically inconsequential.

45. A second issue has to do with the emphasis I have placed on behavioral knowledge. Such an emphasis was only a convenient simplifying device, of course. I do not pretend to reduce biobehavioral systems to behavior (otherwise, I would have called them 'behavioral' systems). Almost any reasonably delimiting definition of 'behavior' must allow for the exclusion of certain features that would qualify as 'nonbehavioral' (unless it renders everything as 'behavioral', in which case, the term and its definition would be useless for conceptual delimitation purposes). In particular, biobehavioral systems also have an ANATOMY and PHYSIOLOGY, which can also be objects of knowledge.

46. I have identified behavioral traits in MP systems with I-O relations and anatomical traits with structural features. Physiological traits can be identified with the activation rule, and, in the case of networks that have hidden elements, with the activations of these elements. Such activations are often regarded in the connectionist literature as constituting 'internal states'. In addition to achieving behavioral knowledge, then, an observing MP system could face the task of achieving anatomical as well as physiological knowledge about another MP system, although it is unclear exactly how the last two kinds of knowledge could be defined. In any case, these possibilities generate a rich gamut of thought experiments to determine whether or not an observing MP system that can achieve several kinds of knowledge (e.g., behavioral, anatomical, and physiological) would have to be even more complex than a system that can achieve only one kind of knowledge (e.g., behavioral).

47. The third and last issue I want to raise is generality. The main question is: How general is S1? A definitive answer to this question, of course, goes well beyond the present paper. The best I can do here is to offer a series of HYPOTHETICAL GENERALIZATIONS of S1, which I present only as tentative CONJECTURES to encourage further work on the kind of analysis I have done. The ensuing conjectures will qualify as generalizations of S1 to the extent that they involve uses of 'behavior', 'knowledge', and 'complexity' that are fundamentally compatible with the ones I have adopted here, which raises the first issue once again. If orthogonal meanings are assumed, then the conjectures in question will not count strictly as generalizations of (and, hence, would not be supported by) S1. In any case, it should be clear that in no way do I regard S1 as providing conclusive evidence for any of these conjectures.

48. An obvious generalization goes from S1 to all possible pairs of equally complex MP systems (i.e., to all cases of D1, D4, and D13 in Table 1). This generalization leads to the following conjecture: NO MP SYSTEM can achieve a COMPLETE MAXIMALLY FINE behavioral knowledge about OTHER MP SYSTEMS OF THE SAME COMPLEXITY (C1). One implication of C1 is that for an observing MP system to achieve a complete maximally fine behavioral knowledge about an observed MP system, the former must be more complex than the latter. But exactly how much more complex? A detailed answer to this question is better left for another occasion. Suffice it to say for the moment that, in the case of a scenario where the observed system is MP[O1] (see Figure 2), an observing MP system would need to have at least one more sensor of the appropriate kind than either MP[K1] or MP[K2], in order to be capable of at least detecting MP[O1]'s output signal. This leads to a combination of the two scenarios I have examined (Figures 2 and 3), where the observed system is MP[O1] (Figure 2), and the observing system is MP[K4] (Figure 3). Additionally, to achieve a maximally fine behavioral knowledge, an observing MP system would have to be even more complex by having at least one more output processing element.

49. The resulting scenario is depicted in Figure 4, where MP[K5] is the observing system. In terms of Table 1, if MP[K5] is A and MP[O1] is B, then F(A)=F(B) in one feature (number of layers) and F(A)>F(B) in three features (number of sensors, number of connections, and number of processing elements). The scenario thus is a case of D10. MP[K5] certainly has the minimal input capacity to have a complete sensory access to all the I-O occurrences that constitute MP[O1]'s behavior, as well as the output capacity to respond differentially to each one of those mappings. However, having the necessary input and output capacity may not be sufficient. MP[K5] will not attain a complete and maximally fine behavioral knowledge about MP[O1] if the task is demonstrated to be linearly inseparable, in which case an observing MP system would have to be even more complex than MP[O1] by having at least one hidden processing element. I shall leave such a demonstration for another occasion. For an MP system to achieve a complete maximally fine behavioral knowledge about another MP system, then, the former system would seem to have to be substantially more complex than the latter. This corollary can be summarized in the following conjecture: FOR AN MP SYSTEM TO ATTAIN A COMPLETE MAXIMALLY FINE BEHAVIORAL KNOWLEDGE ABOUT ANOTHER MP SYSTEM, THE FORMER MUST BE MORE COMPLEX THAN THE LATTER (C1a).

Figure 4: A case of D10 (see Table 1), with an observing system (MP[K5]) that is more complex than the observed system (MP[O1]).

50. Another implication of C1 arises from a special case of D1 in Table 1, where the same MP system plays the role of observed and observing. The main question here is the extent to which an MP system can attain behavioral SELF-knowledge, or SELF-MAPPING. It is clear that, in order to attain self-mapping, an MP system must at least be capable of detecting its own output activation, which requires at least one recurrent connection. A detailed analysis of this case is better left for another occasion. However, a brief commentary is necessary, for the case makes contact with important issues in artificial intelligence, in particular the symbolic-subsymbolic distinction.

51. To begin with, it is unclear whether or not an MP system can achieve any behavioral self-knowledge whatsoever, however partial. But even if it can, such self-knowledge would seem to be inevitably coarse. The MP systems I have analyzed (except for MP[K5]) can only achieve coarse behavioral knowledge. As I have argued, these systems need at least one more output element to achieve a finer behavioral knowledge. In contrast, an increase in the output capacity of a system with recurrent connections feeding onto and among its output elements will cause a corresponding increase in the size of its I-O universe. This increase, in turn, not only imposes a further increase in the system's capacity to detect its own output signals (especially in the number of necessary recurrent connections), but also in its output capacity to achieve a finer partition of its I-O universe, and so on, ad infinitum. So, attempting to achieve finer self-mappings would seem to trigger an infinite regress of ever-increasing I-O universes and ever-more complex systems. This possibility can be summarized in the following conjecture: NO MP SYSTEM CAN ATTAIN A COMPLETE MAXIMALLY FINE BEHAVIORAL SELF-KNOWLEDGE IN FINITE TIME (C1b).

52. It is tempting to interpret C1b in terms of the notion of 'self-reference' and its usual companions, such as Gödelian incompleteness and undecidability, and puzzles like the Liar's Paradox (e.g., 'This statement is false'). Although the reader is free to attempt interpretations in this direction, I prefer not to succumb to the temptation and ponder the issue carefully, trying not to hasten conclusions based on what may well be superficial similarities. There certainly is a family resemblance between nets with recurrent connections and self-referential sentences. However, such a resemblance may not amount to much more than the maximally general notion of reflexiveness.

53. One concern here is that the technical concept of self-reference arose originally in the context of (and, hence, is more readily applicable to) SYMBOLIC (and, in some cases, FORMALIZED) systems. Therefore, the concept cannot be applied to subsymbolic systems without substantial modification. A preliminary task in this direction, then, would be to obtain a definition of 'self-reference' that is suitable for subsymbolic systems, but shares nontrivial aspects with the symbolic definition. In this manner, we arrive at a crucial problem: The validity, generality, and precise form of the distinction between symbolic and subsymbolic systems (e.g., Barnden, 1998; Boden, 1991; Feldman & Ballard, 1982; Smolensky, 1990). Alas, this problem remains largely unresolved. I cannot discuss it in detail here, much less try to resolve it, but let me nonetheless make a few remarks about it.

54. The main issue that is raised by C1b is whether or not PURELY SUBSYMBOLIC systems are capable of 'perfect' (complete and maximally fine) behavioral self-knowledge. To be sure, the category 'purely subsymbolic' presupposes the category 'purely symbolic' and, hence, a fundamental symbolic-subsymbolic distinction. As Boden (1991) has argued, the distinction becomes considerably fuzzy in the case of MP systems, although it does not disappear altogether. The following commentary by McCulloch to von Neumann's talk in the Hixon Symposium (Jeffress, 1951) complements Boden's references in this regard: "... it was not until I saw Turing's paper that I began to get going the right way around, and with Pitts' help formulated the required logical calculus. What we thought we were doing (and I think we succeeded fairly well) was treating the brain as a Turing machine; ... The delightful thing is that the very simplest set of appropriate assumptions is sufficient to show that a nervous system can compute any computable number. It is that kind of device, if you like - a Turing machine." (pp. 32-33). This commentary (and, of course, the fact that all neural networks can be simulated in today's digital computers) certainly questions a sharp symbolic-subsymbolic distinction. This implication, however, is mitigated by McCulloch's initial commentary to the same talk: "I confess that there is nothing I envy Dr. von Neumann more than the fact that the machines with which he has to cope are those for which he has, from the beginning, a blueprint of what the machine is supposed to do and how it is supposed to do it. Unfortunately, for us in the biological sciences - or at least in psychiatry - we are presented with an alien, or enemy's, machine. We do not know exactly what the machine is supposed to do and certainly we have no blueprint of it." (p. 32).
Additionally, of course, one could take into account other known differences (viz., distributed numerical versus local symbolic representations, parallel versus sequential processing, graceful versus graceless degradation, and so on) and thus maintain a sharper symbolic-subsymbolic separation. So, there certainly are similarities as well as differences, even between MP systems and Turing machines. However, their relative significance seems to lie largely in the eye of the beholder.

55. It is clear that perfect self-mapping is possible without infinite regress and vicious circle IN PURELY SYMBOLIC FORMALIZED SYSTEMS, as demonstrated by Gödel (1931). No supporter or detractor of a fundamental symbolic-subsymbolic distinction would deny that. However, for those who do make a fundamental distinction, that fact does not necessarily generalize to purely subsymbolic systems. Hence, from that particular perspective, the truth of C1b is not necessarily inconsistent with Gödel's findings. Of course, for hardcore connectionists, such truth would be problematic in that it would state yet another limitation of purely subsymbolic systems (assuming that C1b is true for all artificial neural systems). But that should not stop connectionists from determining exactly why purely subsymbolic systems cannot achieve perfect self-mapping without infinite regress and vicious circle (in case C1b is true, of course). Perhaps, attaining this task is possible only through something only purely symbolic formalized systems can do, such as, for example, the distinction between use and mention (which underlies Gödel's proofs). IF making this distinction is necessary for attaining perfect self-mapping without vicious circle or infinite regress, and purely subsymbolic systems cannot make it, then they will be unable to solve the task. On the other hand, the use-mention distinction may be SUFFICIENT but not necessary for attaining perfect self-mapping. If this is the case, then the reason why MP systems cannot attain self-knowledge (again, if C1b is true) may have nothing to do with their inability to make the use-mention distinction (if, indeed, they are unable to make it).

56. There are also those who focus on the similarities and reject a fundamental symbolic-subsymbolic distinction. The two extreme positions here would hold that ALL adaptive systems are either fundamentally symbolic or fundamentally subsymbolic. For either position, the truth (but not the falsity) of C1b would be problematic, insofar as it would state yet another difference between purely symbolic and purely subsymbolic systems (although it too could be regarded as nonfundamental). Under a more eclectic position, hybrid symbolic/subsymbolic systems with recurrent connections may be capable of perfect self-mapping without infinite regress and vicious circle, but I am not aware of any effort in this direction. In any case, I insist that whether or not MP (or any other kind of artificial neural) systems are capable of attaining perfect self-mapping cannot be determined a priori until the symbolic-subsymbolic issue is satisfactorily resolved (or dissolved, for that matter).

57. Continuing with the issue of generality, a hypothetical generalization of C1 to all possible pairs of artificial neural systems of all kinds leads to a second, more encompassing conjecture: No artificial neural system of any kind can achieve A COMPLETE MAXIMALLY FINE behavioral knowledge about OTHER artificial neural systems of the same kind AND COMPLEXITY (C2). C2 poses the challenge of verifying (or falsifying) S1 and C1 in other neural-network theories. The kind of outcome I have obtained here refers to the relationship between a network's structure and its environment, which is a fairly basic issue in neural-network research (the main difference in the present case being that the environment involves the behavior of another network). Hence, I see no a priori reasons why comparable outcomes should not be obtained with other neural-network theories. In particular, whether or not a theory has a learning rule, or even different learning rules across different theories, may not make much of a difference. Whether activations in the theory of interest are discrete (binary or otherwise) or continuous will make an important difference in the particular realization of the kind of scenario I have analyzed here. In particular, the definition of finite I-O universes for networks whose activation states are continuous inevitably requires some criterion for the definition of finite sets of possible activation states (input as well as output). Otherwise, the kind of analysis I have done here would be unfeasible and the implications trivial (i.e., limitless knowledge capacity and impossibility of knowing infinite behavioral spaces completely).

58. Another link in this chain of hypothetical generalizations would go from C2 to all possible pairs of biobehavioral systems of all kinds: No biobehavioral system of any kind can achieve A COMPLETE MAXIMALLY FINE behavioral knowledge about other biobehavioral systems of the same kind and complexity (C3). More general conjectures are possible (e.g., one could talk about 'systems' in general, without qualifying them as 'biobehavioral'), but I shall stop the generalization chain at this point. C3, of course, is extremely speculative, so determining its truth or falsity will require substantial further analyses. Here, I can only prescribe certain minimal conditions that must be satisfied for a systematic and rigorous determination of whether C3 is true or false.

59. The crucial condition is that the systems of interest must be described by some THEORY that allows for the kind of analysis I have done here. The theory must provide a precise characterization of a kind of system X that is capable of receiving certain inputs and returning certain outputs, with precise rules for relating inputs and outputs, so that a clear and precise definition of 'behavior-in-X-systems' as 'finite I-O universe' is possible. Also, the theory must allow for clear and precise definitions of the roles of observed and observing, as well as the terms 'behavioral-knowledge-in-X-systems' and 'relative-structural-complexity-in-X-systems'. The theory should also permit the definition of a criterion that is sufficiently clear and precise to yield unequivocal decisions on relative structural complexity. These conditions were sufficient for the present analysis. They, however, may well be insufficient for extending it to other kinds of biobehavioral systems. In any case, the specific reasons for the hindrance that sameness in complexity imposes on behavioral knowledge, in those systems where C3 is demonstrated to hold, may well differ from one kind of system to another. In the case of the MP systems I have analyzed, the main reason was that such sameness imposed a limitation in the SENSORY ACCESS by the observing system to all the constituents of the observed system's I-O mappings. This limitation, if it obtains for different kinds of biobehavioral systems, will likely take different forms from one kind to another. Moreover, sensory limitations need not be the only (not even a) hindrance for behavioral knowledge among equally complex biobehavioral systems of all kinds. Sameness in structural complexity may impose limitations in other (not necessarily exclusive) respects.

60. I have presented a thought experiment on biobehavioral epistemology of BBS. The analysis involved a great deal of simplification, which has certainly moved us far away from human knowledge, the initial motivation of any naturalistic epistemology programme. What is lost in plausibility, however, is gained in clarity, precision, and depth. To reduce the gap, it is tempting to take C3 as a working hypothesis and speculate on its implications for the possibility of a complete and maximally fine human knowledge about human behavior. However, this is yet another temptation I shall resist, at least for the moment.


I thank Joshua W. Brown, Douglas R. Hofstadter, Peter R. Killeen, and John G. Taylor for useful comments and suggestions on previous drafts of this paper. I am also indebted to François Tonneau for many fruitful discussions and comments.


[1] This kind of scenario is analogous to the one examined by MacKay (1969), in that the latter also involves pairs of systems of a certain kind interacting in certain ways, and the issue of whether they can 'know' anything about each other. However, MacKay's scenario involves applying INFORMATION THEORY to pairs of HUMANS who are engaged in verbal dialogue. I shall not conceptualize MP systems as information-processing devices, let alone as human systems capable of verbal communication. Hence, I will not frame my analysis within information theory. To be sure, information-theoretic connectionist models exist (e.g., Becker & Hinton, 1992; Linsker, 1988), but I will not explore them here (although they can certainly be explored in further analyses). The present proposal thus goes in a direction that is conceptually and methodologically different from MacKay's, although I acknowledge the possibility that, in the end, the two proposals may well be notational variants of each other. In any case, the present analysis can be seen as an extension of MacKay's analysis to connectionist, subsymbolic systems. This raises the issue of whether or not subsymbolic systems are fundamentally different from symbolic systems, which I will address in the last section.

[2] For the moment, I will assume that the two roles are played by two distinct, separate MP systems. In the discussion, I will examine the case where the two roles are played by the same system.

[3] I assume that the four features are equally important in determining a feedforward network's structural complexity. Of course, it is possible to adopt any preferred ordering of features. However, any ordering, however intuitive, would seem to be very problematic to justify BIOBEHAVIORALLY, which opens a rather circuitous avenue of analysis. For example, on what biobehavioral basis do we say that having more hidden layers is 'more important' than having more sensors? To put it more vividly, given the forced choice of losing a retina or a section of the primary visual cortex, which one would we choose and why? These questions raise difficult justification issues I cannot address here. So, for my present purposes, I prefer to keep this aspect of the analysis as theoretically undemanding as possible. The same applies to any caveats that might be raised by the fact that most of those structural features are linked. For example, it is not possible to increase S, L, or E without increasing C (assuming full feedforward connectivity). For the moment, I prefer to avoid these complications and leave them for further analyses.
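
The link between the structural features mentioned in note [3] can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: a network is encoded as a hypothetical list of layer sizes running from the sensor layer to the output layer, C is read as the number of connections under full feedforward connectivity between adjacent layers, and the function name `count_connections` is invented for illustration. The sketch only checks three concrete cases (one more sensor, one more hidden layer as wide as the existing one, one more hidden element); it is not a general proof.

```python
def count_connections(layer_sizes):
    """Count connections in a fully connected feedforward network.

    layer_sizes is a hypothetical encoding (not the paper's notation):
    the number of units in each layer, from sensors to outputs.
    Every unit connects to every unit of the next layer only.
    """
    if len(layer_sizes) < 2 or any(n < 1 for n in layer_sizes):
        raise ValueError("need at least two layers, each with at least one unit")
    # Sum the products of the sizes of adjacent layers.
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


base = [2, 3, 1]            # 2 sensors, one hidden layer of 3 units, 1 output
more_sensors = [3, 3, 1]    # one additional sensor
more_layers = [2, 3, 3, 1]  # one additional hidden layer of the same width
more_elements = [2, 4, 1]   # one additional hidden element

for variant in (more_sensors, more_layers, more_elements):
    # In each of these cases the connection count exceeds the baseline.
    print(variant, count_connections(variant), ">", count_connections(base))
```

In each of the three variants the connection count exceeds that of the baseline network, which is the kind of dependence among features the note alludes to.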

[4] I assume that all the systems are immersed in a medium that allows signals to travel and reach each sensor without any distortion (i.e., there is no noise in these systems' environment). The setting depicted in this figure is formally equivalent to one in which two MP elements constitute a single system with two layers, one layer constituted by the observed system and the other by the observing systems. Under this interpretation, one could ask whether or not a certain PART of the resulting system can achieve any knowledge about the behavior of another part. However, this alternative interpretation seems more complicated, for it raises the problem of characterizing the part-whole relation. Hence, I prefer to treat different MP systems as if they were separate, independent individuals that can go their own ways whenever they are not interacting. The scenarios depicted in Figure 2 and the rest of the figures thus represent a sort of close encounter of the second (perhaps the third) kind between individually different MP systems.


Barnden, J. A. (1998). Artificial intelligence and neural networks. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (pp. 98-102). Cambridge, MA: MIT Press.

Becker, S., & Hinton, G. E. (1992). A self-organizing neural network that discovers surfaces in random-dot stereograms. Nature (London), 355, 161-163.

Boden, M. A. (1991). Horses of a different color? In W. Ramsey, S. P. Stich, & D. E. Rumelhart (Eds.), Philosophy and connectionist theory (pp. 3-19). Hillsdale, NJ: Lawrence Erlbaum.

Churchland, P. M. (1989). A neurocomputational perspective: The nature of mind and the structure of science. Cambridge, MA: MIT Press.

Churchland, P. S. (1986). Neurophilosophy: Toward a unified science of the mind/brain. Cambridge, MA: MIT Press.

Dawkins, R. (1982). The extended phenotype. Oxford University Press.

Feldman, J. A., & Ballard, D. H. (1982). Connectionist models and their properties. Cognitive Science, 6, 205-254.

Gödel, K. (1931). On formally undecidable propositions of Principia Mathematica and related systems I. Reprinted in S. G. Shanker (Ed.), Gödel's Theorem in Focus (pp. 17-47). London: Routledge.

Jeffress, L. A. (Ed.)(1951). Cerebral mechanisms in behavior: The Hixon Symposium. New York: Wiley.

Linsker, R. (1988). Self-organization in a perceptual network. Computer, 21, 105-117.

MacKay, D. M. (1969). Information, mechanism and meaning. Cambridge, MA: MIT Press.

McCulloch, W. S. (1961). What is a number, that a man may know it, and a man, that he may know a number? General Semantics Bulletin, Nos. 26 and 27, 7-18. Reprinted in W. S. McCulloch (1988), Embodiments of mind (pp. 1-18). Cambridge, MA: MIT Press.

McCulloch, W. S. (1964). A historical introduction to the postulational foundations of experimental epistemology. In F. S. C. Northrop and H. H. Livingston (Eds.), Cross-cultural understanding: Epistemology in anthropology (pp. 180-193). Reprinted in W. S. McCulloch (1988), Embodiments of mind (pp. 359-372). Cambridge, MA: MIT Press.

McCulloch, W. S., & Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133. Reprinted in W. S. McCulloch (1988), Embodiments of mind (pp. 19-39). Cambridge, MA: MIT Press.

Minsky, M. L., & Papert, S. A. (1988). Perceptrons (Expanded Edition). Cambridge, MA: MIT Press.

Parberry, I. Structural complexity and discrete neural networks. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (pp. 945-948). Cambridge, MA: MIT Press.

Pitts, W., & McCulloch, W. S. (1947). How we know universals: The perception of auditory and visual forms. Bulletin of Mathematical Biophysics, 9, 127-147. Reprinted in W. S. McCulloch (1988), Embodiments of mind (pp. 46-66). Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic theory for information storage and organization in the brain. Psychological Review, 65, 386-408.

Smolensky, P. (1990). Connectionism and the foundations of AI. In D. Partridge & Y. Wilks (Eds.), The foundations of artificial intelligence (pp. 306-326). Cambridge University Press.

Winder, R. O. (1961). Single stage threshold logic. In Proceedings of the Second Annual Symposium and Papers from the First Annual Symposium on Switching Circuit Theory and Logical Design (pp. 321-332). American Institute of Electrical Engineers.
