In response to the first four PSYCOLOQUY reviews of my book (Murre, 1992a, 1992b) by Hardcastle (1993), Gregson (1993), Krakauer and Houston (1993), and Sloman (1993), I first address some of the issues raised in a general way, before elaborating on four specific subjects, namely, modularity, categorization, memory, and learning and evolution.
1.1 In response to the first four PSYCOLOQUY reviews of my book (Murre, 1992a, 1992b) by Hardcastle (1993), Gregson (1993), Krakauer and Houston (1993), and Sloman (1993), let me first address some of the issues raised in a general way in Section 2 of this Reply, before elaborating on four specific subjects in Sections 3 to 6, namely, modularity, categorization, memory, and learning and evolution.
2.1 The book outlines some general principles from psychology, biology and engineering. We find such principles important for the development of an approach to connectionist modelling and implementation that offers a framework in which many different models can be expressed. The advantage of such an encompassing framework is that it may be possible to integrate - ultimately - a range of models, each aimed at a different problem in cognition, into a coherent set of theoretical concepts. Among the general principles are (architectural) modularity, locality, self-induced noise, and self-induced learning. We emphasize the link between connectionist elements and the neurobiology underlying the psychological phenomena. The framework laid out in the book is based on a "generic" building block for neural networks: the CALM module (Categorizing And Learning Module) and its variants. These are derived as particular instances of the more general principles. As pointed out in the last chapter of the book (e.g., p. 119), based on the general principles many other modules could be constructed. It is not the particular module that is of principal importance, but the behavior of the range of models based on the chosen module. The goal is, thus, to arrive at a set of general guidelines for modelling and theory construction in experimental psychology and cognitive neuroscience. These guidelines would offer a framework incorporating well-tested principles that could serve as a basis for model construction. The possibility of formulating conflicting models is not excluded by such a framework. On the contrary, models based on a common framework may still differ in many details and could fit conflicting data. As more and more models are being developed on the basis of a common framework, the framework itself can be upgraded with time to incorporate newly discovered guidelines. Such an approach could, perhaps, make the effort of theorizing in psychology a more cumulative undertaking. 
We believe that, even though such an aim is ambitious, it may be a necessary path to follow if we ever want to escape from the frequent miscommunications caused by too much "special-purpose" theorizing. We are, of course, fully aware that the material contained in the book presents but a first and modest step towards this ideal.
2.2 A question often raised in connectionist modelling is: Why worry about biology, if we are primarily concerned with modelling behavioral data (cf. remark by Hardcastle, 1993, par. 2.2)? The entire connectionist literature is driven by the promise that brain-style modelling of behavior presents us with possibilities to understand much of the structure of the brain in terms of behavior and to understand aspects of behavior in terms of the brain. The common ground formed by the coming of age of modelling formalisms intermediate between neurobiology and psychology, such as connectionism, has made the prospect of achieving some integration of these domains much more realistic. This is not the same as saying that we should try to explain all of behavior in terms of the brain, or vice versa. But it seems worthwhile to focus on the many points of contact between the two domains. Gaps in the theories of one field may be explained by taking recourse to findings in other fields. Though many such interdisciplinary theories are speculative, they are often testable. At the very least, they frequently suggest promising avenues for new research. We expect no magic from incorporating principles from neurobiology in behavioral modelling, but it is our belief that it is better to take a few steps in the right direction than to take no action at all.
3.1 The CALM module is not a biological model, but several principles derived from biology suggest that a modular network architecture is eminently suited for psychological modelling. The number of co-activated neurons in a given, small region of the cortex seems to be limited by local, short-range inhibition found throughout the brain. Though it is by no means generally accepted that the cortex consists of distinct modules, the concept of modularity serves a useful purpose as an approximation to this neuroanatomy. The requirement of modules with a deterministic internal structure could be relaxed in more biologically oriented models, leading, for example, to a form of random network such as has recently been described by Braitenberg and Schuz (1991). These networks share with CALM-based models the distribution of connections: local inhibition and long-range, modifiable connections. We are certainly aware of the level of arbitrariness of the specific wiring of a CALM module (e.g., Hardcastle's comment, 1993, par. 2.1), and the book addresses this point in some detail (Murre, 1992a, p. 121-125). In response to earlier criticism from biologists we describe, for example, variants of the CALM module with more excitatory nodes than inhibitory nodes.
3.2 Much recent work in connectionist modelling has uncovered the limitations of fully distributed, fully interconnected architectures as models for human learning and categorization (e.g., Ratcliff, 1990; McCloskey and Cohen, 1989; French, 1992; Kruschke, 1992a) and many other researchers now propose that some constraints be imposed on the connectivity. Kruschke's ALCOVE (1992b), for example, uses radial-basis functions, and French (1992) introduces semidistributed representations. The need for a general, constraining framework is also expressed by McClelland (1993). In fact, our principles are very similar to the five principles listed in his recent reports on GRAIN networks (e.g., McClelland, 1993). In particular, his general principle of constructing modular networks with within-module inhibition and between-module excitation coincides fully with our approach to modelling. Both approaches have in common that they view modularity as a way of constraining processing without giving up the advantages of interactive processing. Representations are distributed over modules, rather than over individual nodes. Because modules have certain higher-order characteristics such as categorization, processing in these networks gives rise to phenomena not found in fully distributed architectures. One of these is the reduction or absence of catastrophic interference, which is discussed at some length in the book (p. 135-151, also see par. 4.3 below).
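The within-module inhibition scheme can be illustrated with a toy winner-take-all computation. This is not the CALM update rule itself (the actual module also includes learning, veto, and arousal mechanisms); the equations and parameter values below are invented purely to show how local competition lets a module settle on a single representation node:

```python
# Toy within-module competition: each node is driven by its external input
# and inhibited by the summed activation of the other nodes in the module.
# Invented dynamics for illustration; NOT the actual CALM equations.

def module_competition(inputs, inhibition=0.5, steps=50, decay=0.1):
    """Iterate a simple competitive update until one node dominates."""
    acts = list(inputs)
    for _ in range(steps):
        total = sum(acts)
        new = []
        for i, a in enumerate(acts):
            others = total - a          # activation of the competing nodes
            a = a + inputs[i] - inhibition * others - decay * a
            new.append(max(0.0, a))     # activations cannot go negative
        acts = new
    return acts

acts = module_competition([0.9, 0.8, 0.3])
winner = acts.index(max(acts))          # the best-supported node suppresses the rest
```

Even with inputs as close as 0.9 and 0.8, the competition drives the losing nodes to zero, which is the categorization behavior the module-level account relies on.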
3.3 The type of modularity introduced in the book is intended primarily as an architectural principle for psychological models and should not be confused with Fodor's (1983) concept of Modularity of Mind (cf. the title of Sloman's review, 1993). Indeed, in very large models, modularity would have to be supplemented with other architectural principles, such as the streams and hierarchies of functional areas found in the brain. Theoretical investigation of the learning behavior of broadly structured neural networks is still in its infancy and until more results become available we may attempt to uncover the principles of neural-architecture design from the structure of the brain (e.g., Young, 1992). Modules in Fodor's sense might be implemented in such a system in a variety of ways; there is no necessity to equate his module concept with an architectural module.
3.4 In addition to the above two main arguments from biology and psychology, we discuss a range of additional advantages for modular and otherwise constrained architectures, including (iii) decorrelation of inputs (p. 125-126), (iv) extendibility of models (p. 127), (v) improved quality of solution due to a reduction of interference during learning (p. 128), and (vi) improved set generalization (p. 129-132). In Appendices B1-B5, we discuss in detail why constrained architectures are crucial for the feasibility of large hardware implementations of neural networks. The appendices include a mathematical analysis of (vii) modular and nonmodular transputer implementations (p. 166-179), a (viii) detailed description of a realized modular neurocomputer with 400 processors as well as its projected extensions (p. 180-190), and (ix) a discussion of how modularity might be applied to the design of analog neurocomputers.
3.5 In light of the above, Sloman's (1993) assertion that the "high point of the book" (par. 2.0) deals with the extendibility of modular neural networks is obviously inaccurate. Extendibility of architectures is but one (minor) argument among many for constrained architectures. The term refers to the fact that in a modular network, part of the system may remain plastic while other parts remain fixed. This technique has been used successfully by Waibel (1989), for example, and is still gaining popularity, in particular among engineers. Real-world applications tend to be large and complex. Through the addition of new modules to an existing, functioning neural network additional tasks may be learned with little disturbance to the established behavior of the system. In this way, a complex task may be learned (and corrected) bit by bit, rather than all at once. Extendibility is not easily achieved without some form of modularity (see, for instance, Ratcliff, 1990). But, although it is clearly important in neural network engineering, it is by no means our strongest argument for modularity.
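What extendibility buys can be sketched in miniature: an established unit keeps its weights fixed while a newly added unit learns a second task, so the old behavior is undisturbed. The perceptron units and the data below are invented for the example and are far simpler than the networks discussed in the book:

```python
# Sketch of extendibility: after an existing module has learned task A,
# a new module with its own weights is added and trained on task B; the
# old weights are never touched, so task A's behavior is preserved.

def out(w, x):
    """Threshold unit: 1 if the weighted input sum is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def train_unit(w, data, lr=0.5, epochs=20):
    """Perceptron learning rule for a single output unit."""
    for _ in range(epochs):
        for x, t in data:
            err = t - out(w, x)
            for i in range(len(w)):
                w[i] += lr * err * x[i]
    return w

# inputs are (x0, x1, bias); task A: respond to x0, task B: respond to x1
task_a = [([1, 0, 1], 1), ([0, 1, 1], 0), ([1, 1, 1], 1), ([0, 0, 1], 0)]
task_b = [([1, 0, 1], 0), ([0, 1, 1], 1), ([1, 1, 1], 1), ([0, 0, 1], 0)]

w_a = train_unit([0.0, 0.0, 0.0], task_a)    # established module
frozen = list(w_a)                           # kept fixed from here on
w_b = train_unit([0.0, 0.0, 0.0], task_b)    # only the new module learns
```

Both tasks end up solved, and the first unit's weights are untouched by the second round of learning; in a fully interconnected network, by contrast, training on task B would alter the weights that support task A.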
3.6 Sloman (1993, par. 2.0) also believes that cognitive processing is "wholly interactive" and that therefore the introduction of modularity is (presumably) pointless. McClelland and Rumelhart's (1981) interactive-activation model of context effects in letter perception is a particularly good illustration of a modular network in which processing is nevertheless interactive. This network is, in fact, used as an example in the book to clarify our approach to modularity (p. 7-8). There, we argue that only a modular architecture can learn to exhibit the context effects of that model. A simulation showing how a CALM-based model fares on this task is described later in the book (p. 59-62). Sloman (1993), furthermore, has the "frustrating impression" that in cognition "everything depends on everything else" (par. 2.0). Though this may well be true, studies on task interference, context effects in recognition, perceptual restorations, etc., show that in cognition some things seem to depend more on other things. In other words, cognition seems to have an underlying structure favoring the execution of certain task combinations over others (e.g., Allport, 1989). We feel challenged to uncover some of the architecture underlying cognitive processes. Our approach is rooted in the fact that a network's particular architecture facilitates learning in certain situations while hindering it in others. As explained above (par. 2.1), the aim - in our view - is to arrive at a set of general guidelines for network construction that captures the relations between network models and simulated cognition. We, therefore, strongly disagree with Sloman's claim that "connectionism is exciting precisely because it gives us a way of describing a system without having to specify functional parts" (par. 2.0).
4.1 Though a CALM module's functioning is based on categorization, CALM is not a psychological model of categorization. I am currently testing such a model, which is constructed using many CALM modules. One property of this model is that for certain data sets a metric space emerges in which relevant dimensions may shrink or stretch as a result of learning specific category structures. Weighted metric spaces (i.e., those with shrunken or stretched dimensions) have been shown, notably by Nosofsky (e.g., 1985, 1992), to describe categorization and similarity data accurately. As an extension to Nosofsky's model, the CALM-based model also predicts that for integral dimensions subjects rotate dimensional axes prior to shrinking or stretching. A recent reanalysis of Nosofsky's data gives a significantly better fit with such an extended model (Murre, 1993).
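The notion of a weighted metric space can be made concrete with the distance and similarity functions used in Nosofsky-style models: attention weights stretch the dimensions relevant to the category structure and shrink the irrelevant ones. The weights and stimulus coordinates below are invented for the example:

```python
import math

# Weighted metric space in the style of Nosofsky's Generalized Context
# Model: attention weights rescale each dimension before distances and
# similarities are computed. Illustrative values only.

def weighted_distance(x, y, w, r=2):
    """Weighted Minkowski distance; r=2 is (weighted) Euclidean, r=1 city-block."""
    return sum(wi * abs(xi - yi) ** r for wi, xi, yi in zip(w, x, y)) ** (1.0 / r)

def similarity(x, y, w, c=1.0):
    """Exponential decay of similarity with distance (Shepard, 1987)."""
    return math.exp(-c * weighted_distance(x, y, w))

# two stimuli that differ only on dimension 0
a, b = (0.0, 0.0), (1.0, 0.0)
equal     = weighted_distance(a, b, (0.5, 0.5))  # both dimensions weighted equally
stretched = weighted_distance(a, b, (0.9, 0.1))  # attention on the differing dimension
shrunk    = weighted_distance(a, b, (0.1, 0.9))  # attention away from it
```

Shifting attention toward the dimension on which the stimuli differ increases their distance and so decreases their similarity, which is how learned category structures reshape the space. (The proposed extension for integral dimensions would additionally rotate the axes before applying the weights.)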
4.2 As remarked by Gregson (1993, par. 4), the metric spaces of Shepard (1957, 1987) and Nosofsky (1992) are not in any way related to structures or processes in the brain. Recent connectionist models of categorization (e.g., Gluck and Bower, 1988; Kruschke, 1992b; Shanks and Gluck, 1991) show in more and more detail how abstract metric spaces may emerge in connectionist models. This is an important step. A logical next step seems to be that we try to formulate connectionist models in which we can identify aspects of the model architecture with known biological facts.
4.3 One major conclusion of the book is that catastrophic interference is caused by overlap of the hidden-layer representations (p. 148). This means that backpropagation networks cannot be used to model real-time learning, such as is found in most freely functioning organisms. Neural networks that are able to develop more orthogonal hidden-layer representations are less prone to interference. It is argued in the book (p. 152) that one important aspect of unsupervised learning (as opposed to supervised and auto-associative learning) lies in the ability to develop distinct internal representations, independently of the manner of stimulus presentation. Such networks are, thus, well suited for modelling real-time learning behavior, which is one of the reasons why many authors, in particular Grossberg (1976), have focused on unsupervised learning for some time. The fact that most - if not all - neural networks for unsupervised learning are based on some form of categorization (e.g., Grossberg, 1976, 1987; Kohonen, 1989, 1990; Rumelhart and Zipser, 1985) makes this mechanism very attractive for any modelling work in the psychology of learning and memory.
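The link between representational overlap and interference can be seen in a few lines with a generic linear associator (not a CALM or backpropagation network): patterns stored by Hebbian outer products are recalled cleanly when the internal codes are orthogonal, but contaminate each other when the codes overlap. The codes below are invented for the demonstration:

```python
# Hebbian storage in a linear associator: W accumulates the outer products
# of (input, output) pairs. Recall of a stored input returns its target
# plus crosstalk proportional to the input's overlap with the OTHER
# stored inputs - the root of catastrophic interference.

def store(pairs, n):
    """W[i][j] = sum over stored pairs of output[i] * input[j]."""
    W = [[0.0] * n for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(n):
                W[i][j] += y[i] * x[j]
    return W

def recall(W, x):
    return [sum(Wij * xj for Wij, xj in zip(row, x)) for row in W]

# orthogonal internal codes for two stored items
ortho = [([1, 0, 0, 0], [0, 1, 0, 0]),
         ([0, 0, 1, 0], [0, 0, 0, 1])]
# overlapping codes for the same two items (both use node 1)
overlap = [([1, 1, 0, 0], [0, 1, 0, 0]),
           ([0, 1, 1, 0], [0, 0, 0, 1])]

clean = recall(store(ortho, 4), [1, 0, 0, 0])    # exactly the stored target
noisy = recall(store(overlap, 4), [1, 1, 0, 0])  # target plus crosstalk
```

With orthogonal codes the recall is exact; with overlapping codes the second item's target leaks into the first item's recall, and sequential training would likewise overwrite earlier associations.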
4.4 It would be surprising if categorization were the only mechanism underlying memory, but it is certainly a powerful way of discovering regularities in the stream of input patterns and it seems likely that it accordingly plays a fundamental role in the organization of memory. Furthermore, combining it with a modular architecture remedies some of the disadvantages associated with categorization. When a suitable model architecture is chosen, the categorization process need not "throw away information" (Sloman, 1993, par. 3.0). Also, in response to another assertion by Sloman (par. 3.0), networks based on categorization are by no means always outperformed by networks based on distributed representations with regard to generalization of learned behavior (Happel and Murre, 1993). This is also shown by the work on Kohonen maps (1989, 1990), which have been trained to solve a wide range of practical problems and which have shown excellent generalization of performance.
5.1 One of the applications described in the book is the modelling of implicit and explicit memory (p. 62-86). Although connectionist networks lend themselves very well to modelling memory experiments, so far only a few connectionist models of memory have actually been published. Our model aims to capture some characteristic phenomena reported in the literature. The simulations described in the book focus on word-frequency effects and on the preservation of implicit memory in amnesics (see Phaf, 1991, for more extensive simulations). We admit that it is indeed a small-sized model with simplified word representations (p. 85). Its main purpose is to investigate in what manner the elaboration-activation hypothesis of Mandler (Graf and Mandler, 1984; Mandler, 1980) can be combined with connectionist modelling techniques. We therefore did not want to complicate the model unduly by incorporating too many details secondary to our main goal (see critique by Hardcastle, 1993, par. 3.1). We have, however, included several elements often lacking in connectionist simulations of psychological experiments. One example is a detailed implementation of the simulation procedure itself. We first created "artificial subjects" by training models with background knowledge, which were then presented with an experimental list, followed by either of two simulated tests. Another example is the retrieval procedure, which is also fully implemented in the neural network, rather than being approximated by external, algorithmic procedures. (N.B. Contrary to what Hardcastle, 1993, par. 3.1, states, free recall is simulated by presenting the network with a context only - not by presenting it with both a context and a word beginning, see p. 72-73.)
5.2 It should perhaps be stressed that we do not claim anywhere that the elaboration-activation hypothesis is by itself sufficient to explain all the phenomena reported on the vast research subject of implicit memory (cf., Sloman's, 1993, remarks in par. 3.0). We merely aim to give an existence proof: This particular implementation of the elaboration-activation hypothesis is able to account for a well-defined subset of phenomena. We feel that such positive proofs are very important. As long as no fully specified model of a theory has been formulated and evaluated - either mathematically or through simulation - the danger remains that some of the assumptions are wrong. Building a model not only compels the theorist to excavate hidden assumptions lurking beneath the surface, it also forces a precise specification of any terms or concepts used. Connectionist modelling therefore invites and supports further articulation of theories. It also offers a clearer battle ground for competing theories by circumventing word play and by illuminating what is substantively different and what is just a difference in degree in the assertions of two theories. And finally, as argued above, connectionist modelling can serve the purpose of a lingua franca between psychology and neuroscience.
5.3 Our simulations (also see Phaf, 1991) demonstrate that some experiments in implicit-explicit memory can be modelled by a system based on the elaboration-activation hypothesis. They show a dissociation of word frequency. Lesioning the "arousal system" of the model results in a loss of explicit memory but a preservation of implicit memory. Other dimensions of memory are clearly important. Phaf and Wolters (1993), for example, have recently extended the approach by combining the elaboration-activation dimension with a form of encoding specificity or transfer appropriate processing (e.g., Neill, Beck, Bottalico, and Molloy, 1990). This extended model is also expressed within a CALM framework. A combined model could perhaps simulate the observed differences in persistence of implicit memory over time with different tasks (cf. remark by Sloman, 1993, par. 4.1).
5.4 Hardcastle (1993, par. 3.2) remarks that a "damaging" oversight in our simulations is the failure to report the fact that subjects generally show significant priming for high-frequency words. Hardcastle cites a t-test, t(11) = -2.6, which fully coincides with our analyses. For some reason, however, she concludes from this figure that our "neural net `subjects' did not show ANY significant priming effect for high frequency words" (par. 3.2). This is not in accordance with common practice in statistics. The cited two-tailed t-test gives p < 0.025 (or p = 0.012, one-tailed), which by common standards is called significant. We accordingly conclude that our artificial subjects do in fact show significant priming for high-frequency words.
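The disputed p-values are easy to verify. The sketch below recomputes the tail probability for t(11) = -2.6 by numerically integrating the Student-t density; it uses only the standard library, though a statistics package gives the same value directly (e.g., scipy.stats.t.sf(2.6, 11)):

```python
import math

# Tail probability of a Student-t variate with df degrees of freedom,
# via numerical integration of the density (standard library only).

def t_density(x, df):
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, hi=60.0, n=20000):
    """One-tailed p = P(T > t), by the trapezoid rule on [t, hi];
    the density beyond hi is negligible for these df."""
    h = (hi - t) / n
    s = 0.5 * (t_density(t, df) + t_density(hi, df))
    for i in range(1, n):
        s += t_density(t + i * h, df)
    return s * h

p_one = upper_tail(2.6, 11)   # one-tailed p, about 0.012
p_two = 2 * p_one             # two-tailed p, about 0.025
```

Both values fall below the conventional 0.05 criterion, confirming that the cited t-test does indicate significant priming for high-frequency words.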
6.1 Krakauer and Houston (1993) focus their review on the application of genetic algorithms to the design of modular learning networks (Murre, 1992a, p. 100-115). They defend the position that genetic algorithms can be "strong" models of evolution, a standpoint with which I wholeheartedly agree, although it is, admittedly, not worked out in any detail in the book.
6.2 In analogy to the "strong" and "weak" generative power of grammars for natural language, I have proposed to apply this distinction to modelling in general (p. 121). A "weak" grammar may generate correct sentences, but their derivation bears no relation to grammatical structures. Linguists aim to develop "strong" grammars, where the nonterminal elements and the derivation sequences themselves can be interpreted in a meaningful way. In general, we might thus define a "weak" model as able to fit the data without its internal structure reflecting any aspect of the modelling domain. A "strong" model, however, has some meaningful internal structure. Straightforward application of backpropagation to some learning task would most likely result in a "weak" model. But even minor constraints on the internal wiring of a backpropagation model may produce a model that has a much more telling internal structure (see, for instance, Rueckl, Cave, and Kosslyn, 1989).
6.3 An example of how a genetic algorithm may be a "strong" model for certain aspects of biological evolution is that "high variability in a given trait might reflect a weak selective pressure" (Krakauer and Houston, 1993, par. 3). Such "strong" correspondences can only be revealed in a changing environment. Because the simulation reported in the book uses an unchanging environment, they interpret our approach to learning "as a tactic to arrive at a solution which ultimately obviates the need to learn" (par. 4). In the simulations reported, we have a genetic algorithm generate modular architectures, which are then trained to recognize a set of handwritten digits. The main purpose of these simulations is to explore the extent to which genetic algorithms can be used as an automatic design method for practical applications. For this limited task their critique certainly applies.
6.4 We have since then continued this line of research and extended it with more complex environments that incorporate different types of change (Happel and Murre, 1993). One example of our experiments is to first train the generated architectures on a limited training set, then test them on a different set, deriving the fitness value from the observed generalization capacity of the architecture. We feel that this method is more faithful to the interaction of evolution and learning, and it may partially meet the criticism of Krakauer and Houston (1993). By systematically pairing the training set with test sets of different variability we may gain some insight into how evolution sets up learning organisms.
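The fitness scheme just described can be sketched as follows. Everything concrete here (the data, the genome encoding, the nearest-centroid "learner", the parameter values) is invented for illustration; the point is only that fitness is computed from held-out accuracy, so selection rewards architectures that generalize rather than ones that merely fit the training set:

```python
import random

# Toy genetic algorithm: a genome is a mask over input connections; an
# architecture is "trained" on one set (per-class centroids over its
# connected features) and its fitness is accuracy on a held-out set.

random.seed(0)
N_FEAT = 8

def make_set(n):
    """Feature 0 carries the class signal; the other features are noise."""
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        x = [label + random.gauss(0, 0.3)] + [random.gauss(0, 1) for _ in range(N_FEAT - 1)]
        data.append((x, label))
    return data

train, test = make_set(40), make_set(40)

def fitness(mask):
    feats = [i for i in range(N_FEAT) if mask[i]]
    if not feats:
        return 0.0
    # "training": class centroids computed over the connected features only
    cent = {}
    for lab in (0, 1):
        xs = [x for x, l in train if l == lab]
        cent[lab] = [sum(x[i] for x in xs) / len(xs) for i in feats]

    def classify(x):
        dists = {lab: sum((x[i] - ci) ** 2 for i, ci in zip(feats, c))
                 for lab, c in cent.items()}
        return min(dists, key=dists.get)

    # fitness = held-out accuracy, i.e. generalization, not training fit
    return sum(classify(x) == lab for x, lab in test) / len(test)

def evolve(pop_size=20, gens=15, p_mut=0.1):
    pop = [[random.randint(0, 1) for _ in range(N_FEAT)] for _ in range(pop_size)]
    for _ in range(gens):
        survivors = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = [[1 - g if random.random() < p_mut else g for g in parent]
                    for parent in survivors]
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Under this regime, selection reliably retains the connection to the informative feature, because architectures that exploit only incidental structure in the training set fail on the test set.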
6.5 One particular possibility we have tried to mimic in the latter simulations is that evolution first constrains the general learning domain by excluding certain connections through the architecture of the brain. This forms part of the general process of phylogenesis. The connections are then further constrained by a comparatively brief exposure to the specific subdomain in which a young organism happens to find itself. Such a process could perhaps be compared to choosing one of several available developmental programs, although it seems unlikely that these programs are completely distinct. Further fine-tuning of the connections takes place throughout life on the basis of particular learning events encountered.
6.6 In this way, the brain may come to reflect the hierarchy of levels of variability in the environment. Simple, ever-present characteristics of a domain, if they are important for survival, will be laid down in the largely unchangeable gross anatomy of the perceptual apparatus and the brain (see for example, Barlow, 1981, for a discussion of the anatomy of the eye and visual cortex in this respect). Elements which vary across individuals but which remain relatively stable throughout an individual's life may be reflected in the presence of several largely distinct developmental programs. Elements which show structure relevant for survival, but which vary within an individual's life, must be "regularized" by a plastic brain. The fact that for our species many aspects of the environment are both extremely variable in their expression and important for survival should be reflected by a comparatively large volume of plastic, "general-purpose" neural tissue. A likely candidate for this is, of course, the neocortex. Its overall structure may reflect the fact that certain sources of stimuli show more common, relevant structure and hence need to interact more strongly. Cortical fine-structure, perhaps in the form of some modular structure or a "skeleton cortex" (Braitenberg and Schuz, 1991), may reflect the lower part of the hierarchy of environmental variability. Connectionist modelling may help us to unravel the architecture of both cognition and the brain. In light of their common evolutionary path, it seems extremely unlikely that these architectures have nothing in common.
Allport, A. (1989) Visual attention. In: M.I. Posner (Ed.), Foundations of cognitive science. Cambridge, MA: MIT Press, 631-682.
Barlow, H.B. (1981) Critical limiting factors in the design of the eye and visual cortex. (The Ferrier Lecture, 1980.) Proceedings of the Royal Society, London, B 212, 1-34.
Braitenberg, V., & A. Schuz (1991) Anatomy of the cortex: statistics and geometry. Berlin: Springer-Verlag.
Fodor, J.A. (1983) Modularity of mind. Cambridge: MIT Press.
French, R.M. (1992) Semi-distributed representations and catastrophic forgetting in connectionist networks. Connection Science, 4, 365-377.
Gluck, M.A., & G.H. Bower (1988) Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195.
Graf, P., & G. Mandler (1984) Activation makes words more accessible, but not necessarily more retrievable. Journal of Verbal Learning and Verbal Behavior, 23, 553-568.
Gregson, A.M. (1993) Networks that respect psychophysiology. PSYCOLOQUY 4(27) categorization.3.
Grossberg, S. (1976) Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 23, 187-202.
Grossberg, S. (1987) The adaptive brain. Volume I: Cognition, learning, reinforcement, and rhythm. Volume II: Vision, speech, language, and motor control. Amsterdam: North-Holland.
Happel, B.L.M., & J.M.J. Murre (1993) The design and evolution of modular neural network architectures. Leiden University, Unit of Experimental and Theoretical Psychology, Technical Report.
Hardcastle, V.G. (1993) What counts as plausible? PSYCOLOQUY 4(26) categorization.2.
Kohonen, T. (1989) Self-organization and associative memory, 3rd edition, Berlin: Springer-Verlag.
Kohonen, T. (1990) The self-organizing map. Proceedings of the IEEE, 78, 1464-1480.
Krakauer, D.C., & A.I. Houston (1993) Evolution, learning & categorization. PSYCOLOQUY 4(28) categorization.4.
Kruschke, J.K. (1992a) Human category learning: implications for backpropagation. Connection Science, in press.
Kruschke, J.K. (1992b) ALCOVE: an exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.
McClelland, J.L. (1993) Toward a theory of information processing in graded, random, interactive networks. In: D.E. Meyer & S. Kornblum (Eds.), Attention and Performance XIV. Cambridge, MA: MIT Press, 655-688.
McClelland, J.L., & D.E. Rumelhart (1981) An interactive activation model of context effects in letter perception. Part I: an account of basic findings. Psychological Review, 88, 375-407.
McCloskey, M., & N.J. Cohen (1989) Catastrophic interference in connectionist networks: The sequential learning problem. In: G.H. Bower (Ed.) The psychology of learning and motivation. New York: Academic Press.
Mandler, G. (1980) Recognizing: the judgement of previous occurrence. Psychological Review, 87, 252-271.
Murre, J.M.J. (1992a) Learning and categorization in modular neural networks. UK: Harvester/Wheatsheaf; US: Erlbaum.
Murre, J.M.J. (1992b) Precis of: learning and categorization in modular neural networks. PSYCOLOQUY 3(68) categorization.1.
Murre, J.M.J. (1993) The extended Generalized Context Model. Cambridge, MRC APU, Internal report, in preparation.
Neill, W.T., J.L. Beck, K.S. Bottalico, & R.D. Molloy (1990) Effects of intentional versus incidental learning on explicit and implicit tests of memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 457-463.
Nosofsky, R.M. (1985) Overall similarity and the identification of separable-dimension stimuli: a choice model analysis. Perception & Psychophysics, 38, 415-432.
Nosofsky, R.M. (1992) Similarity scaling and cognitive process models. Annual Review of Psychology, 43, 25-54.
Phaf, R.H. (1991) Learning in natural and connectionist systems: experiments and a model. Unpublished doctoral dissertation, Leiden University, Leiden.
Phaf, R.H. & G. Wolters (1993) Attentional shifts in maintenance rehearsal. American Journal of Psychology, in press.
Ratcliff, R. (1990) Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308.
Rueckl, J.G., K.R. Cave, & S.M. Kosslyn (1989) Why are 'What' and 'Where' processed by separate cortical visual systems? A computational investigation. Journal of Cognitive Neuroscience, 1, 171-186.
Rumelhart, D.E., & D. Zipser (1985) Feature discovery by competitive learning. Cognitive Science, 9, 75-112.
Shanks, D.R., & M.A. Gluck (1991) Tests of an adaptive network model for the identification, categorization, and recognition of continuous-dimension stimuli. University of California, San Diego, Department of Cognitive Science, Technical Report 9103.
Shepard, R.N. (1957) Stimulus and response generalization: a stochastic model, relating generalization to distance in psychological space. Psychometrika, 22, 325-345.
Shepard, R.N. (1987) Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.
Sloman, S. (1993) Modularity of mind: a question unasked. PSYCOLOQUY 4(29) categorization.5.
Waibel, A. (1989) Modular construction of time-delay neural networks for speech recognition. Neural Computation, 1, 39-46.
Young, M.P. (1992) Objective analysis of the topological organization of the primate cortical visual system. Nature, 358, 152-155.