Ashley M. Aitken (1993) Have Module, Need Architecture! Psycoloquy 4(47) Categorization (10)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).
Psycoloquy 4(47): Have Module, Need Architecture!

HAVE MODULE, NEED ARCHITECTURE!
Book Review of Murre on Categorization

Ashley M. Aitken
Artificial Intelligence Laboratory,
School of Electrical Engineering and Computer Science,
University of New South Wales,
PO Box 1, Kensington,
NSW, 2033, AUSTRALIA.

ashley@cse.unsw.edu.au

Abstract

Murre's (1992a) book presents a new neural network module for unsupervised categorization in the style of ART (Carpenter & Grossberg, 1988). The book is relatively easy to read but progresses a little slowly in places. The local representation of the CALM module yields, amongst other things, a stable learning system. The competition-driven arousal system yields the fast learning rate and the dissociation between activation and elaboration learning. Also, with a slight modification, the CALM module can capture the topology of the input patterns. For those working on competitive learning systems this book is highly recommended. Although it has a distinct psychological flavor, it should appeal to the taste of most computer scientists interested in neural networks, and to those neuroscientists interested in higher-level models. CALM is a significant and interesting neural network module, very well thought out and well exercised. However, this book can only be the beginning, for, if CALM is the module, what is the architecture?

Keywords

Neural networks, neurobiology, psychology, engineering, CALM, Categorizing And Learning Module, neurocomputers, catastrophic interference, genetic algorithms.

I. INTRODUCTION

1.1 Murre's (1992a) book presents a new neural network module for unsupervised categorization in the style of ART (Carpenter & Grossberg, 1988). It provides a detailed account of the basic module, the characteristics of the learning algorithm, and the psychological plausibility of the model. It also incorporates a more general discussion of the benefits of modularity and genetic algorithms, and of the computational and biological plausibility of the model. There are a number of appendices: the first gives the parameter values used in the simulations, while the others describe hardware and software implementations of neural networks in general and of modular neural networks in particular.

1.2 This review begins with an overview of the module, with comments on its performance. It then questions the validity of a discussion of modular neural networks without a serious discussion of the architecture of such networks. Next, in accord with the book, the review considers the neuroscientific, psychological and computational aspects of the module. In conclusion, a quick comparison with the ART architecture and some general impressions of the book and theory are presented.

II. OVERVIEW OF THE CALM MODULE

2.1 "The single most important feature of a CALM module is its ability

      to categorize input activation patterns autonomously" (Murre,
      1992a, p. 25).

2.2 The Categorizing And Learning Module (CALM) is a structured neural network using formal neurons with predefined internal connections and weights. Only the intermodular excitatory weights are variable. Input patterns to the module compete for a local representation. A variant of the Grossberg learning rule is used to increase the likelihood that the winning representation node will win the competition on subsequent presentations of the same input pattern.
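As a concrete illustration of this mechanism, the following minimal Python sketch shows winner-take-all categorization with a Grossberg-style weight update; the constants and the exact form of the rule are my own illustrative choices, not Murre's CALM equations.

    # Minimal sketch of competitive categorization with a Grossberg-style
    # learning rule (illustrative only; not Murre's CALM equations).
    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs, n_reps = 8, 4
    # Only the intermodular (input-to-representation) weights are variable.
    W = rng.uniform(0.0, 0.1, size=(n_reps, n_inputs))

    def categorize(x, mu=0.05):
        activation = W @ x                   # feedforward excitation
        winner = int(np.argmax(activation))  # competition, idealized as argmax
        # Move the winner's weights toward the input so the same pattern
        # is more likely to win on subsequent presentations.
        W[winner] += mu * x * (1.0 - W[winner])
        return winner

    x = rng.random(n_inputs)
    print([categorize(x) for _ in range(5)])  # settles on one node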

2.3 To address the stability-plasticity dilemma (Grossberg, 1982) the CALM module incorporates an arousal-attentional mechanism. For familiar input patterns the competition for representation is low and the module is in a nonaroused state. Consequently, the learning rate is set low. This is a form of activation learning, as it tends to incrementally reinforce previous representations.

2.4 For novel input patterns the competition for representation is high and the module is in an aroused state. Consequently, the learning rate is set high. This is a form of elaboration learning as it tends to form new representations. Also, when the competition is high, noise is added to the representation nodes to help resolve the competition and to determine the winning local representation.
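The arousal mechanism of 2.3 and 2.4 can be caricatured in a few lines; in the sketch below a near-tie between the two most active nodes stands in for high competition, and all thresholds and rates are invented for illustration (CALM computes arousal with a dedicated node, not the ratio used here).

    # Sketch of competition-driven arousal: a near-tie between the top two
    # nodes signals novelty, raising the learning rate and adding noise.
    import numpy as np

    rng = np.random.default_rng(1)

    def arousal_step(activation, low_rate=0.005, high_rate=0.5, noise_sd=0.05):
        top2 = np.sort(activation)[-2:]
        competition = top2[0] / (top2[1] + 1e-9)  # near 1.0 = strong rivalry
        if competition > 0.9:                     # aroused: novel pattern
            rate = high_rate                      # elaboration learning
            activation = activation + rng.normal(0.0, noise_sd, activation.shape)
        else:                                     # nonaroused: familiar pattern
            rate = low_rate                       # activation learning
        return int(np.argmax(activation)), rate

    print(arousal_step(np.array([0.80, 0.79, 0.10])))  # novel: high rate
    print(arousal_step(np.array([0.90, 0.30, 0.10])))  # familiar: low rate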

2.5 With a slight modification of the weights of the fixed internal lateral interconnections, the CALM module becomes a Categorizing And Learning Self-Organizing Module (CALSOM). CALSOM is shown to have the capacity to arrange its representations (at least locally) in a way that reflects the topology of the input space, in the style of Kohonen (1982).
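CALSOM achieves this ordering by grading its fixed lateral weights; the sketch below instead uses Kohonen's explicit neighborhood update to show the kind of topographic effect intended, so it approximates the outcome rather than CALSOM's mechanism.

    # Kohonen-style neighborhood update illustrating the topographic
    # ordering CALSOM aims for (an approximation of the effect, not of
    # CALSOM's graded lateral weights).
    import numpy as np

    rng = np.random.default_rng(2)
    n_reps = 10
    W = rng.random((n_reps, 2))  # representation nodes in a 1-D chain

    def som_step(x, mu=0.1, radius=1):
        winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
        lo, hi = max(0, winner - radius), min(n_reps, winner + radius + 1)
        W[lo:hi] += mu * (x - W[lo:hi])  # winner and neighbours move together
        return winner

    for _ in range(2000):
        som_step(rng.random(2))
    # Adjacent nodes now respond to nearby regions of the input space.
    print(np.round(W, 2))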

III. PERFORMANCE OF THE CALM MODULE

3.1 The learning rate of the CALM module appears impressively high and shows little degradation with increasing module size. However, it is difficult to judge the relative performance without serious comparisons with other learning paradigms. The module is able to discriminate nonorthogonal patterns and, unlike ART, does not reject new inputs when all representation nodes are already committed. Instead, CALM expands the receptive fields of the representation nodes to incorporate the new inputs. Again, however, it is not clear that this is the best approach in all circumstances.

IV. MODULE VS MODULAR

4.1 "Although we are not yet able to specify the relation between

      architecture and learning possibilities fully, we shall
      illustrate this approach here" (Murre, 1992a, p. 43)

4.2 The main focus of the book is on learning and categorization in the CALM module rather than in modular neural networks, as the title may suggest. Various modular neural networks are presented, but there is no dedicated attempt to provide a theory of the ARCHITECTURE of such networks. How does one choose an architecture for a particular problem? What do various architectures have to offer? These are some of the important questions about modular neural networks that need to be investigated. At the end of the day, any module is really only as good as our understanding of how to combine it with others to produce modular systems.

V. NEUROSCIENCE, PSYCHOLOGY, AND COMPUTATION

5.1 The book is an excellent example of real constraint satisfaction. Trying to do significant research in the twilight zone between psychology, neuroscience and computational theory without offending any psychologists, neuroscientists or computer scientists (not to mention the philosophers) is no easy task. As the author points out, the aim in a number of the simulations is not to do the best machine learning, nor to provide the most accurate simulation of the neuroscience, but to take on board the knowledge from both fields to achieve a STRONG understanding of the behaviour of the module, and of modular systems.

5.2 NEUROSCIENCE

5.2.1 The CALM module is loosely based on the neurobiology of the cerebral neocortex. Short-range (intramodular) connections are mostly inhibitory, and long-range (intermodular) connections are all excitatory. CALM also attempts to follow the principle of locality, using only local information for any computation. Similarly, in accordance with Dale's law, the formal neurons make either excitatory or inhibitory connections but not both. Interestingly, new evidence (Guldner, 1993) suggests that under certain conditions the nature of synapses may change.

5.2.2 In particular, the CALM module is presented as being loosely modeled on the minicolumns of the cerebral neocortex. In fact, there are many types of "columns" in the cerebral neocortex: the ontogenetic columns, defined by the radial growth of cortical anatomy (Rakic, 1988); the cortical minicolumns, cortical columns, or perhaps more descriptively the cortical slabs, defined functionally by their receptive fields (for example, in the somatosensory cortex (Mountcastle, 1978) or the visual cortex (Hubel & Wiesel, 1977)); the cortico-cortical columns, defined by the termination of cortico-cortical connections (Jones et al., 1975); the cortical hypercolumns, defined as a representative set of cortical minicolumns; and others.

5.2.3 The representation nodes in the CALM module, it is suggested, play a role similar to that of the pyramidal neurons in a cortical minicolumn. However, whereas the pyramidal neurons in a minicolumn are thought to fire collectively (perhaps differing in the number of units firing, but mainly in the degree to which they all do or do not fire; Eccles, 1984), in the CALM module each representation node represents independently. It would seem more appropriate for the analogy to be between the CALM module and a cortical hypercolumn (and perhaps this is what Murre had in mind). Here, each representation node (with perhaps some other internal nodes) would represent a cortical minicolumn, and the CALM module would represent the entire cortical hypercolumn. It could also be that a CALM module contains a number of hypercolumns, in which case it may be more analogous to a cortical area.

5.3 PSYCHOLOGY

5.3.1 The behaviour and power of the CALM module are demonstrated by a number of simulations which are presented in detail in the book. One of these demonstrates how the context effect in letter recognition may be learnt in an unsupervised and stable manner, answering some criticisms by Grossberg of an earlier version of the model.

5.3.2 The CALM module provides an interesting psychological model of a two-process, single-system memory which is capable of dissociating implicit and explicit memory tasks. This model is to be contrasted with the dual-system memory models that have historically been used to explain the dissociation. The distinction in the CALM module results directly from the action of the arousal system in changing the learning rate, and it appears most likely to have been explicitly built into the module. As such, this may not be so surprising a behavior of the module.

5.4 COMPUTATION

5.4.1 There is often a fine line between stating the obvious and making a new and interesting observation. The sections on the usefulness of modularity in general, and on the importance of structure in neural networks for better generalization, seem to fall on the obvious side of that line. It seems rather obvious that more structured neural networks produce better generalization because they are more constrained models. Similarly, the benefits of modularity for implementation and scaling are almost self-evident. However, these points do need to be made, and they are made clearly by Murre in this book.

5.4.2 It would also seem of little use to criticise "traditional" neural network models for not being modular, or even for not having a biologically plausible learning procedure. A great deal of the initial research in neural networks has focussed on gaining an understanding of the nature and the essential qualities of parallel and distributed processing. Clearly, few researchers really believe that the brain is one fully connected three-layer network. Recently, work has begun on modular three-layer neural networks (Jacobs et al., 1991). In the same way, although it is difficult to argue for the biological plausibility of backpropagation-style learning, it does provide a convenient way to train networks to perform in a parallel and distributed manner so that their behavior can be investigated.

5.4.3 "retroactive interference in backpropagation networks can

        be explained entirely in terms of overlap of representations
        in the hidden layer" (Murre, 1992a, p. 152).

5.4.4 A great deal of the simplicity and power of the CALM model arises from the fact that the module's competition produces local representations which are trivially orthogonal. The problem of retroactive interference which plagues systems using distributed representations does not affect CALM, because of these orthogonal representations. Hence, CALM will perform perfectly without any interference, no matter in what sequence the training data are presented. However, its biggest selling point may also be its biggest failing: as is true of all local representations, the number of possible representations is linear in the number of representation nodes. In the book Murre makes valid arguments against the primacy of the requirement for maximal representational ability. However, although it is discussed, an effective way of combining modules to form semidistributed representations, and thus to counterbalance the problem, still needs to be demonstrated.
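A small numerical illustration of both sides of this trade-off (mine, not from the book):

    # One-hot (local) codes are mutually orthogonal, so learning one
    # category cannot overwrite another; but n nodes give only n codes,
    # versus 2**n for a fully distributed binary code.
    import numpy as np

    n = 8
    local = np.eye(n)                     # one representation node per category
    print(local @ local.T)                # identity matrix: zero overlap
    print("local capacity:", n)           # linear in node count
    print("distributed capacity:", 2**n)  # exponential in node count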

5.4.5 What seems to be one of the major contributions of the CALM theory, however, is the competition-based arousal mechanism and, as part of that, the competition-dependent learning rate. Although it is never really demonstrated that this is appropriate in all situations, it does seem eminently reasonable. Perhaps further theoretical work could find some justification for these ideas.

VI. COMPARISON WITH GROSSBERG'S ART MODEL

6.1 The CALM module has a lot in common with the ART module (Carpenter & Grossberg, 1988), but also much to contrast with it. First, both are structured neural networks with predefined internal weights (although it should be noted that ART also has variable internal weights). Second, both use local representations of input patterns. However, where ART uses the mismatch between the actual input pattern and the pattern expected for a chosen representation to drive the search for the correct representation, CALM uses the amount of competition for representation.

6.2 CALM also has a variable learning rate, whereas ART has a constant learning rate (and, as mentioned above, a slightly different learning rule). There seems to be no reason, however, why ART could not also incorporate a variable learning rate dependent on mismatch and perhaps speed up learning for novel input patterns. Finally, it would seem that CALM would use less memory and be computationally less expensive than ART because of the reduced number of variable weights and what appears to be a simpler control mechanism.
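As a sketch of this suggestion (mine, not Murre's or Grossberg's), the standard ART 1 match value could be mapped onto the learning rate; the linear mapping from mismatch to rate below is invented for illustration.

    # Sketch of a mismatch-dependent learning rate for an ART-like module.
    # The ART 1 match value |min(x, w)| / |x| is standard; the mapping
    # from mismatch to learning rate is an invented illustration.
    import numpy as np

    def mismatch_rate(x, expectation, base_rate=0.01, max_rate=0.5):
        match = np.minimum(x, expectation).sum() / (x.sum() + 1e-9)
        return base_rate + (max_rate - base_rate) * (1.0 - match)

    x = np.array([1.0, 1.0, 0.0, 0.0])
    print(mismatch_rate(x, np.array([1.0, 1.0, 0.0, 0.0])))  # familiar: slow
    print(mismatch_rate(x, np.array([0.0, 0.0, 1.0, 1.0])))  # novel: fast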

VII. SUMMARY

7.1 In summary, the local representation of the CALM module yields, amongst other things, a stable learning system. The competition-driven arousal system yields the fast learning rate and the dissociation between activation and elaboration learning. Also, with a slight modification, the CALM module can capture the topology of the input patterns.

7.2 The book is relatively easy to read but progresses a little slowly in places, particularly during some of the detailed discussion of the psychological simulations and their results. Also, although they are well written, the detailed introduction to genetic algorithms and the lengthy appendices seem a bit out of place.

7.3 For those working on competitive learning systems this book is highly recommended. Although it has a distinct psychological flavor, it should appeal to the taste of most computer scientists interested in neural networks, and those neuroscientists interested in higher level models. CALM is a significant and interesting neural network module, very well thought out and well exercised. However, this book can only be the beginning, for, if CALM is the module, what is the architecture?

REFERENCES

Carpenter, G. A., & Grossberg, S. (1988). The ART of Adaptive Pattern Recognition by a Self-Organizing Neural Network. IEEE Computer, 21(3), 77-88.

Eccles, J. C. (1984). The Cerebral Neocortex: A Theory of Its Operation. In E. G. Jones & A. Peters (Eds.), Cerebral Cortex, Volume 2 (pp. 1-36). New York: Plenum Press.

Grossberg, S. (1982). Studies of Mind and Brain: New Principles of Learning, Perception, Development, Cognition and Motor Control. Dordrecht: D. Reidel.

Guldner, F. H. (1993). Activity Dependent Plasticity of Synapses in the Central Nervous System. In P. Leong & M. Jabri (Eds.), Proceedings of the Fourth Australian Conference on Neural Networks (ACNN'93) (pp. 34-35). Sydney: Sydney University Electrical Engineering.

Hubel, D. H., & Wiesel, T. N. (1977). Functional Architecture of Macaque Monkey Visual Cortex. Proc. R. Soc. Lond. B., 198, 1-59.

Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive Mixtures of Local Experts. Neural Computation, 3, 79-87.

Jones, E. G., Burton, H., & Porter, R. (1975). Commissural and Cortico-cortical "Columns" in the Somatic Sensory Cortex of Primates. Science, 190, 572-574.

Kohonen, T. (1982). Self-organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, 43, 59-69.

Mountcastle, V. B. (1978). An Organizing Principle for Cerebral Function: The Unit Module and the Distributed System. In G. M. Edelman & V. B. Mountcastle, The Mindful Brain (pp. 17-49). Cambridge, MA: MIT Press.

Murre, J. M. J. (1992a). Learning and categorization in modular neural networks. Hillsdale, NJ: Lawrence Erlbaum.

Murre, J.M.J. (1992b). Precis of: Learning and categorisation in modular neural networks. PSYCOLOQUY 3(68) categorization.1

Rakic, P. (1988). Specification of Cerebral Cortical Areas. Science, 241, 170-176.

