This article attempts to answer some of the concerns raised by Da Costa (1994) in his review of NEURAL NETWORKS FOR PATTERN RECOGNITION (Nigrin, 1993). Topics discussed include: (1) the overall aims of the book; (2) the locality requirement for neural networks; (3) the representation of synonymous patterns; and (4) translation-, size- and rotation-invariant recognition.
2. The rules defining what constitutes an acceptable neural network solution are given early. "A neural network is a circuit composed of a very large number of simple processing elements that are neurally based. Each element operates only on local information. Furthermore each element operates asynchronously; thus there is no overall system clock" (Nigrin, 1993). The book makes no claim that this definition covers all possible formulations of interest, and it stresses that the definition may have to change if it proves inadequate. However, the definition is presented to ensure that the reader has a clear idea of the rules that govern the construction of the neural networks in the book.
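To make the two operative constraints concrete (local information only, no global clock), here is a minimal Python sketch under purely illustrative assumptions: the elements sit on a ring, each element "computes" by averaging itself with its two immediate neighbors, and elements wake up one at a time in random order. The function name and update rule are hypothetical and are not taken from the book; the point is only that each update reads nothing beyond an element's own neighborhood and that no update waits for a system-wide tick.

```python
import random


def async_local_relaxation(values, steps, seed=0):
    """Hypothetical illustration of local, asynchronous processing.

    The elements form a ring. At each event one randomly chosen element wakes
    up and replaces its value with the mean of itself and its two immediate
    neighbors. No element ever reads a non-neighbor, and there is no global
    clock that updates all elements together.
    """
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(steps):
        i = rng.randrange(n)                          # an arbitrary element fires
        left, right = x[(i - 1) % n], x[(i + 1) % n]  # local information only
        x[i] = (left + x[i] + right) / 3.0
    return x


state = async_local_relaxation([0.0, 10.0, 0.0, 5.0, 1.0], steps=2000)
print(state)
```

Even with this trivially simple rule, the ring drifts toward a common value: a (very modest) global outcome reached without any global controller.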
3. One deliberate area of ambiguity in the definition involves the question of locality and the coarseness of processing. Da Costa states: "It is not clear, however, to what degree a processing element should be 'neurally based': to the level of ionic channels or just to a level that expresses the overall properties of neurons?" In other words, should an entire cell, including its weights and inputs, be considered one region of locality? Or should locality be contracted, so that a region of locality covers only a small patch of dendrite? The definition is deliberately vague on this point so as to increase its generality. Those interested in neurobiology will naturally prefer models that are truer to existing physiology and therefore use finer-grained elements. However, since finer-grained models require longer simulation times and more complex interactions, those who are more interested in psychology will be better served by elements of coarser granularity. Regardless of the granularity of the model, however, the locality condition is paramount. It prevents the (sometimes unintended) use of a homunculus and forces an investigation to concentrate on the question of how local elements acting completely independently can achieve global dynamics. In addition, restricting the elements to local information may ease the transition of the networks (or their successors) from software to hardware implementations.
4. The definition does admit many important neural variations, such as "extensive modularity, the important role of topographical maps, the variety of neuron and neurotransmitter types, the effect of neuron morphology on function, and the importance of the interconnection topology" (Da Costa, 1994). All of these variations can be viewed as specific circuit realizations within the framework defined above. In fact, they are all important to the models described in the book. All the networks are modular and consist of architectures that use different cell types and different interconnection topologies. (The networks may or may not be topographically mapped, depending on their area of application.) Cells often receive different types of input, including inputs that excite cells, inhibit cells, or modulate the effects of other inputs or of learning rates. Even neuronal morphology may play a key functional role in the circuits, by reducing the number of connections needed for a hardware realization (Nigrin, 1994c).
5. The definition does, however, preclude the inclusion of hybrid systems in the book. Although hybrid systems are technologically useful, they do not fall within the above framework, and their use would compromise one of the book's primary goals: determining what can be achieved using only neuron-like elements. Their use would also reduce the relevance of the models to biology and obscure the issues involved in creating a model that does not resort to a homunculus. This is not to say that hybrid systems lack value. In fact, anyone attempting to use a neural network in a practical system today would of necessity build a hybrid system, since neural networks have not yet evolved to the point where they can be used as stand-alone systems (even if adequate hardware existed). This is a shortcoming that the book partially attempts to remedy.
6. Given the rules for constructing neural networks, the book then begins a series of gedanken experiments to design a pattern recognition machine. One of the principal areas of investigation involves the construction of a framework by which synonymous and homonymic patterns can be represented. The definition of these types of patterns is quite broad: "Two representations are considered synonymous whenever it is the case that any classifying cell (or cells) treat them as equivalent. The synonymous representations may consist of the same or different patterns of activity (shape). Furthermore, they may reside in the same set of cells or in different sets of cells" (Nigrin, 1993). A pattern is a homonym whenever it can have different meanings (or produce different consequences) in different contexts.
7. Through the gedanken experiments in Chapter 6, it was determined that synonyms and homonyms are poorly represented by most current neural networks. The problem is traced to the manner in which most networks implement competition: classifying nodes compete for the right to classify the signals on their input lines. One of the conclusions of the book, however, is that for synonyms and homonyms to be properly represented, competition should instead be implemented by having the input links compete for the right to turn on classifying nodes. This was termed "presynaptic competition" and also "link competition." I believe that this shift in emphasis, from competition among the classifying cells to competition among their inputs, allows for a better representation of synonymous and homonymic patterns.
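The structural difference between the two forms of competition can be sketched in a few lines of Python. This is a toy under stated assumptions, not the book's equations: node competition is modeled as winner-take-all over total weighted input, and link competition as a hypothetical rule in which the links leaving each input share that input's activity in proportion to a sharpened copy (raised to a power) of their weights. Under the second rule the classifying nodes never inhibit one another directly, so two nodes can be active at once as long as they draw on different inputs.

```python
def node_competition(x, W):
    """Winner-take-all among classifying nodes (rows of W).

    The node with the largest total weighted input stays active and every
    other node is silenced. Ties go to the lowest-indexed node.
    """
    net = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    winner = max(range(len(net)), key=net.__getitem__)
    return [net[j] if j == winner else 0.0 for j in range(len(net))]


def link_competition(x, W, power=4):
    """Hypothetical presynaptic (link) competition.

    For each input i, the links i -> j compete with one another: x[i] is
    divided among the nodes in proportion to W[j][i] ** power, so an input
    can strongly support only the nodes it matches best.
    """
    m = len(W)
    y = [0.0] * m
    for i, xi in enumerate(x):
        strengths = [W[j][i] ** power for j in range(m)]
        total = sum(strengths)
        if total == 0:
            continue  # this input has no links, so it supports nothing
        for j in range(m):
            y[j] += xi * strengths[j] / total
    return y


# Two classifying nodes tuned to disjoint halves of a four-line input.
W = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
x = [1, 1, 1, 1]
print(node_competition(x, W), link_competition(x, W))
```

With both halves of the input present, winner-take-all keeps only one node, whereas link competition leaves both nodes active, each supported by its own inputs.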
8. In discussing this shift in emphasis, Da Costa writes: "Although I find such an approach interesting, it should be noted that the presynaptic model can be understood in terms of conventional neural networks." He then presents a figure illustrating such a transformation. This is certainly true. In the first section of the book that uses presynaptic inhibition (p. 252), this very point is emphasized in the following italicized sentences: "It is irrelevant to the operation of the circuit whether the interactions are implemented by direct contact between links or whether the interactions are implemented indirectly by signals through interneurons. The only thing of importance is that the inputs to the [classifying] nodes do in fact interact." Da Costa also writes: "Although the presynaptic paradigm does provide a novel and elegant way of describing some neural models, it should be borne in mind that it has little or no potential for saving hardware resources or processing time." This is again true, at least initially. Section 6.17 discusses the hardware requirements of such networks and concludes that, at least for the initial implementations, the new networks will require more hardware than traditional networks.
9. However, the point of the chapter was not to design new interactions that cannot be implemented with standard units. (Link interactions are currently implemented with standard shunting equations; see Nigrin, 1994c.) Nor was the point to reduce the processing requirements of a network. The point was to design new circuits that can properly represent information that poses fundamental difficulties for other networks. (The inadequacy of standard approaches is further illustrated in a brief paper, Nigrin (1994b), that is available from the author.) Whether this new approach holds promise is an issue that I hope other reviewers will address.
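For readers unfamiliar with shunting dynamics, the sketch below integrates a generic Grossberg-style shunting equation, dx/dt = -A*x + (B - x)*E - (x + C)*I, for a single cell with excitatory input E and inhibitory input I. It is offered only as an illustration of the class of equation referred to above; the specific link equations of SONNET 2 are given in Nigrin (1994c) and are not reproduced here. The function names, parameter names and default values are assumptions.

```python
def shunting_step(x, excit, inhib, A=1.0, B=1.0, C=0.0, dt=0.01):
    """One forward-Euler step of dx/dt = -A*x + (B - x)*E - (x + C)*I.

    A is the passive decay rate, B the upper bound on activity, and -C the
    lower bound; excitation and inhibition act multiplicatively (shunting).
    """
    dx = -A * x + (B - x) * excit - (x + C) * inhib
    return x + dt * dx


def simulate(excit, inhib, steps=5000, x0=0.0, **params):
    """Hold the inputs constant and integrate from x0 for `steps` steps."""
    x = x0
    for _ in range(steps):
        x = shunting_step(x, excit, inhib, **params)
    return x


def steady_state(excit, inhib, A=1.0, B=1.0, C=0.0):
    """Equilibrium obtained by setting dx/dt = 0 with constant inputs."""
    return (B * excit - C * inhib) / (A + excit + inhib)


print(simulate(3.0, 1.0), steady_state(3.0, 1.0))
```

Setting dx/dt = 0 gives the equilibrium (B*E - C*I)/(A + E + I), which stays between -C and B however large the inputs become; this automatic bounding is one of the main attractions of shunting dynamics. The forward-Euler step used here is only stable when dt is small relative to 1/(A + E + I).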
10. After setting up the framework for handling synonyms, the book attempts to apply it to a subproblem of the general case: translation invariant recognition (TIR) and size invariant recognition (SIR). These are subproblems of the general case because in both instances, input representations over different sets of retinal cells must eventually be treated as synonymous. The approach to both problems can be separated into three stages. The first stage transforms the problem of SIR into one of TIR by representing the input patterns with scale-sensitive cells. This allows an increase (or decrease) in the apparent size of an object to be represented by a translation of its representation to cells representing larger (or smaller) scales. The second stage uses a circuit that roughly centers an object using multiple layers of routing circuits and selective attention. (It is interesting, and probably not coincidental, that another approach to pattern centering, that of Olshausen, Anderson and Van Essen (1993), likewise requires the modulation of input connections.) The final stage classifies the (almost) centered object.
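The first stage relies on a simple geometric fact, illustrated below under an assumption that the text does not spell out: if the scale-sensitive cells have preferred scales spaced geometrically (r0, r0*base, r0*base**2, ...), then enlarging a pattern by exactly one factor of base moves every response one cell along the scale axis. The function names and the one-dimensional radial "profile" are hypothetical stand-ins for a real image and real receptive fields.

```python
import math


def scale_cell_responses(profile, r0=1.0, base=2.0, n=8):
    """Responses of n hypothetical scale-tuned cells.

    Cell k samples the pattern's radial profile at r0 * base**k, so the
    preferred scales are geometrically (log-uniformly) spaced.
    """
    return [profile(r0 * base ** k) for k in range(n)]


def rescaled(profile, factor):
    """The same pattern enlarged by `factor` (features move outward)."""
    return lambda r: profile(r / factor)


pattern = lambda r: math.exp(-r) * (1 + math.sin(r))
original = scale_cell_responses(pattern)
enlarged = scale_cell_responses(rescaled(pattern, 2.0))
# enlarged[k] equals original[k - 1]: the size change became a one-cell shift.
print(original[:3], enlarged[1:4])
```

Scaling by a factor that is not a power of base gives a fractional shift along the scale axis, which this sketch does not handle.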
11. Concerning this approach Da Costa writes: "The proposed neural architecture for implementing pattern centralization seems a clever way to attack such a problem ASSUMING THAT ARTIFICIAL NEURAL NETWORKS MUST BE USED... a remarkable alternative solution... has been provided by nature, which accomplishes it precisely by not solving the problem at all... the human retina is much less tolerant to pattern translation than one might expect (Goldstein 1989); in fact, translation tolerance in the primate visual system is achieved through the coordinated movement of the eyes, which scan the image so as to centre each object to be analysed over the foveal region of the retina. (To be precise, it should be pointed out that the retina itself provides some rather limited translation invariance capability.)"
12. However, a simple home experiment demonstrates that the human object recognition system IS capable of TIR without requiring that an object be centered on the fovea. Keep your eyes fixed straight ahead and attempt to recognize an object, not known to you in advance, that a friend places to one side of your visual field (while you continue to look straight ahead). Allowing for the fact that visual acuity is lower in nonfoveal regions of the retina than in the fovea, you will find that object recognition is easily accomplished. Thus, any circuit that purports to model the human visual system must be able to achieve this type of behavior. To increase the neurobiological plausibility of the model, it should be constructed using only neuron-like elements. And, to allow it to work for large problems, the model must scale properly. (If the input field has n nodes, the proposed centering network uses only O(n log n) nodes and connections.) The adequacy or inadequacy of current digital hardware for simulating the network simply has no bearing on these issues.
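The O(n log n) figure can be made plausible with a standard construction, shown here as a hypothetical stand-in rather than the book's routing circuit: a log-depth shifter (a barrel shifter) in which stage k either passes each value straight through or routes it 2**k positions over. With ceil(log2 n) stages of n two-input nodes each, any shift is realizable using n*ceil(log2 n) nodes and twice as many connections. For brevity the shift is computed globally from the activity centroid, something a network obeying the locality requirement could not do, and positions wrap around at the edges.

```python
import math


def center_with_log_shifter(signal):
    """Hypothetical log-depth routing network (a barrel shifter).

    Stage k either passes every value straight through or routes it 2**k
    positions to the right, so ceil(log2 n) stages can realize any shift.
    Returns the centered signal plus the node and connection counts.
    """
    n = len(signal)
    stages = max(1, math.ceil(math.log2(n)))
    total = sum(signal)
    # Control signal: the shift that moves the activity centroid to the
    # middle. Computed globally here for brevity, which the locality
    # requirement would not allow in a real network.
    if total == 0:
        shift = 0
    else:
        centroid = round(sum(i * v for i, v in enumerate(signal)) / total)
        shift = (n // 2 - centroid) % n
    x = list(signal)
    nodes = connections = 0
    for k in range(stages):
        step = 2 ** k
        if (shift >> k) & 1:
            # Circular shift for simplicity; a real visual field has edges.
            x = [x[(i - step) % n] for i in range(n)]
        nodes += n            # one routing node per position per stage
        connections += 2 * n  # each node hears a straight and a shifted input
    return x, nodes, connections


centered, nodes, connections = center_with_log_shifter([0, 1, 2, 1] + [0] * 12)
print(centered, nodes, connections)
```

For a 1024-position field this construction uses 10,240 routing nodes, against the roughly one million connections a fully connected n-to-n router would need.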
13. Another neural network that attempts to handle SIR, TIR and distortion invariant recognition is the neocognitron (Fukushima, 1988). Unfortunately, this architecture is insufficient by itself. As noted by Barnard and Casasent (1990), large translations can be accommodated only by reducing the discrimination capability of the network. In addition, if one takes the locality requirement seriously, the number of nodes and connections needed to implement the neocognitron for a full visual field is inordinately large. Da Costa suggests that these problems could be overcome "by incorporating additional mechanisms such as foveal vision and selective attention." However, for the modeling of human perception an object must be recognizable without having to be centered on the fovea. Furthermore, using selective attention to center objects is precisely what the architecture in the book is intended to accomplish! I have no objection to a neocognitron being used as the classification stage after centering; in fact, this combination was suggested in Nigrin (1992). However, the neocognitron has other serious weaknesses for modeling perception, such as its inability to perform fast learning or to form stable category codes. Its utility for modeling biological recognition systems may therefore be limited.
14. Da Costa also mentions the importance of rotation-invariant recognition. Though this is an important property, I do not believe it is built into the adult visual recognition system in the same manner as TIR and SIR. To see this, note that humans automatically read text that is shifted or changed in size, whereas without prior training they do not automatically read rotated text. I believe that rotation-invariant recognition is analogous to the problem of recognizing different views of a three-dimensional object. It is also analogous to the problem of recognizing different instances of the same class (as in recognizing a letter regardless of its font). In all three cases, different chunked representations can form in the classification system for the object's different views, rotations, or instances. A later stage in the hierarchy can then recognize these different representations as synonymous. This approach will not require an excessive number of neurons if the chunked representations can be made insensitive to distortion (see also Seibert & Waxman, 1992; Blackwell et al., 1992).
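The "one chunk per view, then pool" scheme can be sketched as follows. The class and helper names are hypothetical, exact cell-by-cell overlap stands in for the distortion-insensitive matching the text calls for, and the later stage is modeled simply as a max over each class's stored chunks. It illustrates treating several learned representations as synonymous; it is not the book's mechanism.

```python
def rot90(grid):
    """Rotate a square grid (list of lists) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flatten(grid):
    return [v for row in grid for v in row]


def overlap(a, b):
    """Fraction of cells on which two flattened patterns agree."""
    return sum(1 for u, v in zip(a, b) if u == v) / len(a)


class ViewPoolingClassifier:
    """Hypothetical two-stage classifier.

    Stage 1 stores one chunk per learned view (rotation) of each class.
    Stage 2 treats all chunks of a class as synonymous by taking the best
    match among them as the class score.
    """

    def __init__(self):
        self.chunks = {}  # label -> list of flattened learned views

    def learn_view(self, label, grid):
        self.chunks.setdefault(label, []).append(flatten(grid))

    def classify(self, grid):
        x = flatten(grid)
        scores = {label: max(overlap(x, view) for view in views)
                  for label, views in self.chunks.items()}
        return max(scores, key=scores.get), scores


L = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
T = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
clf = ViewPoolingClassifier()
view = L
for _ in range(4):            # learn "L" at all four right-angle rotations
    clf.learn_view("L", view)
    view = rot90(view)
clf.learn_view("T", T)        # learn "T" only upright and inverted
clf.learn_view("T", rot90(rot90(T)))
print(clf.classify(rot90(L))[0], clf.classify(rot90(rot90(T)))[0])
```

Here "L" is recognized at any right-angle rotation because a chunk exists for each one, while a "T" rotated by 90 degrees has no exact chunk, loosely mirroring the claim that recognizing unfamiliar rotations requires prior training.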
15. In conclusion, the book attempts to lay out a very specific set of functional properties that should be achieved by any classification system used for pattern recognition. It then attempts to present a unifying neural network approach by describing and partially implementing a single framework from which a wide variety of pattern recognition problems could be solved.
Barnard, E. & Casasent, D. (1990) Shift Invariance and the Neocognitron. Neural Networks, 3(4):403-410.
Blackwell, K.T., Vogl, T.P., Hyman, S.D., Barbour, G.S. & Alkon, D.L. (1992) A New Approach to Hand-written Character Recognition. In press, Pattern Recognition.
Da Costa, L. (1994) A Nonmystifying Approach to Artificial Neural Networks. PSYCOLOQUY 5(15) pattern-recognition.2.dacosta.
Fukushima, K. (1988) A Neural Network for Visual Pattern Recognition. Computer 1:65-76.
Goldstein, B. (1989) Sensation and Perception. Wadsworth Publishing Company.
Nigrin, A. (1992) A New Architecture for Achieving Translational Invariant Recognition of Objects. In Proceedings of the International Joint Conference on Neural Networks, 3:683-688, Baltimore, MD.
Nigrin, A. (1993) Neural Networks for Pattern Recognition. The MIT Press, Cambridge, MA.
Nigrin, A. (1994a) Precis of Neural Networks for Pattern Recognition. PSYCOLOQUY 5(2) pattern-recognition.1.nigrin.
Nigrin, A. (1994b) Neural Network Representation of Synonymous and Homonymic Patterns. Submitted for publication.
Nigrin, A. (1994c) SONNET 2: A New Neural Network for Pattern Recognition. American University Technical Report CSIS-94-001.
Olshausen, B.A., Anderson, C.H. & Van Essen, D.C. (1993) A Neurobiological Model of Visual Attention and Invariant Pattern Recognition Based on Dynamic Routing of Information. The Journal of Neuroscience, 13(11):4700-4719.
Seibert, M. & Waxman, A. (1992) Adaptive 3-D Object Recognition from Multiple Views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):107-125.