This Precis provides an overview of the book "Neural Networks for Pattern Recognition." First, it presents a list of properties that the author believes autonomous pattern classifiers should achieve. (These thirteen properties are also briefly discussed at the end.) It then describes the evolution of a self-organizing neural network called SONNET that was designed to satisfy those properties. It details the organization of (1) tutorial chapters that describe previous work; (2) chapters that present working neural networks for the context sensitive recognition of both spatial and temporal patterns; and (3) chapters that reorganize the mechanisms for competition to allow future networks to deal with synonymous and homonymic patterns in a distributed fashion.
2. The book can be subdivided into three major sections. The first section provides an introduction to neural networks for a general audience and presents the previous work upon which SONNET is based. The second section describes the structure of SONNET 1 and presents simulations to illustrate the operation of the network. And the third section describes a reorganization of the competitive structure of SONNET 1 to create more powerful networks that will achieve additional important properties.
3. The first segment consists of Chapters 1 and 2. After presenting a simplified network to introduce the subject to novices, Chapter 1 presents one possible definition for neural networks and an approach to designing them. The chapter then describes many of the fundamental properties that a neural network should achieve when it is being used for pattern classification. These properties are listed in Table 1 (reproduced from Nigrin, 1993) and are each briefly discussed in Appendix A below.
_________________________________________________________________________

A classification system should be able to:

 1) self-organize using unsupervised learning.
 2) form stable category codes.
 3) operate under the presence of noise.
 4) operate in real-time.
 5) perform fast and slow learning.
 6) scale well to large problems.
 7) use feedback expectancies to bias classifications.
 8) create arbitrarily coarse or tight classifications
    that are distortion insensitive.
 9) perform context-sensitive recognition.
10) process multiple patterns simultaneously.
11) combine existing representations to create categories
    for novel patterns.
12) perform synonym processing.
13) unlearn or modify categories when necessary.

                                TABLE 1
_________________________________________________________________________
4. I believe that before one can construct (or understand) autonomous agents that can operate in real-world environments, one must design classification networks that satisfy all of the properties in Table 1. It is not easy to see how any of these properties could be pushed off to other components in a system, regardless of whether the architecture is used to classify higher level structures such as sentences or visual scenes, or lower level structures such as phonemes or feature detectors. For example, consider the problem of modeling language acquisition and recognition. It is illuminating to attempt to push off any of the above properties to a subsystem other than the classifying system and still account for human behavior without resorting to a homunculus or to circular arguments.
5. With a description of the goals for the book in hand, Chapter 2 begins the process of describing neural network mechanisms for achieving them. Chapter 2 presents a tutorial overview of the foundations underlying the neural networks in the book. The book presents only those mechanisms that are essential to SONNET. Alternative approaches such as backpropagation, Hopfield networks, or Kohonen networks are not discussed. The discourse begins at the level of the building blocks and discusses basic components such as cells and weights. It then describes some essential properties that must be achieved in short term memory (STM) and long term memory (LTM) and presents architectures that achieve them.
6. Chapter 2 also discusses how to incorporate these architectures into different networks. The two major networks described in the chapter are the ART networks of Carpenter and Grossberg (1987a, 1987b) and the masking field networks of Cohen and Grossberg (1986, 1987). The ART networks completely or partially achieve many important properties. They can self-organize using unsupervised learning; form stable category codes; operate in noise; operate in real-time; perform fast or slow learning; use feedback; and create tight or coarse classifications. The masking field is also an important architecture. It provides a framework for achieving properties such as context-sensitive recognition and the simultaneous classification of multiple patterns.
7. After presenting the necessary groundwork, the book begins the presentation of the real-time network called SONNET, which is its main focus. Due to its complexity, the complete network has not yet been fully implemented. Instead, the implemented network contains simplifications that allowed it to be slowly built up and analyzed. These simplifications were also useful to allow the network to be completed within a reasonable time frame. However, they had the drawback of preventing the satisfaction of some important properties that will be achievable by the full network.
8. Chapter 3 presents the basic version of the model called SONNET 1, as it pertains to spatial patterns. This network merged the properties of the ART networks with those of the masking field networks. SONNET 1 either partially or totally achieved all but four of the properties listed in Table 1. (It did not use feedback, form distributed categories, perform synonym processing or unlearn classifications.) After the network is described, simulations are presented that show its behavior. Furthermore, simple improvements are described that could increase network performance.
9. To allow SONNET 1 to achieve these properties, several novel features were incorporated into the network. These included (among others) the following: (1) The network used a non-linear summing rule to allow the classifying nodes to reach decisions in real-time. This non-linear rule was similar to those found in networks using sigma-pi units. (2) A learning rule was used to allow the inhibitory weights to self-organize so that classifying nodes only competed with other nodes that represented similar patterns. This allowed the network to classify multiple patterns simultaneously. (3) Each node encoded two independent values in its output signal. The first output value represented the activity of the cell while the second value represented a confidence value that indicated how well the cell represented the input. The use of two output values allowed the network to form stable categories, even when input patterns were embedded within larger patterns.
10. Chapter 4 incorporates SONNET 1 into a framework that allows it to process temporal patterns. This chapter has several aspects. First, it shows how to design input fields that convert temporal sequences of events into classifiable spatial patterns of activity. Then, it describes how the use of feedback expectancies can help segment the sequences into reasonable length lists, and allow arbitrarily long sequences of events to be processed.
11. After describing the network, Chapter 4 presents simulations that show its operation. One of the simulations consisted of presenting the following list to the network, where each number refers to a specific input line. The list was presented by activating each input line for a constant period of time upon the presentation of its item. After the last item in the list was presented, the first item was immediately presented again, with no breaks between any of the items.
0 1 2 3 4 5 24 25 26 6 7 8 9
0 1 2 10 11 12 13 24 25 26 14 15 16
0 1 2 17 18 19 24 25 26 20 21 22 23
12. In this list, items (0,1,2) and (24,25,26) appear in three different contexts. Because of this, the network learned to create categories for those lists and to segment them accordingly. Thus, it learned in a real-time environment. It was also clear that it performed classifications in real-time since each of the lists was classified approximately 2 items after it had been fully presented. For example, if the list 22 23 0 1 2 3 4 5 6 was presented, the list (0,1,2) would be classified while item 4 or 5 was being presented. Simulations have shown that the amount of equilibration time needed for classification would not increase significantly, even if multiple similar patterns were classified by the network.
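The claim that (0,1,2) and (24,25,26) are the only sublists recurring in several contexts can be checked with a short counting sketch. This is plain sequence counting, not SONNET's learning mechanism; the function name `recurring_chunks` and the choice of a chunk length of three are assumptions made for illustration.

```python
from collections import Counter

# The three sub-lists from the simulation, presented cyclically with no breaks.
LISTS = [
    [0, 1, 2, 3, 4, 5, 24, 25, 26, 6, 7, 8, 9],
    [0, 1, 2, 10, 11, 12, 13, 24, 25, 26, 14, 15, 16],
    [0, 1, 2, 17, 18, 19, 24, 25, 26, 20, 21, 22, 23],
]


def recurring_chunks(lists, length=3):
    """Count runs of `length` consecutive items in one pass of the cyclic
    stream and return only those that occur more than once."""
    stream = [item for sub in lists for item in sub]
    n = len(stream)
    counts = Counter(
        # The modulo wraps around, since the first item follows the last.
        tuple(stream[(i + k) % n] for k in range(length))
        for i in range(n)
    )
    return {chunk: c for chunk, c in counts.items() if c > 1}


print(recurring_chunks(LISTS))  # {(0, 1, 2): 3, (24, 25, 26): 3}
```

Every other item occurs exactly once per pass, so no other run of three items (and no run of four) repeats.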
13. Chapter 5 continues to discuss the classification of temporal patterns. (However, many elements in this chapter are also applicable to purely spatial patterns.) The chapter shows how to cascade multiple homologous layers to create a hierarchy of representations. It also shows how to use feedback to bias the network in favor of expected occurrences and how to use a nonspecific attention signal to increase the power of the network. As is the case with the networks in later chapters, these proposed modifications are presented but not simulated.
14. One major limitation of the networks presented in Chapters 4 and 5 is that items can be presented only once within a classified list. For example, the list ABC can be classified by the network, but the list ABA cannot, since the A occurs repeatedly. This deficiency is due to the simplifications that were made in the construction of SONNET 1. To overcome this and other weaknesses, the simplifications needed to be removed.
15. This is accomplished in Chapter 6, which presents a gedanken experiment analyzing the way repeated items in a list could be properly represented and classified. The chapter begins by showing that multiple representations of the same item are needed to allow the network to unambiguously represent the repeated occurrence of an item. It then analyzes methods by which the classifying system could learn to classify lists composed of these different representations.
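The need for multiple representations can be illustrated with a toy encoding. The sketch below is not the book's input field: it assumes, purely for illustration, that each new item scales up the stored activity of earlier items by a hypothetical `gain` and then enters its own storage location at activity 1. The function names are likewise hypothetical.

```python
def encode_single_slot(seq, gain=2.0):
    """One storage location per item (assumed toy encoding).
    A repeated item overwrites its own earlier trace."""
    activity = {}
    for item in seq:
        activity = {k: v * gain for k, v in activity.items()}
        activity[item] = 1.0
    return activity


def encode_multi_slot(seq, gain=2.0):
    """One storage location per (item, occurrence number) pair."""
    activity = {}
    seen = {}
    for item in seq:
        activity = {k: v * gain for k, v in activity.items()}
        seen[item] = seen.get(item, 0) + 1
        activity[(item, seen[item])] = 1.0
    return activity


# With a single slot per item, the list ABA cannot be told apart from BA.
print(encode_single_slot("ABA") == encode_single_slot("BA"))  # True
# With one slot per occurrence, the two lists produce different patterns.
print(encode_multi_slot("ABA") == encode_multi_slot("BA"))    # False
```

Under these assumptions the second A erases the record of the first, which is the ambiguity that separate representations for each occurrence remove.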
16. During this gedanken experiment, it quickly became clear that the problem of classifying repeated items in a list was actually a subproblem of a more general one, called the synonym problem: Often, different input representations actually refer to the same concept and should therefore be treated by classifying cells as equivalent. However, the problem is complicated by the fact that sometimes different patterns refer to the same concept while sometimes the same pattern may have multiple meanings (homonyms).
17. To address the synonym problem, Chapter 6 presents a way to radically alter the method of competition between categories. In SONNET 1 (as in most competitive networks), classifying nodes compete with each other for the right to classify signals on active input lines. Conversely, in the altered network, it is the input lines that will compete with each other, and they will do so for the right to activate their respective classifying nodes. The principles in Chapter 6 are far and away the most important new contribution in this book.
18. After showing how synonyms could be learned and represented, Chapter 6 also discusses general mechanisms for creating distributed representations. These mechanisms were designed to allow existing representations to combine in STM (short-term memory) to temporarily represent novel patterns. They were also designed to allow the novel categories to be permanently bound in LTM (long-term memory).
19. After establishing the new mechanisms and principles in Chapter 6, these mechanisms are used in Chapter 7 to create specific architectures that tackle previously unsolved problems. The first section discusses the first implementation of SONNET that uses competition between links rather than nodes, and shows how multiple patterns could be learned simultaneously. To complement the discussion in the previous chapter, the discussion here is as specific as possible (given that the network was yet to be implemented). The second section discusses how the new formulation could allow networks to solve the twin problems of translation and size invariant recognition of objects. This shows how the new mechanisms could be used to solve an important previously unresolved issue.
20. Finally, Chapter 8 concludes the book. It describes which properties have already been satisfied by SONNET 1, which properties can be satisfied by simple extensions to SONNET 1, and which properties must wait until future versions of SONNET are implemented. This chapter gives the reader a good indication of the current state of the network and also indicates areas for future research.
21. The following briefly summarizes thirteen properties that SONNET is meant to satisfy. Although it is possible to find examples in many different areas to motivate each of the following properties, the examples are mainly chosen from the area of natural language processing. This is done because the problems in this area are the easiest to describe and are often the most compelling. However, the reader should keep in mind that equivalent properties also exist in other domains and that, at least initially, SONNET is meant to be used primarily for lower level classification problems.
22. The first property is that a neural network should self-organize using unsupervised learning. It should form its own categories in response to the invariances in the environment. This allows the network to operate in an autonomous fashion and is important because in many areas, such as lower level perception, no external teacher is available to guide the system. Furthermore, as shown in the ARTMAP network (Carpenter, Grossberg, and Reynolds, 1991), it is often the case that if a network can perform unsupervised learning then it can also be embedded in a framework that allows it to perform supervised learning (but not the reverse).
23. The second property is that a neural network should form stable category codes. Thus, a neural network should learn new categories without degrading previous categories it has established. Networks that achieve this property can operate using both fast and slow learning (see fifth property). Conversely, those that do not are restricted to using slow learning. In addition, networks that don't form stable category codes must shut off learning at some point in time to prevent the degradation of useful categories.
24. The third property is that neural networks should operate in the presence of noise. This is necessary to allow them to operate in real-world environments. Noise can occur in three different areas. It can be present within an object, within the background of an object, and within the components of the system. A network must handle noise in all of these areas.
25. The fourth property is that a neural network should operate in real-time. There are several aspects to this. The first and most often recognized is that a net must equilibrate at least as fast as the patterns appear. However, there are several additional aspects to this property. First, in many applications, such as speech recognition and motion detection, a network should not equilibrate too rapidly, but at a pace that matches the evolution of the patterns. Second, in real-world environments, events do not come pre-labeled with markers designating the beginnings and endings of the events. Instead, the networks themselves must determine the beginning and end to each event and act accordingly.
26. The fifth property is that a neural network should perform fast and slow learning. A network should perform fast learning to allow it to classify patterns as quickly as a single trial when it is clear exactly what should be learned and it is important that the network learn quickly. (For example, one should not have to touch a hot stove 500 times before learning one will be burnt.) Furthermore, a network should also perform slow learning to allow it to generalize over multiple different examples.
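The difference between the two regimes can be sketched with a generic weight update in which a learning rate sets how far the weights move toward each presented pattern. This is a textbook-style rule chosen for illustration, not the learning law used in SONNET; `update` and `rate` are hypothetical names.

```python
def update(weights, pattern, rate):
    """Move each weight a fraction `rate` of the way toward the pattern."""
    return [w + rate * (x - w) for w, x in zip(weights, pattern)]


# Fast learning: a rate of 1.0 stores the pattern in a single trial.
print(update([0.0, 0.0], [1.0, 0.5], rate=1.0))  # [1.0, 0.5]

# Slow learning: a smaller rate blends several examples, so the weights
# drift toward a compromise between them instead of copying the latest one.
w = [0.0, 0.0]
for example in ([1.0, 0.0], [0.0, 1.0]):
    w = update(w, example, rate=0.5)
print(w)  # [0.25, 0.5]
```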
27. The sixth property is that a neural network should scale well to large problems. There are at least two aspects to this property. First, as the size of a problem grows, the size of the required network should not grow too quickly. (While modularity may help in this respect, it is not a panacea, because of problems with locality and simultaneous processing.) Second, as the number of different patterns in a training set increases, the number of required presentations for each pattern (to obtain successful classifications) should not increase too rapidly.
28. The seventh property is that a neural network should use feedback expectancies to bias classifications. This is necessary because it is often ambiguous how to bind features into a category unless there is some context with which to place the features.
29. The eighth property is that a neural network should create arbitrarily coarse or tight classifications that are distortion insensitive. Patterns in a category often differ from the prototype (average) of the category. A network should vary the acceptable distortion from the prototype in at least two ways. It should globally vary the acceptable overall error. It should also allow different amounts of variance at different dimensions of the input pattern (the different input lines). This would allow the network to create categories that are more complex than just the nearest neighbor variety.
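The two kinds of tolerance can be sketched as a membership test with one global error bound and one spread per input line. The squared, spread-normalized error used here is an assumption chosen for simplicity, not the matching rule of SONNET or ART; `matches`, `spread`, and `max_error` are hypothetical names.

```python
def matches(pattern, prototype, spread, max_error):
    """Accept `pattern` if its total error from `prototype` is small enough.

    spread[i]  -- deviation tolerated on input line i (per-dimension control)
    max_error  -- bound on the summed normalized error (global control)
    """
    error = sum(
        ((x - p) / s) ** 2 for x, p, s in zip(pattern, prototype, spread)
    )
    return error <= max_error


prototype = [1.0, 1.0]
spread = [0.1, 1.0]  # line 0 must match closely, line 1 may vary widely

print(matches([1.0, 1.8], prototype, spread, max_error=1.0))   # True
print(matches([1.3, 1.0], prototype, spread, max_error=1.0))   # False
print(matches([1.3, 1.0], prototype, spread, max_error=10.0))  # True
```

Raising `max_error` coarsens the category as a whole, while changing one entry of `spread` loosens or tightens a single dimension.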
30. The ninth property is that a neural network should perform context-sensitive recognition. Two aspects of this will be discussed here. First, a network should learn and detect patterns that are embedded within extraneous information. For example, if the patterns SEEITRUN, ITSAT, and MOVEIT are presented, a network should establish a category for IT and later recognize the pattern when it appears within extraneous information. The second aspect occurs when a smaller classified pattern is embedded within a larger classified pattern. Then, the category for the smaller pattern should be turned off when the larger pattern is classified. For example, if a network has a category for a larger word like ITALY, then the category for IT should be turned off when the larger word is presented. Otherwise the category for IT would lose much of its predictive power, because it would learn the contexts of many non-related words such as HIT, KIT, SPIT, FIT, LIT, SIT, etc.
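Both aspects can be mimicked with ordinary string matching, as a rough stand-in for the behavior described rather than for the competitive dynamics that would produce it in a network. The function `classify` and its tuple output format are assumptions made for this sketch.

```python
def classify(text, categories):
    """Return (start, end, category) for each known category found in
    `text`, dropping any match that lies inside a longer match."""
    found = []
    for cat in categories:
        start = text.find(cat)
        while start != -1:
            found.append((start, start + len(cat), cat))
            start = text.find(cat, start + 1)
    kept = [
        (s, e, c)
        for (s, e, c) in found
        # Suppress a match that is contained in a strictly longer match.
        if not any(
            s2 <= s and e <= e2 and (e2 - s2) > (e - s)
            for (s2, e2, _) in found
        )
    ]
    return sorted(kept)


categories = ["IT", "ITALY"]
# IT is detected inside extraneous context.
print(classify("SEEITRUN", categories))  # [(3, 5, 'IT')]
# IT is turned off when the larger pattern ITALY accounts for the input.
print(classify("ITALY", categories))     # [(0, 5, 'ITALY')]
```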
31. The tenth property is that a neural network should process multiple patterns simultaneously. This is important, because objects in the real world do not appear in isolation. Instead, scenes are cluttered with multiple objects that often overlap. To have any hope of segmenting a scene in real time, multiple objects often need to be classified in parallel. Furthermore, the parallel classifications must interact with one another, since it is often true that the segmentation for an object can only be determined by defining it in relation to other objects in the field. (Thus, it is not sufficient to use multiple stand-alone systems that each attempt to classify a single object in some selected portion of the input field.) The easiest modality in which to observe this is continuous speech, which often has no clear breaks between any words. (However, analogous situations also occur in vision.) For example, when the phrase ALL TURN TO THE SPEAKER is spoken, there is usually no break in the speech signal between the words ALL and TURN. Still, those words are perceived, rather than the embedded word ALTER. This can only be done by processing multiple patterns simultaneously, since the word ALTER by itself would overshadow both ALL and TURN.
32. The eleventh property is that a neural network should combine existing representations to create categories for novel patterns. These types of representations are typically called distributed ones. A network must form temporary representations in short term memory (STM) and also permanent ones in long term memory (LTM). Distributed representations are useful because they can reduce hardware requirements and also allow novel patterns to be represented as a combination of constituent parts.
33. The twelfth property is that a neural network should perform synonym processing. This is true because patterns that have entirely different physical attributes often have the same meaning, while a single pattern may have multiple meanings (as in homonyms). This is especially recognized in natural language, where words like "mean" and "average" sometimes refer to the same concept, and sometimes do not. However, solving the synonym problem will also solve problems that occur in the processing of lists composed of repeated occurrences of the same symbol (consider the letters "a" and "n" in the word "banana"). This follows because the different storage locations of a symbol can be viewed as (exact) synonyms for each other and handled in exactly the same way as the general case. Synonym representation is also necessary in object recognition, manifesting itself in several different ways. First, it is possible for multiple versions of the same object to appear within a scene (similar to the problem of repeated letters in a word). Second, since an object may appear completely different when viewed from different perspectives, it is important to map the dissimilar representations of the object onto the same category. Finally, it is also possible for an object to appear in different portions of the visual field (translation-invariant recognition) or with different apparent sizes (size-invariant recognition). Despite the fact that in both cases the object will be represented by entirely different sets of cells, a network should still classify the object correctly.
34. The thirteenth property is that a neural network should unlearn or modify categories when necessary. It should modify its categories passively to allow it to track slow changes in the environment. A network should also quickly change the meanings for its categories when the environment changes and renders them either superfluous or wrong. This property is the one that is least discussed in the book, because it is possible that much unlearning could take place under the guise of reinforcement learning.
APPENDIX: Table of Contents
1 Introduction
2 Highlights of Adaptive Resonance Theory
3 Classifying Spatial Patterns
4 Classifying Temporal Patterns
5 Multilayer Networks and the Use of Attention
6 Representing Synonyms
7 Specific Architectures That Use Presynaptic Inhibition
8 Conclusion
Appendices
Carpenter, G. and Grossberg, S. 1987a. A Massively Parallel Architecture for a Self-organizing Neural Pattern Recognition Machine. Computer Vision, Graphics, and Image Processing, 37:54--115.
Carpenter, G. and Grossberg, S. 1987b. ART 2: Self-organization of Stable Category Recognition Codes for Analog Input Patterns. Applied Optics, 26(23):4919--4930.
Carpenter, G., Grossberg, S., and Reynolds, J. 1991. ARTMAP: Supervised Real-time Learning and Classification of Nonstationary Data by a Self-organizing Neural Network. Neural Networks, 4(5):565--588.
Cohen, M. and Grossberg, S. 1986. Neural Dynamics of Speech and Language Coding: Developmental Programs, Perceptual Grouping, and Competition for Short-term Memory. Human Neurobiology, 5(1):1--22.
Cohen, M. and Grossberg, S. 1987. Masking Fields: a Massively Parallel Neural Architecture for Learning, Recognizing, and Predicting Multiple Groupings of Data. Applied Optics, 26:1866--1891.
Nigrin, A. 1993. Neural Networks for Pattern Recognition. The MIT Press, Cambridge MA.