This is a response to some of the concerns raised by Pickering (1994) in his review of "Neural Networks for Pattern Recognition" (Nigrin, 1993). Topics discussed include: (1) the use of gedanken experiments; (2) reinforcement learning; and (3) the attentional reset mechanism.
1. Pickering (1994) makes a number of interesting comments concerning Nigrin (1993). I am in agreement with the spirit of most of his comments although I disagree with some of the specifics. Below, I will attempt to answer some of the more important criticisms.
2. One of the primary means of discourse in the book was the gedanken, or thought experiment, which was used both to guide the construction of the networks and to explain why certain choices were made. As to their use, Pickering (paragraph 5) writes: "[they are] a source of both strength and weakness within Nigrin's book... [A] serious flaw with gedanken experiments (GEs) relates to Nigrin's repeated use of arguments concerning how to equip his system with desirable properties (or remove undesirable ones). When guided solely by the logical considerations of GEs, one may erroneously decide which properties should be in or out and thus waste much energy on achieving the desired result. The history of neural networks provides the most graphic illustration of this problem [with the XOR problem]."
3. I am in complete agreement with Pickering about the dangers of using GEs. As I point out on p. 18: "If major modifications are required each time new properties are added, it is likely that the problem has been broken down in an inconvenient manner and that the underlying hypotheses should be reexamined... it is possible that the properties selected are not easily achievable by a neural network and that the wrong tool has been selected for the job. One example of this is simple arithmetic operations. Digital computers work extremely well on this type of problem, and it may be the case that neural networks are simply ill suited for this type of task."
4. I do not agree, however, that the design of the networks was guided solely in a top-down fashion (and I am in a position to know). Pickering may have received this mistaken impression because of the structure of the book, which begins by listing a set of desirable behaviors and continues by presenting neural network designs that attempt to achieve those behaviors. This was a rhetorical decision, however, and did not fully reflect the way the networks were designed. Instead, only a few basic properties were chosen a priori; the rest emerged from an attempt to design a network that could learn to segment temporal sequences. The only a priori constraints were: (1) the network should operate on-line, since humans are forced to do this; (2) the network should use unsupervised learning; and (3) the network should perform fast learning. I insisted on using unsupervised learning because I do not believe that other types of learning (like reinforcement learning or supervised learning) can completely account for behaviors such as language learning. My insistence on fast learning arises from my belief that the classification subsystem is able to achieve this property. For example, humans are able to learn telephone numbers in one or a few trials (especially when sufficiently motivated).
5. The rest of the properties emerged from my search for a network that could handle the problem of segmenting temporal sequences. Properties like the requirement for context-sensitive classification emerged because, when given an unsegmented list, the network must handle patterns that are embedded within larger ones. The entire discussion on synonyms and homonyms emerged from the simple requirement of wanting the network to represent items that are repeated within a category, as in the so-called "banana problem" (Marshall, 1994). The fact that Pickering believes that SONNET has applicability to much more than just language processing is an indication that these properties may have been chosen well enough to provide a unifying perspective. I believe that SONNET may be useful as a general-purpose pattern recognizer in many different domains, such as in object recognition, as a component in a reinforcement learning system, or as a component within a supervised learning system.
6. I am in complete agreement with Pickering in wanting to see more behavioral data to make the model more applicable to psychology (and I agree that the book is lacking in this respect). Our differences lie in our priorities. Pickering places a higher priority on determining how well the existing models represent actual cognitive behavior, whereas I have placed a higher priority on achieving more capabilities with the network. I believe that the most important contribution I can make is to eliminate deficiencies of the SONNET 1 model by implementing SONNET 2 (see Nigrin, 1994a). However, it is my sincere hope that others will examine the cognitive applicability of the model (as did Page, 1994). If the model achieves enough functionality, there is a good chance that this will happen.
7. Finally, I do not think there is anything wrong with having the goal of "perfect functioning" for a neural network, IF one realizes that certain behaviors may be difficult or impossible to achieve in all possible situations. As the footnote on p. 79 reads: "This is a good example where data from humans can help in neural network design. If this data did not exist, one might spend an inordinate amount of time trying to design a network that never confused order information. However, the data from human subjects shows that this property may be very difficult to achieve and that a designer might be better off by avoiding it for the time being."
8. Although Pickering believes that I have "a tendency to go too far in ensuring perfect functioning", I believe that it is safer to go too far in that direction than in the direction of too quickly giving up on desirable behaviors. Although the first course carries the risk of floundering for long periods of time, it also means that when one finally does give up on some behavior, there is a greater chance that the choice was correct. In addition, one is more likely to uncover important constraints that illustrate why certain behaviors are fundamentally difficult to achieve. Finally, the second course carries the risk (more serious, in my opinion) of working on problems that do not reach to the heart of important issues.
9. Another fault that Pickering finds with the book concerns my analysis of the attentional reset mechanism. He writes: "Although Nigrin keeps the vigilance concept from ART, he is strongly critical of the reset mechanism (Nigrin, 1993, pp. 126-128). I am more kindly disposed to this mechanism on psychological grounds; I have suggested (Pickering, 1994) that, along with the vigilance parameter, it could represent an implementation for a fundamental brain system believed to control behaviour under conditions of mismatch... the behavioural inhibition system (Gray, 1982)."
10. In fact, although I strongly criticize the attentional reset mechanism when used to achieve stability FOR UNSUPERVISED LEARNING, I am a strong proponent of its use in other areas. For example, a footnote on p. 128 reads: "Although the attentional reset mechanism is not sufficient to form stable category codes, I believe that it will be an important component in future versions of SONNET. The reset mechanism can be used, along with a gated dipole mechanism (Grossberg, 1980), to resegment an input pattern if the network improperly segments the input on its first or succeeding tries." The attentional reset mechanism is also used in chapters 7 and 8 for various tasks.
11. One problem with having unsupervised networks use the reset mechanism to achieve stable categories is that the networks run into problems with unsegmented patterns. Problems arise because there is a global comparison between the entire input pattern and the patterns expected by active category representations. For example, suppose some node xi has classified the word DOG and the input pattern QRYDOGIKPT is presented. Then, if xi activates, the network will compare QRYDOGIKPT against the pattern expected by xi (DOG). Since the extraneous items cause those patterns to be different, a mismatch will result, and xi will be reset unless the vigilance parameter has been set to very small levels. Unfortunately, at low levels of vigilance, the attentional reset will fail to trigger not only for superset patterns like DOGMATIC but also for distortions of the xi pattern like GOD, DONE, or DO. (A deficiency of my previous critique of attentional reset was that only superset patterns were explicitly mentioned, and thus the full scope of the problem might not have been apparent.) Thus, under unsupervised conditions, either the network must always be presented with presegmented patterns (see Nigrin, 1993, Section 3.1.3, for why this is undesirable), or the vigilance parameter must be set in a way that causes the network to behave just as if it had (almost) no attentional subsystem at all. The latter option is undesirable, since the serious instability that would then result was one of the primary reasons Grossberg incorporated an attentional reset in the first place.
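The dilemma can be made concrete with a toy sketch. The match rule below (fraction of the input accounted for by the category's expected pattern, compared against a vigilance threshold) is only a caricature of ART's matching dynamics, and the string encoding is purely illustrative:

```python
def match_score(input_items, expected_items):
    """Fraction of the input accounted for by the category's expected pattern."""
    items = list(input_items)
    overlap = sum(1 for item in items if item in expected_items)
    return overlap / len(items)

def is_reset(input_items, expected_items, vigilance):
    """A global mismatch resets the category when the match falls below vigilance."""
    return match_score(input_items, expected_items) < vigilance

dog = set("DOG")

# Embedded pattern: only 3 of the 10 input items match DOG, so any
# vigilance above 0.3 resets the DOG node even though DOG is present.
print(is_reset("QRYDOGIKPT", dog, 0.5))  # True: embedded DOG is rejected

# Lowering vigilance far enough to accept the embedded pattern ...
print(is_reset("QRYDOGIKPT", dog, 0.3))  # False: accepted

# ... also accepts distortions of DOG, such as DONE (match 0.5).
print(is_reset("DONE", dog, 0.3))        # False: distortion accepted too
```

Either setting of the single global threshold misbehaves on one of the two cases, which is the heart of the objection.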
12. Pickering presents a rationale for how the reset mechanism could be used to create stable categories. He writes (paragraph 9): "This problem assumes that the operating level of vigilance is fixed. There are many ways in which it will change. First, vigilance may be equated to tonic arousal, which is constantly fluctuating. Second, Grossberg has repeatedly stressed that vigilance will be affected by feedback from the reinforcement learning circuitry, which is itself driven by the categorized stimulus inputs of ART networks. If an action, under the control of a particular category node, produces an aversive outcome, then feedback about such outcomes raises vigilance, increasing the chances that the node will be reset when faced with the same stimulus array. This will allow a new category node to control behaviour."
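The modulation Pickering describes can be sketched in a few lines. The match rule and the numeric vigilance values here are purely illustrative assumptions, not drawn from any specific ART implementation:

```python
def match(input_items, expected):
    """Illustrative global match: fraction of distinct input items expected."""
    items = set(input_items)
    return len(items & expected) / len(items)

expected = set("DOG")
stimulus = "DOGS"          # match = 3/4 = 0.75

# At a tonic baseline vigilance, the node's expectation matches well
# enough that the node keeps control of behaviour.
vigilance = 0.4
assert match(stimulus, expected) >= vigilance

# An aversive outcome feeds back and raises vigilance ...
vigilance = 0.9

# ... so the very same stimulus now triggers a reset,
# freeing a new category node to control behaviour.
assert match(stimulus, expected) < vigilance
```

Note that the driver of the change is the reinforcement signal, not anything internal to the unsupervised matching process itself, which is the point taken up next.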
13. The problem with this analysis is that it DEPENDS on the use of reinforcement learning to ensure stability, and thus apparently concedes that stability cannot be achieved within the framework of unsupervised learning. And although I have no empirical data to show that stable categories can form under unsupervised learning alone, I have the firm conviction that this is possible. Hence, I was forced to design new mechanisms for stability that could operate more adequately under unsupervised learning. These mechanisms depend on each category creating a confidence measure for how well it represents the input pattern. Emerging categories are allowed to form fully only when their confidence measures are higher than those of existing categories. (This confidence can be enhanced by feedback from other categories. It is not clear whether this mechanism, or the reinforcement learning mechanism suggested by Pickering, is more adequate to explain certain cases where humans do not form larger categories to override the representations of smaller categories.)
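The confidence-based alternative can be caricatured as follows. This is a minimal sketch, with a Jaccard-style overlap standing in as a hypothetical confidence measure; SONNET's actual mechanism is more elaborate:

```python
def confidence(category_items, input_items):
    """Hypothetical confidence measure: Jaccard overlap between the
    pattern a category expects and the current input."""
    union = category_items | input_items
    return len(category_items & input_items) / len(union) if union else 0.0

def may_commit(emerging, established, input_items):
    """An emerging category is allowed to form fully only when it is
    more confident about the input than every established category."""
    new_conf = confidence(emerging, input_items)
    return all(new_conf > confidence(old, input_items) for old in established)

dog = set("DOG")

# An emerging DO category cannot override an established DOG category
# when DOG itself is the input (confidence 2/3 vs. 1.0) ...
print(may_commit(set("DO"), [dog], dog))         # False

# ... but a category matching the input exactly can win out over
# a weaker established rival.
print(may_commit(set("DOG"), [set("DO")], dog))  # True
```

Unlike the reinforcement-driven scheme, the competition here is resolved entirely within the classification process, with no external teaching signal.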
14. Let me reiterate: I DO believe that attentional reset is an important component in human systems. However, I do not believe it is sufficient to stabilize categories under unsupervised learning. Since I believe such stabilization is both possible and important, I have attempted to design new mechanisms for this task.
15. Pickering and I seem to agree on most of the fundamental issues regarding the design of neural networks. We may disagree on boundary conditions, however, such as when a problem can be handled solely by unsupervised learning and when reinforcement learning becomes necessary. (I have a tendency to favor less supervision whenever possible and perhaps do go too far in this respect.) We also seem to have different perspectives on the development of neural networks, with me placing a higher emphasis on increasing their capabilities, and Pickering placing a higher emphasis on increasing their explanatory adequacy for human behavior.
Gray, J.A. (1982) The neuropsychology of anxiety. Oxford: Oxford University Press.
Grossberg, S. (1980) How does the brain build a cognitive code? Psychological Review 87:1-51.
Marshall, J. (1994) Synonyms, Embedding, Segmentation, and the Banana Problem. PSYCOLOQUY 5(32) pattern-recognition.5.marshall.
Nigrin, A. (1993) Neural Networks for Pattern Recognition. The MIT Press, Cambridge, MA.
Nigrin, A. (1994a) Precis of Neural Networks for Pattern Recognition. PSYCOLOQUY 5(2) pattern-recognition.1.nigrin.
Nigrin, A. (1994b) Context Sensitivity and Reinforcement Learning: A Reply to Rickert. PSYCOLOQUY 5(42) pattern-recognition.7.nigrin.
Page, M.P.A. (in press) Modeling the perception of musical sequences with self-organising neural networks. Connection Science.
Pickering, A. (1994) Neural Nets Cannot Live by Thought (Experiments) Alone. PSYCOLOQUY 5(35) pattern-recognition.6.pickering.