Although Nigrin's (1993) book contains major developments in the use of adaptive resonance neural networks, the exclusive reliance on gedanken experiments and the consequent lack of simulations of real experimental data will do little to improve the low "take up" of adaptive resonance ideas amongst psychologists.
1. The major contributions of Stephen Grossberg (e.g., 1982, 1987) to the field of neural network (NN) research -- adaptive resonance theory (ART), and masking fields in particular -- have received due recognition in certain quarters only. Whilst his approach is often the NN method of choice amongst computer scientists, mathematicians, and engineers, human cognitive psychologists and animal learning theorists have turned to more recent and less comprehensive frameworks, especially back propagation. Grossberg's own rather impenetrable writing style has been partly to blame as, paradoxically, has the hierarchical evolution of the ideas over more than two decades. This commendable approach has the major drawback of requiring the reader repeatedly to track back to earlier writings for the reasons underlying, for example, the choice of a particular equation or architecture. Nigrin's book (Nigrin, 1993; Precis in PSYCOLOQUY, 1994) is important, therefore, at a pedagogical level, providing (as it does) one of the clearest overviews of the development and content of ART that I have yet read. More than that, however, Nigrin's SONNET has meaningfully extended ART into new domains, whilst preserving its philosophy and style (see Section II).
2. Among the psychologists who do know of the Grossberg tradition, there is often a criticism that the models are "unpsychological"; in my view this is a mistaken critique but is, I believe, encouraged by the almost exclusive use of gedanken experiments rather than intermixing these with the simulation of real behavioural data. I suggest, below, that Nigrin needs to avoid creating this same illusion in writing about SONNET.
3. This commentary may appear to voice harsh criticisms; these must be set against my strongly sympathetic general response to the book. This juxtaposition arises because I believe that ART-like ideas (such as SONNET) are so important for psychology, and because the "public relations" of these ideas have been relatively poor in the past.
4. As already noted, the major intellectual tool in the development of ART, and of SONNET, is the gedanken experiment. These experiments have an impeccable pedigree within science (vide Einstein) but are a source of both strength and weakness within Nigrin's book. In terms of Marr's (1982) hierarchy of levels of analysis, the strength of the gedanken experiments lies in providing a series of basic computational constraints on the choice of the neural algorithm (constraints that are also constrained, as with Grossberg's models, by considerations of neural plausibility at the implementational level). The gedanken experiment can, in my view, be over-extended; at the points when this occurs, Nigrin's book becomes briefly unreadable. Clarity would be enhanced through the (qualitative) simulation of real behavioural data, not least because the endless lists of ABs, CABs and ABCs would be made into concrete examples. (It was a real relief, on page 254 for example, when Nigrin replaced "the items A, B, C and D" with the words "small," "glass," "little," and "cup," respectively.)
II.1 Why Neural Nets Cannot Live By Thought Alone
5. A much more serious flaw with gedanken experiments (GEs) relates to Nigrin's repeated use of arguments concerning how to equip his system with desirable properties (or remove undesirable ones). When guided solely by the logical considerations of GEs, one may erroneously decide which properties should be in or out and thus waste much energy on achieving the desired result. The history of neural networks provides the most graphic illustration of this problem. The development of perceptron-like ideas was virtually abandoned for over two decades because of the inability of the two-layered model to solve XOR-type problems. Why was this important? Certainly it was not because real human behaviour demanded the ability to compute XOR; indeed, the extent to which this is a useful computational function for human beings appears minimal (see Shanks, 1991). Of course, the goals of different NN practitioners may differ, but for creating a functional model of the human (or mammalian) brain a basis in what human beings (can) actually do seems a sensible prerequisite.
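The historical point can be made concrete with a toy sketch of my own (not from the book): a perceptron with a single layer of modifiable weights and a hard threshold learns AND, which is linearly separable, but can never classify all four XOR cases correctly, however long it trains.

```python
def train_perceptron(data, epochs=100, lr=0.1):
    """Single-layer perceptron: one layer of weights, hard threshold, no hidden units."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), t in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            w[0] += lr * (t - pred) * x1
            w[1] += lr * (t - pred) * x2
            b += lr * (t - pred)
    # count how many of the four cases the trained unit gets right
    return sum(int((w[0] * x1 + w[1] * x2 + b > 0) == bool(t))
               for (x1, x2), t in data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(AND))  # 4 -- a linear boundary exists
print(train_perceptron(XOR))  # fewer than 4 -- no linear boundary exists
```

The demonstration that this limitation vanishes once a hidden layer is added rehabilitated the field, but, as argued above, the behavioural importance of the XOR function itself was never established.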
II.2 The Imperfect Brain
6. The above problem with Nigrin's use of GEs is clearest when he attempts to "perfect" his model brain. A GE might reveal that a particular function, with adaptive significance, should be present. However, it is all too easy to be lured into the mistaken view that this function must be accurately computed in all circumstances. Nigrin has a tendency to go too far in ensuring perfect functioning (see sections II.3 and II.4 below; this suggests to me that he comes from a computer science/engineering background -- engineers' machines and computer programmers' software must operate perfectly in all circumstances). Visual illusions are an obvious major example where the brain's normally adaptive mechanisms can be fooled into error-states by special, low-frequency situations. Building a model which can cope with these situations is not only psychologically wrong; it also risks losing sight of the wood (mostly perfect function) amongst the trees (rare errors).
II.3 Limits of Perfection: Part 1
7. Although Nigrin keeps the vigilance concept from ART, he is strongly critical of the reset mechanism (Nigrin, 1993, pp. 126-128). I am more kindly disposed to this mechanism on psychological grounds; I have suggested (Pickering, 1994) that, along with the vigilance parameter, it could represent an implementation for a fundamental brain system believed to control behaviour under conditions of mismatch -- the behavioural inhibition system (Gray, 1982). Thus, I am forced to take Nigrin's objections seriously.
8. Nigrin's example (modified to fit my choice of words) involves a system which has learned the item ABC (e.g., the sounds of the word "dog") and has to deal with a superset item ABCD (e.g., "doggie"). Nigrin considers the problem that reset occurs when ABCD is presented, and concludes that a training set consisting of ABCD, ABCE, ABCF, ABCG, etc. would, in this case, continually produce category nodes for ABC but not for any of the specific training examples. My response is to ask whether such a training set is remotely likely in nature and therefore whether we should abandon a useful concept on this basis. More likely, the child who is familiar with one word sound (e.g., dog) would be exposed to one of the superset items (e.g., doggie) with greater frequency than the others (e.g., dogged) until a category code for that exemplar had been formed, thus obviating the problem altogether.
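The reset decision at issue can be sketched in ART-1-like terms (a toy of my own, using feature sets and a simple match ratio rather than Nigrin's actual equations): the learned ABC prototype covers only three-quarters of the superset input ABCD, so whether reset fires depends entirely on where vigilance is set.

```python
def match_score(input_items, prototype):
    """ART-1-style match: the fraction of the input covered by the prototype."""
    I, w = set(input_items), set(prototype)
    return len(I & w) / len(I)

def reset(input_items, prototype, vigilance):
    """Reset fires when the winning category matches the input too poorly."""
    return match_score(input_items, prototype) < vigilance

# A learned category for "dog" (ABC) meets the superset "doggie" (ABCD):
print(match_score("ABCD", "ABC"))           # 0.75
print(reset("ABCD", "ABC", vigilance=0.9))  # True: reset, seek a new node
print(reset("ABCD", "ABC", vigilance=0.7))  # False: the "dog" node captures "doggie"
```

Nigrin's pathological training set (ABCD, ABCE, ABCF, ...) keeps reset firing at high vigilance precisely because each input scores 0.75 against the ABC node; my point above is that nature rarely supplies such a set.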
9. Nigrin's other problem case is when the superset item fails to produce a reset. Thus the word "doggie" could never develop a category node if "dog" were already known. This problem assumes that the operating level of vigilance is fixed. There are many ways in which it will change. First, vigilance may be equated to tonic arousal, which is constantly fluctuating. Second, Grossberg has repeatedly stressed that vigilance will be affected by feedback from the reinforcement learning circuitry, which is itself driven by the categorised stimulus inputs of ART networks. If an action, under the control of a particular category node, produces an aversive outcome, then feedback about such outcomes raises vigilance, increasing the chances that the node will be reset when faced with the same stimulus array. This will allow a new category node to control behaviour. Imagine a child, familiar with the sound "dog," hearing "doggie," but not resetting the "dog" category node. The child produces a response linked to the "dog" category node, saying "dog" to the parent. The parent, however, wants the child to say "doggie." Aversive feedback from the parent will raise the child's vigilance, encouraging reset, and ultimately this will lead to an appropriate response being hooked up to the new category node.
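The doggie scenario can be made concrete with another toy sketch of my own (again, not SONNET's actual machinery): at low vigilance the learned "dog" node captures "doggie" and controls the (wrong) response, but once aversive feedback has raised vigilance the same input triggers reset and a new node is recruited.

```python
def match(input_items, prototype):
    """Fraction of the input covered by the category prototype."""
    I, w = set(input_items), set(prototype)
    return len(I & w) / len(I)

categories = {"dog": "ABC"}  # one learned category node
vigilance = 0.7              # tonic level; assumed modifiable by feedback

def respond(sound):
    """Let the best-matching node control behaviour, or recruit a new node."""
    best = max(categories, key=lambda k: match(sound, categories[k]))
    if match(sound, categories[best]) >= vigilance:
        return best                       # resonance: existing node wins
    new = "node%d" % len(categories)      # reset: commit an uncommitted node
    categories[new] = sound
    return new

print(respond("ABCD"))  # 'dog' -- a 0.75 match clears vigilance of 0.7
vigilance = 0.9         # aversive parental feedback raises vigilance
print(respond("ABCD"))  # a new node is recruited for "doggie"
```

The node names and the linear recruitment scheme are illustrative conveniences only; the substantive claim is simply that a feedback-sensitive vigilance parameter removes the fixed-vigilance problem.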
10. With such sensible accounts available for behaviour in the real world, I don't consider the reset mechanism to be under serious challenge. Many of the "problems" suggested by GEs and then considered by Nigrin are, however, more genuine than the foregoing examples. Nevertheless, before modifying a NN to cope with a real problem, one must specify the boundary conditions in which the problem occurs. As is argued in the next section, the GE method, by stressing that a problem CAN occur, tends to lead to a neglect of considerations of WHEN the problem will occur.
II.4 Limits of Perfection: Part 2
11. Many psychological experiments are based on the special circumstance where the information-processing device fails, as such situations can reveal the underlying mechanisms quite starkly. To take just one example: LATENT INHIBITION is a phenomenon from the animal learning laboratory (and recently intensively studied in human subjects, too). A conditioned stimulus (CS) is presented a number of times to an animal without consequence. Learning does occur, however, as attentional mechanisms eventually disengage from this stimulus -- it isn't worth paying attention to. This is an adaptive function. Psychologists exploit the downside of this function by seeing what happens when the CS suddenly becomes predictive of a reinforcer (or unconditioned stimulus, US). Learning is inhibited because the animal has to overcome prior learning that the CS is not followed by anything (and is thus unimportant).
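One toy way to capture this (my sketch, in the spirit of Pearce and Hall's attentional account rather than any model Nigrin discusses) is to let an associability parameter alpha track recent surprise: preexposure to the CS alone drives alpha down, so subsequent CS-US conditioning starts from a handicap.

```python
def simulate(preexposure_trials, conditioning_trials,
             alpha0=0.6, gamma=0.5, beta=0.5):
    """Latent inhibition sketch: associability alpha tracks recent surprise."""
    V, alpha = 0.0, alpha0            # V = CS-US associative strength
    for _ in range(preexposure_trials):
        # CS alone: the US magnitude (lambda) is 0, so surprise shrinks alpha
        alpha = gamma * abs(0.0 - V) + (1 - gamma) * alpha
    for _ in range(conditioning_trials):
        # CS followed by US: lambda = 1; learning rate is gated by alpha
        V += alpha * beta * (1.0 - V)
        alpha = gamma * abs(1.0 - V) + (1 - gamma) * alpha
    return V

print(simulate(0, 5))   # no preexposure: conditioning proceeds apace
print(simulate(20, 5))  # preexposed CS: reduced alpha retards conditioning
```

The parameter values are arbitrary; the qualitative point is that the preexposed animal always lags, which is the laboratory signature of latent inhibition.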
12. If one considers the binding of CS and US in a latent inhibition experiment as an example of parsing (an idea suggested to me by Mike Page), then prior parsing of the CS alone seems to inhibit parsing of the CS-US pair as a unit. This example therefore conflicts with the idea that there is always a "temporal chunking problem." This "problem," considered by Nigrin at length (starting on p. 80), arises because simple models would never allow the higher order chunk (e.g., the CS-US unit) to form. The problem is solved in the masking fields adopted by SONNET, but the solution neglects the fact that the problem does not arise in situations where there is difficulty in forming higher order chunks (e.g., latent inhibition in my analysis). Nigrin acknowledges but does not adequately address this difficulty (in a language context) in his concluding remarks (p. 365).
13. Section II.4 revealed, I hope, that the old chestnut -- the interdisciplinary cross-fertilisation of ideas -- is sometimes a reality. A consideration of ideas from "reinforcement learning" (which Nigrin conceded was a major omission from the book) helped cast new light on ideas from the study of language (e.g., the "temporal chunking problem") and vice versa. It is therefore a pity, and a limitation on the development of SONNET, that Nigrin concentrates, almost exclusively, on GEs and examples from language processing. (These remarks do not imply that SONNET is a system of limited power; Page, in press, has shown that a development of SONNET achieves considerable success in modelling musical expectancy.) There are certainly many areas of memory research where ART-like networks can make important contributions (Pickering, 1994). The excellent treatment of the representation and processing of synonyms in the latter part of Nigrin's book, where novel mechanisms of presynaptic inhibition were developed and employed, would have been even better if the ideas had been made concrete by simulations of experimental data, particularly if these data had been drawn from a variety of areas in addition to language (e.g., memory).
Gray, J.A. (1982). The neuropsychology of anxiety. Oxford: Oxford University Press.
Grossberg, S. (1982). Studies of mind and brain: neural principles of learning, perception, development, cognition, and motor control. Boston: Reidel Press.
Grossberg, S. (1987). Competitive learning: from interactive activation to adaptive resonance. Cognitive Science, 11, 23-63.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: W.H. Freeman and Co.
Nigrin, A. (1993). Neural networks for pattern recognition. Cambridge, MA: The MIT Press.
Nigrin, A. (1994). Precis of: Neural networks for pattern recognition. PSYCOLOQUY, 5(2), pattern-recognition.1.nigrin.
Page, M.P.A. (in press). Modeling the perception of musical sequences with self-organising neural networks. Connection Science.
Pickering, A.D. (1994, August). Context: Netting such a big fish is an ART. Paper to be presented at the Third International Conference on Practical Aspects of Memory. University of Maryland.
Shanks, D.R. (1991). Categorisation by a connectionist network. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 433-443.