Green raises a number of questions regarding the role of "connectionist" models in scientific theories of cognition, one of which concerns exactly what it is that units in artificial neural networks (ANNs) stand for, if not specific neurones or groups of neurones, or indeed, specific theoretical entities. In placing all connectionist models in the same basket, Green seems to have ignored the fundamental differences which distinguish classes of models from each other. In this commentary, we address the issue of distributed versus localised representations in ANNs, arguing that it is difficult (but not impossible) to investigate what units stand for in the former case, but that units do correspond to specific theoretical entities in the latter case. We review the role of localised representations in a neural network model of a semantic system in which each unit corresponds to a letter, word, word sense, or semantic feature, and whose dynamics and behaviour match those predicted from a cognitive theory of skilled reading. Thus, we argue that ANNs might be useful in developing general mathematical models of processes for existing cognitive theories that already enjoy empirical support.
2. Quiet developments during the 1970s continued to show the utility of ANNs as process models in the perceptual domain, with Fukushima's Cognitron (Fukushima, 1975), and in the cognitive domain, with Kohonen's (1977) associative memory model. Unsupervised learning systems, some of which are statistically equivalent to orthogonal decomposition (principal components analysis), have also been used as process models for cognition (Barlow, 1989); these are functionally quite different from nonlinear associative models trained by back-propagation (McClelland et al., 1986). From a completely different perspective, Hopfield (1982) used methods from statistical physics to demonstrate that ensembles of simple neuron-like units could develop collective computational properties.
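As a minimal illustration of the link between unsupervised learning and principal components analysis, the sketch below trains a single linear unit with Oja's Hebbian rule, a standard textbook example of this equivalence. It is our choice of illustration, not a model taken from the works cited, and the toy data set, starting weights, and learning rate are hypothetical. On zero-mean data whose variance lies mostly along the first axis, the weight vector should converge to the unit-length first principal component along that axis.

```python
def oja_first_component(data, w, lr=0.01, epochs=500):
    """Train one linear unit with Oja's rule: w += lr * y * (x - y * w).

    For zero-mean data and a small learning rate, w converges (up to sign)
    to the unit-length first principal component of the data.
    """
    w = list(w)
    for _ in range(epochs):
        for x in data:
            y = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
            # Hebbian term y*x, minus a decay term y^2*w that keeps |w| near 1
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


# Hypothetical zero-mean data: large variance along axis 0, small along axis 1.
DATA = [(2.0, 0.0), (-2.0, 0.0), (0.0, 0.5), (0.0, -0.5)]

if __name__ == "__main__":
    print(oja_first_component(DATA, w=(0.5, 0.5)))  # close to [1.0, 0.0]
```

No labelled targets are involved: the unit extracts the dominant direction of variation purely from the statistics of its input.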
3. Two facts emerge from this abbreviated review of ANN functional architectures. First, not all ANNs were created equal, or even with the same purpose in mind. Second, what these models do share is a desire to exploit the mathematical or statistical properties of such networks as process models, or as generalised explanations of how a goal might be achieved. Our point is that by not considering any specific example or class of ANN in his target article, and simply labelling all ANN models as "connectionist models," Green (1998) has ignored important differences both in the motivation for using ANN models and in the way cognitive scientists have actually used them in the past.
4. In a key objection, Green expresses concern that the units in ANNs used as cognitive models have no counterparts in scientific theories of cognition, and often appear to be quite arbitrary. We share this concern where ANNs are developed as cognitive models in isolation from cognitive theories. However, in some applications it is precisely this generality, with units that need not correspond to any theoretical entity, that makes ANNs useful, in the same way that polynomial regression can be useful in modelling nonlinearities in data whose underlying structure is unknown. Green also expresses concern about specifying the number of units, and their functional relationships to other units, needed to complete a task. It is often difficult to investigate what units represent in this context, but it is not impossible. One suggestion is that the contribution of units in a network can be assessed by examining how variance is distributed across them and ranking these measures in a "scree plot." This method is commonly used in statistics to assess the relative contribution of eigenvectors: the slope of the ranked eigenvalues indicates how distributed a representation is, with a steep slope showing that a few components carry most of the variance (a localised representation) and a shallow slope showing that variance is spread across many components (a distributed one). In a "parallel distributed processing" system, variance should ideally be equally distributed, although, as Watters and Tolhurst (1998) discovered, some perceptual neural networks actually create quite localised representations in terms of this variance distribution.
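A minimal sketch of the variance-ranking idea, assuming unit activations have been recorded as one row per stimulus. Unlike an eigenvalue scree plot, this hypothetical version ranks the raw activation variance of each unit, and it summarises the scree by a least-squares slope; the function names, the slope summary, and the example recordings are our own illustrative choices, not the procedure used by Watters and Tolhurst (1998).

```python
from statistics import pvariance


def scree_values(activations):
    """Rank each unit's share of the total activation variance.

    activations: list of rows, one per stimulus, each a list of unit activations.
    Returns variance proportions sorted from largest to smallest.
    """
    n_units = len(activations[0])
    variances = [pvariance([row[j] for row in activations]) for j in range(n_units)]
    total = sum(variances)
    if total == 0:
        raise ValueError("activations have no variance")
    return sorted((v / total for v in variances), reverse=True)


def scree_slope(proportions):
    """Least-squares slope of ranked proportions against rank (1, 2, ...).

    Near zero: variance spread evenly across units (distributed).
    Strongly negative: a few units dominate (localised).
    """
    n = len(proportions)
    if n < 2:
        raise ValueError("need at least two units")
    mean_rank = (n + 1) / 2
    mean_p = sum(proportions) / n
    num = sum((r - mean_rank) * (p - mean_p) for r, p in enumerate(proportions, start=1))
    den = sum((r - mean_rank) ** 2 for r in range(1, n + 1))
    return num / den


# Hypothetical recordings: 4 stimuli x 4 units.
DISTRIBUTED = [[1, 1, 1, 1], [-1, -1, -1, -1], [1, -1, 1, -1], [-1, 1, -1, 1]]
LOCALISED = [[5, 0, 0, 0], [-5, 0.1, 0, 0], [5, -0.1, 0, 0], [-5, 0, 0, 0]]

if __name__ == "__main__":
    print(scree_slope(scree_values(DISTRIBUTED)))  # 0.0 (flat scree)
    print(scree_slope(scree_values(LOCALISED)))    # about -0.3 (steep scree)
```

Per-unit variance is the simplest reading of "how variance is distributed"; an eigen-decomposition of the activation covariance would instead measure how many orthogonal components, rather than how many individual units, carry the variance.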
5. Alternatively, some ANN architectures are explicitly localised in their representations: each unit in the network corresponds to a theoretical entity within a scientific theory of cognition. Hypotheses can be made about specific units and their relationships to other units, and experiments can be conducted to test these hypotheses. This is not an alternative that Green appears to have considered. One example is the semantic system (Patel, 1996) of the dual-route theory of skilled word reading (Coltheart et al., 1993), which draws on some twenty years of empirical data on word and nonword reading to characterise the processes that lead to normal and abnormal reading. In the dual-route model, there is an explicit division between lexical and non-lexical (i.e., rule-based) processes for mapping orthography to phonology, with the semantic system forming part of the lexical route. The semantic system is implemented as a competitive neural network composed of localised representations of letters, words, word senses, and semantic features, and is used (a) to provide a process model of the semantic processing that occurs during print-to-speech conversion in skilled reading and (b) to confirm the structure of the representational model derived from theory. The influence of the semantic system on the lexical route is not always apparent in normal reading, although in some cases (such as the pronunciation of homographs) word meanings are accessed to retrieve the correct word sense. However, the influence of the semantic system is quite apparent in some reading disorders such as deep dyslexia, where a semantically related word may be produced when a target word is read (e.g., CAT is read aloud as DOG).
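To make the notion of a localised representation concrete, the sketch below wires up a hypothetical miniature lexicon in which every unit is named after the letter, word, word sense, or semantic feature it stands for, and spreads activation forward from the letters of BANK. The lexicon, the position-independent letter units, the single link weight, and the `spread` function are all illustrative assumptions; this is not the connectivity or update rule of Patel's (1996) semantic system.

```python
from collections import defaultdict

# Hypothetical miniature lexicon: every key and target is one localised unit
# standing for a letter, a word, a word sense, or a semantic feature.
LINKS = {
    "letter:B": ["word:BANK"],
    "letter:A": ["word:BANK"],
    "letter:N": ["word:BANK"],
    "letter:K": ["word:BANK"],
    "word:BANK": ["sense:BANK/river", "sense:BANK/money"],
    "sense:BANK/river": ["feature:water", "feature:land"],
    "sense:BANK/money": ["feature:finance", "feature:institution"],
}


def spread(links, sources, weight=0.5, steps=3):
    """Feed-forward spreading of activation from source units along links.

    Each step passes weight * activation from every newly activated unit to
    its targets; the returned dict maps each named unit to its total activation.
    """
    activation = defaultdict(float)
    frontier = {unit: 1.0 for unit in sources}
    for unit, a in frontier.items():
        activation[unit] += a
    for _ in range(steps):
        nxt = defaultdict(float)
        for unit, a in frontier.items():
            for target in links.get(unit, []):
                nxt[target] += weight * a
        for unit, a in nxt.items():
            activation[unit] += a
        frontier = nxt
    return dict(activation)


if __name__ == "__main__":
    act = spread(LINKS, ["letter:B", "letter:A", "letter:N", "letter:K"])
    # Each value belongs to a named theoretical entity and can be queried directly.
    print(act["word:BANK"], act["sense:BANK/river"], act["feature:water"])  # 2.0 1.0 0.5
```

Because every value is attached to a named entity, a hypothesis such as "reading BANK activates the feature water" can be checked against a single unit. Note also that both senses of BANK end up equally active: feed-forward spreading alone cannot choose between them, which is where competition between sense units comes in.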
6. One use of competitive process models such as localised ANNs in further developing theories of cognition is to account for relative processing times and response accuracy in normal and abnormal reading. This was the case in a recent application of the semantic system to modelling the semantic processing errors that are characteristic symptoms of Parkinson's disease (Watters & Patel, 1998). By operationalising dopaminergic transmission in the frontal cortex as lateral inhibition between the representations of different word senses, we could simulate errors in selecting the correct word sense, accounting for the longer response latencies predicted from empirical data (Gurd, 1996). The competitive nature of processing in the network also made it easy to perform experiments on the internal (i.e., cognitive) structure of the local representations in the model, such as the relationship between word senses and their semantic components; this has yielded useful predictions for future experiments with human participants. We believe that this example demonstrates the utility of the semantic system of the dual-route cascaded (DRC) model, as a localised ANN, in developing a general mathematical model of competitive processes that operationalises a cognitive theory with broad empirical support.
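A minimal sketch of how weakening lateral inhibition between word-sense units can slow or prevent sense selection. It uses a generic leaky competitive update of our own choosing, not the equations of Patel (1996) or Watters and Patel (1998); the input strengths, leak, step size, selection margin, and inhibition values are hypothetical. Treating the inhibition parameter as a stand-in for dopaminergic transmission, lowering it lengthens the number of cycles needed to select the contextually appropriate sense and, in this parameter setting, eventually prevents selection within the cycle limit.

```python
def select_sense(inputs, inhibition, leak=0.5, dt=0.1, margin=0.5, max_cycles=1000):
    """Leaky competition among word-sense units with lateral inhibition.

    Each cycle every unit i is updated simultaneously:
        a_i += dt * (inputs[i] - leak * a_i - inhibition * sum of other a_j)
    and clamped at zero. A sense is selected once the most active unit
    leads the runner-up by at least `margin` (requires two or more units).
    Returns (index of selected sense, cycles taken), or (None, max_cycles).
    """
    acts = [0.0] * len(inputs)
    for cycle in range(1, max_cycles + 1):
        total = sum(acts)
        acts = [
            max(0.0, a + dt * (e - leak * a - inhibition * (total - a)))
            for a, e in zip(acts, inputs)
        ]
        ranked = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)
        if acts[ranked[0]] - acts[ranked[1]] >= margin:
            return ranked[0], cycle
    return None, max_cycles


if __name__ == "__main__":
    context_bias = [0.6, 0.5]  # hypothetical: sense 0 fits the context slightly better
    for inh in (1.0, 0.6, 0.2):
        print(inh, select_sense(context_bias, inhibition=inh))
    # 1.0 (0, 26)
    # 0.6 (0, 41)
    # 0.2 (None, 1000)
```

For two units the lead of the winner grows only when inhibition exceeds the leak; below that, the lead settles at a fixed value that, with these hypothetical parameters, never reaches the selection margin.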
This work is funded by an Australian Research Council grant. Conversations with Dr. Malti Patel and Professor Max Coltheart were very enlightening, and their comments are gratefully acknowledged.
Barlow, H.B. (1989). Unsupervised learning. Neural Computation, 1, 295-311.
Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: dual-route and parallel-distributed-processing approaches. Psychological Review, 100, 589-608.
Fukushima, K. (1975). Cognitron: A self-organising multilayered neural network. Biological Cybernetics, 20, 121-136.
Green, C.D. (1998). Are connectionist models theories of cognition? PSYCOLOQUY 9(4). ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1998.volume.9/psyc.98.9.04.connectionist-explanation.1.green
Gurd, J.M. (1996). Word search in patients with Parkinson's disease. Journal of Neurolinguistics, 9, 207-218.
Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences U.S.A., 79, 2554-2558.
Kohonen, T. (1977). Associative Memory: A System Theoretical Approach. New York: Springer.
McClelland, J.L., Rumelhart, D.E., & the PDP Research Group. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Human Cognition. Cambridge, MA: MIT Press.
Patel, M. (1996). Using neural nets to investigate lexical analysis. In N. Foo & R. Goebel (Eds.), Lecture Notes in Artificial Intelligence 1114. Berlin: Springer.
Sarle, W. (1994). Neural networks and statistical models. Proceedings of the Nineteenth Annual SAS Users Group International Conference, April 1994.
Watters, P.A. & Patel, M. (1998). Modelling semantic processing errors in dopaminergic disorders using a competitive neural network. Technical Report C/TR-9801, Department of Computing, Macquarie University NSW 2109 AUSTRALIA.
Watters, P.A. & Tolhurst, D.J. (1998). Comparing variance distribution in orthogonal and sparse-coding models of simple cell receptive fields in mammalian visual cortex. Journal of Physiology, 506.P, 91P.