The relationship between connectionist models and cognitive theories has been a source of considerable debate within cognitive science. Green (1998) has recently joined this debate, arguing that connectionist models should be interpreted only as literal models of brain activity; in other words, connectionist models contribute to cognitive theories only at the implementational level. Recent results, however, have shown that interpreting the internal structure of connectionist models can produce novel cognitive theories that are more than mere implementations of classical theories (e.g., Dawson, Medler, & Berkeley, 1997). Furthermore, such connectionist theories have an advantage over more classical approaches in that they offer explanatory, as opposed to merely descriptive, theories of cognition.
2. Green (1998) has now entered the fray, suggesting that "the only way of interpreting connectionist networks as serious candidates for theories of cognition would be as literal models of the brain activity that underpins cognition" (para. 20). Although he is careful not to argue that cognition is inherently symbolic, Green nevertheless adopts the mere-implementation argument of Fodor and Pylyshyn (1988). His main reason for endorsing this position is his belief that the internal structure of connectionist networks is inherently uninterpretable; that is, when connectionists are asked how their networks accomplish a particular task, they can say only that the networks solve the problem by some (mysterious) distributed process. Green therefore concludes that the networks are a mere implementation of some cognitive theory.
3. Green does concede that if one could interpret the internal activity of the connectionist simulation, the simulation would have the potential to become a scientific theory worthy of consideration. In other words, interpretation of the internal structure of a network would allow for more than a mere implementational account of cognitive theories.
4. Green's concession is crucial, because he has unfortunately ignored a growing literature on network interpretation that, by his own concession, has the potential to make his implementational claim obsolete. Connectionists have long recognized the importance of analyzing the internal structure of their networks. This view is typified by Hanson and Burr (1990):
Post hoc analyses of the way a network computes and represents information over subsets of hidden units ... can clarify imprecise or incomplete models of psychological phenomena and can help reveal important relations between learning and representation that are not taken into account in the rule-based approach. (p. 476)
Consequently, a variety of techniques for interpreting the internal structure of connectionist networks have been developed. In computer science, researchers have developed techniques for extracting conditional rules from networks, or for translating networks into more traditional information-processing architectures (e.g., Alexander & Mozer, 1995; Craven & Shavlik, 1994; Gallant, 1993; Giles et al., 1992; Omlin & Giles, 1996; Saito & Nakano, 1988; Thrun, 1995). In cognitive science, interpretive analyses can be roughly divided into two categories: (i) analysis of network weights (e.g., Hinton, McClelland, & Rumelhart, 1986; Hanson & Burr, 1990), and (ii) analysis of hidden unit activities (e.g., Elman, 1990; Moorhead, Haig, & Clement, 1989).
5. As an example of the latter approach, Berkeley, Dawson, Medler, Schopflocher, and Hornsby (1995) introduced a network interpretation technique, dubbed "banding analysis", that has been applied to different problems (Berkeley et al., 1995; Dawson & Medler, 1996) and network architectures (e.g., McCaughan, 1997). In essence, banding analysis amounts to "wiretapping" each hidden unit while the stimulus set is presented. The recorded activations are then used to create a jittered density plot for each hidden unit. Often these density plots form distinct "bands", and the bands can be interpreted using simple statistics and correlations. The technique therefore describes not only what each hidden unit is sensitive to, but also how the network uses its distributed representations to solve the problem being modeled.
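The wiretapping procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by Berkeley et al. (1995): the 2-2-1 network, its hand-set weights (chosen so that the two hidden units behave roughly like OR and AND), the logistic activation function, the gap threshold used to separate bands, and the helper names `hidden_activations`, `jittered_points`, `find_bands`, and `band_feature_means` are all hypothetical. The jitter only spreads points vertically so that overlapping activations stay visible in a plot; bands are identified along the activation axis.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical hand-set weights for a 2-2-1 XOR network (illustration only,
# not taken from any published network). Each entry: (input weights, bias).
HIDDEN_WEIGHTS = [([6.0, 6.0], -3.0),   # hidden unit 0: behaves roughly like OR
                  ([6.0, 6.0], -9.0)]   # hidden unit 1: behaves roughly like AND

def hidden_activations(pattern):
    """'Wiretap' every hidden unit while one input pattern is presented."""
    return [sigmoid(sum(w * x for w, x in zip(ws, pattern)) + bias)
            for ws, bias in HIDDEN_WEIGHTS]

def jittered_points(activations, rng, spread=1.0):
    """Pair each activation (x-axis) with random vertical jitter (y-axis),
    giving the coordinates of a jittered density plot."""
    return [(a, rng.uniform(0.0, spread)) for a in activations]

def find_bands(points, gap=0.2):
    """Group points into bands along the activation axis: a new band starts
    wherever two neighbouring activations differ by more than `gap`.
    Returns a list of bands, each a list of stimulus indices."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    bands = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if points[cur][0] - points[prev][0] > gap:
            bands.append([])
        bands[-1].append(cur)
    return bands

def band_feature_means(bands, patterns):
    """A simple descriptive statistic per band: the mean of each input feature."""
    n = len(patterns[0])
    return [[sum(patterns[i][k] for i in band) / len(band) for k in range(n)]
            for band in bands]

if __name__ == "__main__":
    stimuli = [(0, 0), (0, 1), (1, 0), (1, 1)]
    recorded = [hidden_activations(p) for p in stimuli]  # one row per stimulus
    rng = random.Random(0)
    for unit in range(len(HIDDEN_WEIGHTS)):
        points = jittered_points([row[unit] for row in recorded], rng)
        bands = find_bands(points)
        print(f"hidden unit {unit}: {len(bands)} bands")
        for band, means in zip(bands, band_feature_means(bands, stimuli)):
            members = [stimuli[i] for i in band]
            print(f"  band {members}: mean inputs {[round(m, 2) for m in means]}")
```

With these hypothetical weights, hidden unit 0 forms a low band containing only (0, 0) and a high band containing the other three patterns, so it can be read as detecting "at least one input on"; hidden unit 1 forms a high band containing only (1, 1), i.e., "both inputs on". Interpreting the bands in this way is what turns a list of recorded activations into a description of how the network solves the task.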
6. Indeed, Dawson, Medler, and Berkeley (1997) have argued that interpreting the internal structure of connectionist networks is crucial if one wants to make any claims about the theoretical importance of connectionism, especially with regard to cognitive theories. For example, Bechtel and Abrahamsen (1991) trained a network to solve a series of logic problems; after successful training, they claimed that the network had produced a radically different type of cognitive theory, but they did not describe that theory. In contrast, Berkeley et al. (1995) trained a network on the same problem and then performed the essential step of analyzing its internal structure using banding analysis. This analysis showed that the network was using "classical" rule-like structures to solve the problem. However, although the network had discovered five of the necessary classical rules, it had also discovered two other rules that were definitely not part of traditional logic. On the basis of these nontraditional rules, Dawson, Medler, and Berkeley argued that the network had produced a NOVEL cognitive theory. Because the theory was cognitive, it could not be dismissed as "merely implementational" or nonpsychological.
7. If connectionist models can be shown to produce novel cognitive theories, what distinguishes them from classical approaches to cognitive theorizing? In other words, is there an advantage to adopting a connectionist perspective? The answer lies in the difference between two approaches to producing cognitive theories.
8. One approach to cognitive theorizing is to examine a large range of data and attempt to generalize about the patterns in the data; this approach produces theories with "descriptive adequacy." Seidenberg (1993) argues that although descriptive theories can describe the phenomena that do occur and can generate novel predictions, they cannot explain why other, equally plausible phenomena do not occur. Furthermore, he continues, it is not enough simply to describe the kinds of things that are in the world; we also need to understand why things are the way they are and not some other way. This type of cognitive theorizing corresponds to the analytical approach to cognition (see also O'Brien, 1998).
9. A second, more fruitful approach to cognitive theorizing is to show how the phenomena in question derive from deeper principles; such an approach produces explanatory theories. Whereas theories with descriptive adequacy are based on task- or phenomenon-specific principles, a condition on explanatory theories is that they appeal to a small set of independently motivated concepts. Furthermore, if these underlying principles are explanatory, they will also contribute to the understanding of phenomena in other domains. Explanatory theorizing corresponds to the synthetic approach to cognition (e.g., Braitenberg, 1984), and as a result it allows us to understand why things are one way and not another. In other words, whereas descriptive theories only describe possible computational functions, explanatory theories define the computational competence of an information processor. Seidenberg (1993) states that connectionism is precisely the tool needed to develop theories that are explanatory and not merely descriptive.
10. Seidenberg's view is not without controversy. For example, Massaro (1988, 1990) contends that certain assumptions in connectionist models are unnecessary and inconsistent, and that the models themselves are too powerful to be of any theoretical importance to cognitive science. Seidenberg argues, however, that connectionist models acquire their explanatory power when constraints are applied in systematic ways. The first constraint is that the models appeal to a small yet general set of computational principles; the second is that they be neurobiologically relevant. When these constraints are met, Seidenberg argues, connectionism contributes to the development of explanatory theories by providing a candidate set of independently motivated theoretical principles (cf. McCloskey, 1991).
11. Thus, connectionism can be viewed not only as a way of modeling cognitive functions, but also as a source of new explanatory cognitive theories. Although connectionism can, and often does, produce computational theories that are appropriate and adequate for particular tasks in specific domains, its advantage over classical theorizing is that its underlying principles remain the same regardless of domain.
Alexander, J., & Mozer, M. (1995). Template-based algorithms for connectionist rule extraction. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in Neural Information Processing Systems (Vol. 7, pp. 609-616). Cambridge, MA: MIT Press.
Bechtel, W., & Abrahamsen, A. (1991). Connectionism and the mind: An introduction to parallel processing in networks. Cambridge, MA: Blackwell.
Berkeley, I.S.N., Dawson, M.R.W., Medler, D.A., Schopflocher, D.P., & Hornsby, L. (1995). Density plots of hidden unit activations reveal interpretable bands. Connection Science, 7, 167-186.
Braitenberg, V. (1984). Vehicles. Cambridge, MA: MIT Press.
Broadbent, D. (1985). A question of levels: Comment on McClelland and Rumelhart. Journal of Experimental Psychology: General, 114, 189-192.
Craven, M. W., & Shavlik, J. W. (1994). Using sampling and queries to extract rules from trained neural networks. Machine Learning: Proceedings of the Eleventh International Conference.
Dawson, M.R.W. (1998). Understanding Cognitive Science. Oxford: Blackwell.
Dawson, M.R.W., & Medler, D.A. (1996). Of mushrooms and machine learning: Identifying algorithms in a PDP network. Canadian Artificial Intelligence, 38, 14-17.
Dawson, M.R.W., Medler, D.A., & Berkeley, I.S.N. (1997). PDP networks can provide symbolic models that are not mere implementations of classical theories. Philosophical Psychology, 10, 25-40.
Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.
Gallant, S. I. (1993). Neural network learning and expert systems. Cambridge, MA: MIT Press.
Giles, C. L., Miller, C. B., Chen, D., Chen, H. H., Sun, G. Z., & Lee, Y. C. (1992). Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4, 393-405.
Green, C. D. (1998). Are connectionist models theories of cognition? PSYCOLOQUY 9(4). ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1998.volume.9/psyc.98.9.04.connectionist-explanation.1.green
Hanson, S.J., & Burr, D.J. (1990). What connectionist models learn: Learning and representation in connectionist networks. Behavioral and Brain Sciences, 13, 471-518.
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations. Cambridge, MA: MIT Press.
Horgan, T., & Tienson, J. (1996). Connectionism and the philosophy of psychology. Cambridge, MA: MIT Press.
Jackendoff, R. (1992). Languages of the mind: Essays on mental representations. Cambridge, MA: MIT Press.
Massaro, D.W. (1988). Some criticisms of connectionist models of human performance. Journal of Memory and Language, 27, 213-234.
Massaro, D. (1990). The psychology of connectionism. Behavioral and Brain Sciences, 13, 403-406.
McCaughan, D.B. (1997). On the properties of periodic perceptrons. Proceedings of the 1997 International Conference on Neural Networks, 188-193.
McCloskey, M. (1991). Networks and theories: The place of connectionism in cognitive science. Psychological Science, 2, 387-395.
Moorhead, I. R., Haig, N. D., & Clement, R. A. (1989). An investigation of trained neural networks from a neurophysiological perspective. Perception, 18, 793-803.
O'Brien, G. J. (1998). The role of implementation in connectionist explanation. PSYCOLOQUY 9(6). ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1998.volume.9/psyc.98.9.06.connectionist-explanation.3.obrien
Omlin, C. W., & Giles, C. L. (1996). Extraction of rules from discrete-time recurrent neural networks. Neural Networks, 9, 41-52.
Saito, K., & Nakano, R. (1988). Medical diagnostic expert system based on PDP model. Proceedings of the 1988 IEEE International Conference on Neural Networks, ICNN'88, 255-262.
Schneider, W. (1987). Connectionism: Is it a paradigm shift for psychology? Behavior Research Methods, Instruments, and Computers, 19, 73-83.
Seidenberg, M. (1993). Connectionist models and cognitive science. Psychological Science, 4, 228-235.
Thrun, S. (1995). Extracting rules from artificial neural networks with distributed representations. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in Neural Information Processing Systems (Vol. 7). Cambridge, MA: MIT Press.