Despite its early promise, connectionism has had a minimal impact upon linguistic theory -- largely because it has proved very difficult to scale neural network models up to the size necessary to handle realistic linguistic data. Miikkulainen's (1993) work on script comprehension demonstrates that such scaling-up is possible. Miikkulainen's technique -- the construction of a complex system incorporating multiple neural nets -- is biologically plausible and replicates important psycholinguistic findings about language processing. However, Miikkulainen's system lacks the properties which most linguists would consider diagnostic of human language. Its simple template-matching system lacks grammatical structure (morphology and syntax), and supports neither generativity of form nor compositionality of semantics. These difficulties can probably be overcome in future work, but will require connectionists to address the mathematical and theoretical issues inherent in multiple-network models of the sort Miikkulainen employs.
1. Risto Miikkulainen's Subsymbolic Natural Language Processing (1993, 1994) represents a promising approach to connectionist modeling of language. Classic PDP models used single neural networks to model isolated phenomena, and the relevant microfeatures were often hand-crafted to yield the correct results. The resulting analyses could often be attacked as ad hoc and simplistic; they were usually too limited in scope to provide useful models of complex linguistic phenomena. Miikkulainen's work seeks to overcome the limitations of classic PDP by employing a much more sophisticated model in which multiple neural networks function -- and are trained -- in tandem. It draws much of its inspiration from cognitive neuroscience, and aims to present a model which is (at some level of abstraction) biologically realistic. A number of recent works have taken a similar approach, including Terry Regier's UC-Berkeley dissertation on the semantics of prepositions (1992).
2. The essence of Miikkulainen's approach lies in his use of multiple neural networks to form an integrated system. Several backpropagation networks are used to model the input/output patterns involved in parsing and generating sentences (and larger discourse units). Miikkulainen's system includes a sentence parser, a story parser (for processing sequences of sentences), a cue former (to interpret questions), an answer producer, a sentence generator, and a story generator. These six backpropagation networks are designed to allow the system to function as a simple but complete natural language processor, able both to process natural language texts and to produce appropriate natural language output. The backpropagation networks are supported by a hierarchically structured system of feature maps which function as long-term memory and require the backpropagation networks to use common underlying representations for word meanings and story patterns. While backpropagation networks and feature maps are basic connectionist tools, Miikkulainen modifies them to implement capabilities they do not normally support -- for example, using hierarchical trace feature maps as models of episodic memory.
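The division of labor among the six networks can be sketched schematically. In the sketch below, the module names follow the text, but the interfaces, the vector width, and the stub "networks" (fixed random nonlinear maps) are illustrative assumptions of mine, not Miikkulainen's actual implementation:

```python
# A schematic of DISCERN's modular dataflow. Module names follow the text,
# but the interfaces, the vector width, and the stub "networks" (fixed
# random maps) are illustrative assumptions, not DISCERN's implementation.
import numpy as np

DIM = 12                      # assumed width of the shared representation
rng = np.random.default_rng(0)

def module(_name):
    """Stand-in for one trained backpropagation network: a fixed
    nonlinear map from one representation vector to another."""
    w = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)
    return lambda x: np.tanh(w @ x)

# The six subnetworks named in the text, all sharing one vector format
# (the role played by DISCERN's common lexicon).
sentence_parser = module("sentence parser")
story_parser    = module("story parser")
cue_former      = module("cue former")
answer_producer = module("answer producer")
sentence_gen    = module("sentence generator")
story_gen       = module("story generator")

def comprehend(sentences):
    """Sentences -> case frames -> a single story representation."""
    frames = [sentence_parser(s) for s in sentences]
    return story_parser(np.mean(frames, axis=0))  # crude pooling stand-in

def answer(story, question):
    """Question -> cue -> answer representation -> output sentence."""
    return sentence_gen(answer_producer(story + cue_former(question)))

story = comprehend([np.full(DIM, 0.1) for _ in range(4)])
reply = answer(story, np.full(DIM, 0.2))
```

The essential point the sketch preserves is that every module consumes and produces vectors in the same representational format, which is what allows the subnetworks to be trained and composed as a single system.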
3. The major limitation of Miikkulainen's system is its linguistic simplicity (dictated, perhaps, by its focus on providing a model of the psycholinguistic properties of scripts). Sentence processing is a simple template-matching process in which word forms are mapped onto their semantic representations and word sequences are mapped onto semantic case frames. Story parsing, similarly, is a straight template match in which sequences of sentences are mapped onto sequences of events in an underlying script. Miikkulainen's system thus incorporates basic semantic and discourse concepts (case frame, script) but deploys both within the connectionist equivalent of a simple finite-state grammar.
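The force of this criticism can be made concrete with a deliberately symbolic caricature of the parsing step: each position in a fixed sentence pattern maps directly onto a case role. The role inventory below is my own illustration, not DISCERN's actual frame set:

```python
# A deliberately simplified caricature of template-matching parsing: each
# word position in a fixed pattern maps straight onto a case role. The
# role inventory is an illustrative assumption, not DISCERN's frame set.

CASE_ROLES = ("agent", "act", "patient")

def parse_sentence(words):
    """Map a fixed-length sentence onto a flat case frame by position.
    There is no constituent structure and no recursion: the 'grammar'
    is a finite template, which is exactly the limitation at issue."""
    if len(words) != len(CASE_ROLES):
        raise ValueError("sentence does not fit the template")
    return dict(zip(CASE_ROLES, words))

frame = parse_sentence(["John", "ate", "a-hamburger"])
# frame == {"agent": "John", "act": "ate", "patient": "a-hamburger"}
```

DISCERN performs this mapping with trained networks rather than a lookup, but the expressive power is comparable: a finite set of surface patterns onto a finite set of frames.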
4. Miikkulainen's system, DISCERN, embodies an entire series of advances towards a state-of-the-art system. Many or these advances are original to DISCERN, while others represent recent advances not yet universally familiar to the larger cognitive science community. The following techniques are of particular interest:
a. The theory is embodied as an interacting system of subnetworks, rather than trying to accomplish the entire task within a single undifferentiated network.
b. A common lexicon allows different neural networks to communicate the classificatory schemes developed in their hidden layers. This technique seems to free Miikkulainen from the bane of early connectionist work on language: the need to hand-code microfeatures that predefine the problem space.
c. Multiple TYPES of neural network are used within the same system. Miikkulainen's recurrent backpropagation networks (FGREPs) are used for pattern-transformation tasks, such as matching word strings to case-role assignments. But a second type of net (feature maps) is used to provide an explicit episodic/declarative memory. Since feature maps locate similar concepts in adjacent physical locations, they provide a natural mechanism for mapping implicit, distributed information into classificatory groups.
d. A physical hierarchy of feature maps forces the construction of a hierarchical script representation. Information is fed into the highest-level map, which extracts the most important distinctions and passes the remaining information to a series of lower-level maps, each of which subclassifies instances of one of the categories established at the next level up.
e. Lateral connections among nodes in a feature map create memory traces. This enables hierarchical feature maps to serve as an episodic memory capable of storing individual stories instantiating more general script patterns.
f. Arbitrary ID patterns individuate references to individuals, enabling the episodic memory to represent and store role bindings. This supports one of the most important properties of scripts in discourse theory: the inference of events left implicit in the text. The ID+Content technique is also used to 'clone' words with similar meaning, allowing a rapid expansion of vocabulary.
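The feature-map behavior that techniques (c) through (e) depend on -- similar inputs activating the same or physically adjacent units -- can be sketched with a minimal one-dimensional self-organizing map. The map size, learning schedule, and toy data below are arbitrary assumptions, not Miikkulainen's parameters:

```python
# A minimal one-dimensional self-organizing feature map, sketched to show
# the property the techniques above rely on: after training, similar
# inputs activate the same or adjacent units. Hyperparameters arbitrary.
import numpy as np

rng = np.random.default_rng(0)
GRID = 8                          # units arranged on a line
weights = rng.random((GRID, 2))   # each unit holds a 2-d reference vector

def best_unit(x):
    """Index of the unit whose reference vector is closest to x."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

def train(samples, epochs=100, lr=0.3):
    for t in range(epochs):
        radius = 3.0 * (1 - t / epochs) + 0.5   # shrink the neighborhood
        for x in samples:
            win = best_unit(x)
            for i in range(GRID):
                # Units near the winner move most: this is what pulls
                # similar concepts into adjacent physical locations.
                h = np.exp(-((i - win) ** 2) / (2 * radius ** 2))
                weights[i] += lr * h * (x - weights[i])

# Two clusters of toy "concepts"; after training, each cluster settles
# into its own region of the map.
cluster_a = [np.array([0.1, 0.1]) + rng.normal(0, 0.02, 2) for _ in range(5)]
cluster_b = [np.array([0.9, 0.9]) + rng.normal(0, 0.02, 2) for _ in range(5)]
train(cluster_a + cluster_b)

a_units = {best_unit(x) for x in cluster_a}
b_units = {best_unit(x) for x in cluster_b}
```

The lateral memory traces of (e) and the ID parts of (f) are built on top of this basic topological organization.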
5. These techniques allow DISCERN to function as a complete (though very limited) text processing system. It is able to process a simple sentence (a list of words ending with a period), matching it to a semantic case frame in which word concepts have been inserted. It is able to process a story (a sequence of simple sentences), matching the story to one of the script patterns it has learned, and drawing appropriate inferences. It is able to process questions about stories and provide appropriate answers. It can generate stories conforming to one of the script patterns it has learned. And most importantly, it performs all of these activities in an entirely connectionist, nonsymbolic fashion.
6. DISCERN is designed to be a plausible (though simplified and highly schematic) model of the neurolinguistic and psycholinguistic processes which underlie discourse processing. Physically, it is built around backpropagation networks and feature maps, both of which are inspired by (if not directly intended to model) well-established neurological models. In its functioning, it replicates a number of the most important psycholinguistic features of language comprehension. The following properties are worth noting:
a. DISCERN maintains all of the characteristic strengths of a connectionist system: representations are self-organizing and generalize spontaneously, and performance degrades gracefully in the face of information overload, incomplete information, noise, or damage. It thus demonstrates the capacity of a multiple-network system to maintain the advantages of connectionist systems while simulating higher-level processing.
b. The DISCERN word lexicon shows properties typical of human lexical access processes: all possible meanings are activated initially, followed by selection of the contextually appropriate meaning; and access errors tend to involve semantic or formal (in this case, visual) similarity.
c. DISCERN also simulates important properties of human story processing: it generates plausible guesses in the face of incomplete information; it is able to recognize the script to which a story belongs even if information is out of order; and it automatically corrects minor errors, substituting plausible information based upon the script it uses to interpret a story. Additionally, errors in comprehension typically involve confusion of similar instances, or a conflation of details from similar stories. And production errors typically involve substitutions among closely related items, such as FRIES for HAMBURGERS in a fast-food story.
The system is, however, quite rigid in some of its behavior; for example, it is unable to correct a guess it has made, even when a question supplies the relevant information.
7. The DISCERN system is most fully developed as a model of simple script comprehension. It is important, however, to appreciate its limitations as a linguistic model. DISCERN models linguistic structure entirely in terms of strings: strings of words, strings of sentences, strings of case frames. Similarly, the system for representing scripts has no provision for the embedding of scripts within scripts: by definition, the system is designed to map a finite set of sentence frames onto a finite set of case frames, and these onto a finite set of scripts. Since recursive embedded structure is one of the most important design features of human language, such a framework is only suited to function as a demonstration model. It will not scale up to real discourse without fundamental modifications.
8. While Miikkulainen is well aware of this limitation, and discusses some potential modifications, none seem to overcome the basic difficulty. For example, his model of relative clause parsing bears a far stronger resemblance to the strategies employed by agrammatic aphasics than to the parsing strategies of normal speakers. This failure appears to illustrate what is, to date, a critical limitation of connectionist models: their inability to handle complex embedded structures efficiently. Handling such structures is, of course, the fundamental strength of symbolic systems, which process multiply embedded expressions as a matter of course. There is, however, an obvious reason for the failure: DISCERN lacks anything which corresponds computationally to a stack, or psychologically to short-term memory. Judging by what Miikkulainen accomplished in DISCERN, and by the literature review he provides, this remains connectionism's most fundamental limitation.
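What a stack buys a symbolic system can be shown in a few lines. The toy bracket grammar below is my own illustration of the general point -- suspending and resuming partially built constituents -- not a model of DISCERN or of human parsing:

```python
# An illustration of the capacity DISCERN lacks: a symbolic parser handles
# arbitrarily deep embedding by suspending partially built constituents on
# a stack. The toy bracket grammar is purely illustrative.

def parse_nested(tokens):
    """Build a tree from a flat token list,
    e.g. ( a ( b ) ) -> [['a', ['b']]]."""
    stack = [[]]                      # stack of partially built constituents
    for tok in tokens:
        if tok == "(":
            stack.append([])          # suspend the current constituent
        elif tok == ")":
            done = stack.pop()        # finish the embedded constituent...
            stack[-1].append(done)    # ...and resume its parent
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0]

tree = parse_nested("( a ( b ( c ) ) )".split())
# tree == [['a', ['b', ['c']]]]
```

The depth of embedding this handles is bounded only by memory, whereas a fixed-width vector representation with no stack must compress every level of embedding into the same finite state.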
9. As Miikkulainen points out, there are hybrid systems which combine symbolic and connectionist architectures to overcome this type of difficulty. But if we wish to evaluate what a pure neural network model can accomplish on its own, DISCERN appears to define the state of the art. By this measure, connectionist NLP has come a long way -- and has a long way yet to go.
10. It is quite possible, however, that the remaining hurdles can be overcome. The use of multiple interacting networks appears to be extremely powerful, and it is quite possible that an appropriate multi-network design would be able to process embedded structures efficiently. But I suspect that progress will depend upon clarifying the mathematical nature of multiple network systems. Multi-net systems bear an interesting resemblance to the distributed structure of a relational database. If this is a valid comparison, multiple network connectionist models will have the necessary power, since relational databases (and the relational algebra which provides their theoretical underpinning) are entirely capable of modeling complex symbolic structures. Be this as it may, advances in connectionist NLP are likely to depend upon a deeper understanding of the mathematical properties of multi-network systems.
Miikkulainen, R. (1993) Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon and Memory. Cambridge, MA: MIT Press.
Miikkulainen, R. (1994) Precis of: Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon and Memory. PSYCOLOQUY 5(46) language-network.1.miikkulainen.
Regier, T. P. (1992) The Acquisition of Lexical Semantics for Spatial Terms: A Connectionist Model of Perceptual Categorization. Ph.D. diss., University of California at Berkeley.