Deane (1994) points out that the template-matching approach of DISCERN (Miikkulainen, 1993; 1994a) cannot scale up to realistic linguistic input without fundamental modifications such as a stack or a short-term memory. I agree with this observation and will briefly discuss a more sophisticated parsing model that employs some of the methods Deane suggests.
1. Neural networks are pattern transformers, mapping a vector of input values to a vector of output values. They are good at detecting and making use of surface-level regularities such as correlations between input and output features, but only if the features always appear at the same fixed locations in the input and output. Therefore, a good approach for knowledge representation in neural networks is to lay out the data in fixed templates. This was done, for example, for the sentence and story representations in DISCERN. However, as Deane points out, much of linguistic structure, such as recursive relative clauses, cannot be represented in fixed template form. Processing recursive structures turns out to be one of the most difficult challenges for the connectionist approach.
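As a minimal sketch of what a fixed template means here, the fragment below lays out one clause as a concatenation of slot vectors, so that each case role always occupies the same positions in the input. The slot names, the four-unit localist word codes, and the function name `encode_template` are assumptions made for illustration only; DISCERN itself uses learned distributed word representations, not these codes.

```python
# Hypothetical fixed case-role template: every slot has a fixed position.
SLOTS = ("agent", "act", "patient")

# Hypothetical localist word codes (illustration only, not DISCERN's lexicon).
LEXICON = {
    "boy":  [1, 0, 0, 0],
    "girl": [0, 1, 0, 0],
    "cat":  [0, 0, 1, 0],
    "saw":  [0, 0, 0, 1],
}
BLANK = [0, 0, 0, 0]  # used for an unfilled slot or an unknown word


def encode_template(frame):
    """Concatenate the word codes of a clause in fixed slot order."""
    vector = []
    for slot in SLOTS:
        vector.extend(LEXICON.get(frame.get(slot), BLANK))
    return vector


clause = {"agent": "boy", "act": "saw", "patient": "girl"}
pattern = encode_template(clause)  # 12 units: agent, then act, then patient
```

A single clause fits this layout, but a sentence with an embedded relative clause has no fixed number of slots, which is the limitation Deane identifies.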
2. In the short term, a good approach might be to supplement a connectionist low-level pattern mapping network with an external symbolic system that keeps track of the recursive structure of the sentence. Several implementations of this idea exist with good results (Simmons and Yu, 1990; Das et al., 1992). However, I agree with Deane that if the goal is to test whether natural language processing can emerge from subsymbolic computations, we need to be able to build such a parsing system solely on distributed neural networks.
3. One idea that Deane suggests is to follow the modular approach of DISCERN and include a neural network that would act as a stack, or a short-term memory, in the parser architecture. This is precisely the approach taken in our recent work on the SPEC model (Miikkulainen and Bijwaard, 1994). SPEC is a modular system that parses a sentence with multiple embedded relative clauses into a collection of case-role representations. Parsing is divided into three subtasks, implemented in separate modules: (1) The Parser forms the case-role representations for each clause; (2) The Stack stores the clauses interrupted by embeddings in the short-term memory; (3) The Segmenter controls the parsing process by segmenting the sentence into clauses and embeddings.
4. The Parser is a version of the simple recurrent network (SRN) (Elman, 1990) and is similar to the DISCERN Sentence Parser. It reads the words of the sentence one at a time and gradually builds, at its output, the pattern representing the case-role assignment of the current clause. These representations are read off the system as soon as they are complete.
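The forward pass of a simple recurrent network can be sketched as follows. This is a generic, untrained SRN in the sense of Elman (1990), not the SPEC Parser itself: the class name, layer sizes, random weights, and sigmoid units are all assumptions chosen to keep the example self-contained.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(vector):
    return [1.0 / (1.0 + math.exp(-x)) for x in vector]


class SimpleRecurrentNetwork:
    """Untrained SRN sketch: the hidden layer is copied back as context."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(n_hidden, n_in, rng)       # input -> hidden
        self.w_ctx = make_matrix(n_hidden, n_hidden, rng)  # context -> hidden
        self.w_out = make_matrix(n_out, n_hidden, rng)     # hidden -> output
        self.context = [0.0] * n_hidden                    # previous hidden layer

    def step(self, word_vector):
        """Read one word and return the current output pattern."""
        net = [a + b for a, b in zip(matvec(self.w_in, word_vector),
                                     matvec(self.w_ctx, self.context))]
        hidden = sigmoid(net)
        self.context = hidden  # becomes the "previous hidden layer" next step
        return sigmoid(matvec(self.w_out, hidden))


# Feed a three-word sequence of hypothetical one-hot word codes.
srn = SimpleRecurrentNetwork(n_in=3, n_hidden=4, n_out=2)
for word in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    output = srn.step(word)
```

Because the context is fed back at every step, the same word produces a different output depending on what preceded it; in a trained network this is what lets the output pattern be built up gradually over the clause.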
5. As usual in an SRN, the previous hidden layer forms a reduced description of the input sequence so far. The Stack (implemented as a linear RAAM network; Pollack, 1990) has the task of storing this reduced description at each center embedding and restoring it upon return from the embedding. For example, in parsing "The boy who chased the cat saw the girl", the hidden-layer representation is pushed onto the stack after "The boy" and popped back into the previous hidden layer after "who chased the cat", allowing the Parser to process the rest of the sentence as if the embedded clause had never been there. The Parser outputs the case-role representation |agent=boy act=chased patient=cat| after the relative clause, and |agent=boy act=saw patient=girl| after the entire sentence.
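The push and pop sequence in this example can be traced with a purely symbolic stand-in. The sketch below is not SPEC: a Python list replaces the RAAM Stack, a list of content words replaces the Parser's hidden layer, and two hand-written rules (push at "who", pop when a clause has three fillers) replace the trained Segmenter. All names are hypothetical, and the rules cover only subject relative clauses of the kind in the example.

```python
RELATIVE_PRONOUNS = {"who"}  # assumption: the only embedding marker handled
DETERMINERS = {"the"}        # skipped; they fill no case role in this sketch


def parse(sentence):
    """Return case-role frames in the order in which clauses complete."""
    stack = []    # stand-in for the RAAM Stack
    state = []    # stand-in for the hidden layer: fillers of the current clause
    clauses = []
    for word in sentence.lower().split():
        if word in DETERMINERS:
            continue
        if word in RELATIVE_PRONOUNS:
            stack.append(state)   # push the interrupted clause
            state = [state[-1]]   # the head noun is the agent of the embedding
            continue
        state.append(word)
        if len(state) == 3:       # agent, act and patient are all filled
            clauses.append(
                {"agent": state[0], "act": state[1], "patient": state[2]})
            if stack:
                state = stack.pop()  # resume the interrupted clause
    return clauses


frames = parse("The boy who chased the cat saw the girl")
# frames[0] is the embedded clause, frames[1] the main clause.
```

In SPEC the pushed item is a distributed hidden-layer pattern and the stack itself is a network, so restoration is approximate; the symbolic list used here is exact and shows only the order of operations.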
6. Each module needs to be trained with only the basic constructs that occur in its task (e.g., the Parser with the different clauses, and the Stack and the Segmenter with the different types of embedding transitions). By its very architecture, the complete SPEC system then generalizes to novel combinations of these constructs, including new relative clause structures. This is a fundamentally different type of generalization than is usually found in neural networks. SPEC is not interpolating between input patterns; it is processing truly novel input structures. In this sense the system is capable of systematic and productive processing of language.
7. Even with such strongly symbolic capabilities, SPEC is still a fully subsymbolic system, and several interesting cognitive phenomena emerge automatically from the parallel distributed nature of its processing: (1) Deeper embeddings are harder to process than shallower ones; (2) Sentences that have strong semantic constraints are easier than sentences that must be parsed from syntax alone; (3) A typical mistake is to confuse who did what to whom in the sentence, rather than to generate random role bindings.
8. Although SPEC can parse sentences with rich linguistic structure, it does not have a mechanism for representing such complex relationships in its output. The assumption is that the final meaning can be represented as an unstructured collection of case-role representations; the original sentence structure does not matter once the case-role semantics has been extracted. The problems of binding together identical fillers (e.g., "boy" in the above example) and of representing the collection in a neural network must still be solved, but these should be easier than representing the entire relative clause structure. Methods such as those described by Edelman in his review (Edelman, 1994; Miikkulainen, 1994b) could turn out to be appropriate for this task.
Das, S., Giles, C.L. and Sun, G.Z. (1992) Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory. In Proceedings of the 14th Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.
Deane, P. (1994) Narrowing the Gap: Miikkulainen and the Connectionist Modeling of Linguistic Competence. PSYCOLOQUY 5(77) language-network.4.deane.
Edelman, S. (1994) Biological Constraints and the Representation of Structure in Vision and Language. PSYCOLOQUY 5(57) language-network.3.edelman.
Elman, J.L. (1990) Finding Structure in Time. Cognitive Science 14:179-211.
Miikkulainen, R. (1993) Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. Cambridge, MA: MIT Press.
Miikkulainen, R. (1994a) Precis of: Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon and Memory. PSYCOLOQUY 5(46) language-network.1.miikkulainen.
Miikkulainen, R. (1994b) Representation of Structure on Linguistic Maps. PSYCOLOQUY 5(86) language-network.8.miikkulainen.
Miikkulainen, R. and Bijwaard, D. (1994) Parsing Embedded Clauses with Distributed Neural Networks. In Proceedings of the 12th National Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann. (an expanded version is available by anonymous ftp from cs.utexas.edu: pub/neural-nets/papers/miikkulainen.subsymbolic-caseroles.ps.Z)
Pollack, J.B. (1990) Recursive Distributed Representations. Artificial Intelligence 46:77-105.
Simmons, R.F. and Yu, Y.H. (1990) Training a Neural Network to Be a Context-Sensitive Grammar. In Proceedings of the Fifth Rocky Mountain Conference on Artificial Intelligence, Las Cruces, NM, 251-256.