Csaba Pleh and Zsuzsa Kaldy (1996) Where Does Story Grammar Come From? Psycoloquy 7(34), Language Network (14)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).

Book Review of Miikkulainen on Language-Network

Csaba Pleh and Zsuzsa Kaldy
Department of General Psychology
Eoetvoes Lorand University
P.O. Box 4 H-1378
Budapest, Hungary

pleh@izabell.elte.hu Kaldy@izabell.elte.hu


This commentary discusses some problems that we have encountered with Miikkulainen's language processing model, DISCERN, described in his book, Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory (1993). First, DISCERN uses grammatical analysis in an unclarified manner. It assumes certain notions like Cases without giving them a connectionist account, and also presupposes a preliminary syntactic analysis. Neither the categories nor the temporal relations are connected to research on human parsing. Second, the relations between sentence parsing and story parsing are left unclear. Third, the story material used is extremely constrained. These constraints (one actor, one plot, etc.) are not trivial and seriously limit the scope of the model for human story processing.


computational modeling, connectionism, distributed neural networks, episodic memory, lexicon, natural language processing, scripts.
1. Our criticism does not take issue with the technical machinery of the model proposed by Miikkulainen: we consider only the claimed power and superiority of the model and its linguistic background.

2. Some troubles with parsing and with passing over information. One of the key limitations, and hidden tricks, of the approach taken by Miikkulainen is that at some points his model is only pseudo-automatic. Though it claims to be modular -- modularity is even asserted to be its key advantage over previous connectionist models -- the modules sometimes contain unexplained information: information that is not obtained for free as a side result of computation but is instead taken from unacknowledged sources.

3. This is most visible and problematic in connection with the sentence parser. The module takes some form of syntactic analysis together with the word string as input and gives a case role representation as output. One has to figure out for oneself that the model presupposes a preliminary first pass that is not itself modeled. This preliminary processing is done by some agent external to the model. The remaining task for the machine is simply to combine the preliminary syntactic analysis with semantic features to arrive at the Case role representation.

4. "In the basic version of the case-role assignment task, the syntactic structure of the sentence is given and consists of, for example, the subject, verb, object and a with-clause. The task is to decide which constituents play the roles of agent, patient, an instrument, and patient modifier in the act ... The task requires taking into account all the positional, contextual, and semantic constraints simultaneously ..." (p. 51).

5. Where does this case list come from? What kind of grammatical model is implied in the lexical characterizations that seem to provide the basis for the preliminary syntactic analysis? These are issues anyone familiar with present-day psycholinguistics would want to raise. The model for Case Role assignment differs from the McClelland and Kawamoto (1986) model -- and, we should add, from the McClelland and Taraban (1989) model too -- in one basic respect: in the model proposed by Miikkulainen there is no preassigned list of semantic features associated with particular Case roles. There is no list of microfeatures (Clark, 1989; 1993) like "hardness" or "sharpness" connected to an agent. Rather, Miikkulainen's model produces nice-looking cluster trees for the nouns it has already encountered, and trivial contingency-based expectations, for example, from eat to pasta. This is fair enough as an account of the statistical contingencies that are certainly important in natural language processing, but it does not seem able to account for the genesis of the grammatical categories -- the Case roles -- themselves.
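The kind of contingency learning we do grant the model amounts to something like the following toy computation, a sketch of our own and not DISCERN's architecture: co-occurrence counts yield verb-to-filler expectations (eat, pasta), but nothing in the counts says where the role categories themselves come from.

```python
# Our toy illustration of contingency-based expectation: the most
# frequent object seen with a verb becomes the expected filler.
# The mini-corpus below is invented for the example.
from collections import Counter, defaultdict

sentences = [
    ("ate", "pasta"), ("ate", "pasta"), ("ate", "soup"),
    ("bought", "TV"), ("bought", "book"),
]

counts = defaultdict(Counter)
for verb, obj in sentences:
    counts[verb][obj] += 1   # tally verb-object co-occurrences

def expected_filler(verb):
    """Return the object most often seen with this verb."""
    return counts[verb].most_common(1)[0][0]

print(expected_filler("ate"))  # -> pasta
```

Note what the computation presupposes: the corpus is already segmented into (verb, object) pairs. The categories "verb" and "object" are inputs to the statistics, not outputs of them, which is precisely our point about the origin of the Case roles.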

6. Some of the problems with his model seem to stem from the fact that Miikkulainen does not engage enough with the research on human parsing that is concerned with ordering the use of different information sources. In most present-day Government and Binding related work on human sentence processing, several levels of sentence structure are differentiated. The computation of syntactic structure roughly corresponds to what Miikkulainen calls preliminary analysis, and Thematic processing to what he calls Case Role assignment. For researchers in human psycholinguistics, the real issue is the relative weight of, and the temporal relationship between, the two types of structural assignment. Regarding the relationship between formal, attachment-based principles and merely contingent, connectionist-associative ones, Frazier and Clifton (1996) have recently proposed a rather intricate model. The temporal issue has a rather long tradition in psycholinguistics. Some authors claim in this regard (e.g., Forster, 1979) that Thematic role assignments are secondary, while others suggest that they run basically in parallel with syntactic parsing and support it (Altmann and Steedman, 1988; Tanenhaus, Carlson, and Trueswell, 1989). We do not want to suggest that a strict connectionist model should simply accept these proposals. However, it should certainly clarify the relationship between its own version of modularity and the grammar-based modular views.

7. Whatever levels we postulate and whatever temporal relationship we suggest for their use in processing, one has to somehow answer the question: where do the categories of the output of understanding come from? It is certainly misleading to treat, say, the category of Subject -- or Agent, for that matter -- as if it were trivial. Where does it come from? Different theories of understanding, in emphasizing different cues to understanding (formal, semantic, contextual), at the same time imply a stance regarding the origin of these categories. If we say that in a given language word order or Animacy is primarily used in decisions regarding the Subject, that suggests a model concerning the origin of this category. By leaving preliminary analysis in the shadows, Miikkulainen leaves us in the dark about these issues of origin.

8. Regarding the higher-level modules, DISCERN, a purely PDP-based system, lacks two types of knowledge. First, it is unable to handle two or more scripts at a time, though Miikkulainen claims this problem will be solvable in the near future. This has, of course, been a basic problem with regular symbol-processing approaches to scripts as well; Schank (1982) was always keen to point it out as a key issue in script-based processing.

9. The second problem is that DISCERN is unable to deal with exceptions and deviations from the original stories. This is a rather crucial problem, since one has to remember that the original motivation for PDP approaches was that, unlike classical symbol processing, they could deal easily with exceptions (see Rumelhart and McClelland, 1986; Clark, 1989). Miikkulainen finds this problem the more difficult one, since it would require higher-level monitoring and hierarchical control. Sequential reasoning and the processing of new structures require metalevel knowledge, that is, rules and meta-schemata. DISCERN, however, is unable to deal not only with such relatively complex problems but also with the binding of one role by two fillers. After processing two similar shopping stories -- "Mary bought a TV at Circuitry" and "John bought a TV at Radio Shack" -- DISCERN answers "John" to the question "Who bought a TV set?" because the memory traces for John are more recent (p. 215). Psychologically, this is not plausible by a long shot. Even if we do not require a double answer, some kind of hesitation would be more convincing in the model.
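The retrieval failure just described can be reconstructed in miniature. The following is our own toy model, not DISCERN's trace memory: a store that keeps a single filler per role, overwritten by the most recent story, necessarily answers "John" and silently loses "Mary".

```python
# Our toy reconstruction of the recency-based role-binding failure:
# one trace per role, newer fillers overwrite older ones.
class RecencyMemory:
    def __init__(self):
        self.trace = {}  # role -> most recently stored filler

    def store(self, bindings):
        # Newer bindings overwrite older ones for the same role;
        # the earlier filler leaves no accessible trace.
        self.trace.update(bindings)

    def query(self, role):
        return self.trace.get(role)

mem = RecencyMemory()
mem.store({"agent": "Mary", "patient": "TV", "location": "Circuitry"})
mem.store({"agent": "John", "patient": "TV", "location": "Radio Shack"})
print(mem.query("agent"))  # -> John; the binding to Mary is gone
```

A psychologically plausible memory would return both fillers, or at least register the conflict between them, rather than silently preferring the more recent trace.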

10. There are several problems with the story material. If story understanding is merely pattern recognition, and question answering merely pattern completion, then DISCERN is an appropriate model. However, the area covered seems too bounded. The stories have only a single actor; questions refer to only one person; and the participants do not engage in variable actions: they show a statistically stereotyped pattern. Even on the imaginary or mental level (i.e., in considering different possible paths), the stories have only a single plot. There are no diverging paths, and therefore no decisions are being made.

11. This also has some consequences for the language processing aspects of DISCERN. According to Miikkulainen, pronoun antecedents are easy to handle in script-like stories (p. 304). This is, however, only true for single-person stories. If another active agent shows up, then the continuity of reference becomes a very hard problem for this model. This is related to the general issue that in Miikkulainen's modular view, the relationship between the story processing and the Case processing modules is not at all clear. We are not given clear paths to get from, say, sentence Agents to story agents. The old tenet of Hofstadter seems to still hold for this model as well: "It is probably safe to say that writing a program which can fully handle the top five words of English 'the', 'of', 'and', 'a' and 'to' would be equivalent to solving the entire problem of AI, and hence tantamount to knowing what intelligence and consciousness are." (Hofstadter, 1979, pp. 629-630).


Altmann, G.T.M. and Steedman, M. (1988). Interaction with context during human sentence processing. Cognition, 30, 191-238.

Clark, A. (1989). Microcognition. Cambridge, MA: MIT Press.

Clark, A. (1993). Associative Engines. Cambridge, MA: MIT Press.

Carlson, G.N. and Tanenhaus, M.K. (eds.) (1989). Linguistic structure in language processing. Dordrecht: Kluwer.

Carlson, G.N. and Tanenhaus, M.K. (1988). Thematic roles and language comprehension. Syntax and Semantics, 21, 263-288.

Forster, K. (1979). Levels of processing and the structure of the language processor. In: Cooper, W.E. and Walker, E.C.T. (Eds.): Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett. Hillsdale, NJ: Lawrence Erlbaum.

Frazier, L. and Clifton, C., Jr. (1996). Construal. Cambridge, MA: MIT Press.

Frazier, L. and Fodor, J.D. (1978). The sausage machine: A new two stage parsing model. Cognition, 6, 291-325.

Hofstadter, D.R. (1979). Gödel, Escher, Bach. New York: Basic Books.

McClelland, J.L. and Kawamoto, A.H. (1986). Mechanisms of sentence processing: Assigning roles to constituents. In: McClelland, J.L. and Rumelhart, D. E. (Eds.): Parallel Distributed Processing. Vol. 2. Cambridge, MA: MIT Press, 272-325.

McClelland, J.L. and Taraban, R. (1989). Sentence comprehension: A Parallel Distributed processing approach. Language and Cognitive Processes, 4, 287-335.

Miikkulainen, R. (1993). Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. Cambridge, MA: MIT Press.

Miikkulainen, R. (1994). Precis of: Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. PSYCOLOQUY 5(46) language-network.1.miikkulainen.

Schank, R. C. (1982). Dynamic Memory. Cambridge: Cambridge University Press.

Tanenhaus, M.K., Carlson, G.N. and Trueswell, J.C. (1989). The role of thematic structures in interpretation and parsing. Language and Cognitive Processes, 4, 211-234.

Taraban, R. and McClelland, J.L. (1988). Constituent structure and thematic role assignment in sentence processing: Influences of content-based expectations. Journal of Memory and Language, 27, 597-632.
