This target article has four sections. Section I sets out four principles which should guide any attempt to reconstruct the evolution of an existing biological characteristic. Section II sets out thirteen principles specific to a reconstruction of the evolution of language. Section III sets out eleven pieces of evidence for the view that vocal language must have been preceded by an earlier language of gesture. Based on those principles and evidence, Section IV sets out seven proposed stages in the process whereby language evolved: (1) the use of mimed movement to indicate an action to be performed, (2) the development of referential pointing which, when combined with mimed movement, leads to a language of gesture, (3) the development of vocalisation, initially as a way of imitating the calls of animals, (4) counting on the fingers leading into (5) the development of symbolic as distinct from iconic representation, (6) the introduction of the practice of question and answer, and (7) the emergence of syntax as a way of disambiguating utterances that can otherwise be disambiguated only by gesture.
EDITOR'S NOTE: Ullin T. Place died on January 2, 2000. His target article had been reviewed for PSYCOLOQUY and was essentially complete at the time of his death. Some minor editing has been done by PSYCOLOQUY Associate Editor A. Charles Catania, mainly to bring the manuscript into conformity with PSYCOLOQUY style. Catania will consider replying to commentaries on this article, but also welcomes the participation of others who may feel they are familiar enough with Place's perspectives to do so.
Charles Catania Department of Psychology University of Maryland, Baltimore County 1000 Hilltop Circle Baltimore, Maryland 21250 USA firstname.lastname@example.org
The target article below has today appeared in PSYCOLOQUY, a refereed online journal of Open Peer Commentary sponsored by the American Psychological Association.
1. In this target article I advocate a version of what Hewes (1973a; 1973b; 1976) calls the gestural theory of the origin of language, which he traces back to Condillac (1746/1947), Tylor (1868; 1871), Morgan (1877), Wallace (1881; 1895), Romanes (1888), Wundt (1900) and Jóhannesson (1949; 1950). This version differs from earlier versions of the theory, including that of Hewes himself, in that it incorporates some more recent work from two sources: behavioural psychology and neuroscience.
2. Until the development of writing, language leaves no direct trace in the archaeological record. Consequently the reconstruction of its evolution is necessarily speculative. There are, however, certain principles to which we can appeal in deciding which of two alternative reconstructions is the more probable. In developing the reconstruction presented here, I have been guided by a number of principles. Four of these are principles that apply to any such evolutionary reconstruction. The remainder are specific to the evolution of language. The general principles are:
3. The evolution of any complex biological characteristic proceeds in a sequence of small stages. Each stage comes about as a consequence of the selection, from a population of chance mutations, of one that gives the group in which it occurs a selective advantage over groups in which it has not occurred in the particular niche occupied by the species at that stage in the evolutionary process. This principle would rule out the sudden emergence of an innate language faculty in a single step as envisaged by Chomsky (1957, 1965).
4. Although phylogenetic development cannot be simply read off from an examination of the process of ontogenetic development, it is a reasonable assumption that the stages that are recognisable in the process of ontogenetic development correspond to stages that punctuated the phylogenetic development of the characteristic in question.
5. When the manifestation of an adaptation in its fully fledged form is blocked, organisms will revert to a form of adaptation that preceded it in the process of evolution.
6. Every mutation that is selected in the course of the evolution of a complex biological characteristic will leave its mark on the anatomical structure of members of the species in which it develops.
7. The principles that are specific to the evolution of language are:
8. Language is a form of behaviour or, to be more precise, a compound of two interacting behaviours, the speaker's utterance and the listener's response, where the listener's response may or may not consist of a reciprocal utterance. In explaining such behaviour for scientific purposes what the philosopher Donald Davidson (1970/1980) calls the vocabulary of propositional attitudes must be avoided. The reason for this is that statements of the form "X thinks, believes or knows that 'p'" contain an embedded declarative sentence, the sentence 'p', which is a quotation. It is a quotation in oratio obliqua, or indirect reported speech, of what the individual in question would or would be expected to say, under conditions where the truth or falsity of p is at issue in deciding what to do or not to do in those circumstances. Since they presuppose an organism that is already linguistically competent, such explanations are clearly inadmissible in giving an account of how language developed in the first place.
9. In seeking an alternative form of explanation for this kind of behaviour, I rely on a somewhat deviant version of Skinner's (1938; 1953) behaviour analysis in which behaviour in general and linguistic/verbal behaviour in particular is construed as a part learned, part unlearned adaptation to what Skinner calls the three-term contingency consisting of (a) the Antecedent conditions, (b) the Behaviour called for from a particular organism under those conditions, and (c) the Consequences of so doing. Language behaviour is unique in that, when fully developed, it involves two different levels of adaptation to the three-term contingency.
10. At the pragmatic level each utterance and each response to an utterance is reinforced or disinforced (Harzem & Miles 1978), as the case may be, by its immediate consequences, the response of the listener or previous speaker. At the semantic level every information-providing sentence is a representation of one or more contingency-terms. Since a contingency is defined by reference to the individual whose behaviour constitutes its middle term, it follows not only that each individual has a set of contingencies which are unique to that individual, but that each situation specified by a sentence constitutes a different contingency term for each of the different parties to a linguistic interaction. This allows a linguistically competent organism to adapt to contingencies by means of what Skinner (1966/1969) calls contingency-specifying stimuli or rules without having to have had personal experience of the contingency in question either in the individual's past learning history or in that of the species.
11. Language is primarily a means of communication. Contrary to the opinion of Fodor (1975), it is only secondarily and derivatively a vehicle for thought. Like other interpersonal communication systems, it consists of responses emitted by one individual, the sign-producer or speaker, which act as signs, or discriminative stimuli as Skinner (1938) calls them, which have a consistent and predictable behaviour-orientating effect on another individual, the sign-receiver or listener, to whom the speaker's utterance is directed. What distinguishes language from other forms of interpersonal communication is that in order to exercise effective control over the behaviour of the listener and secure the reinforcement that only the listener can provide, the speaker must construct and utter a sentence. Sentences of the kind that convey information, as distinct from those whose function is to facilitate the process of communication, differ from other non-linguistic response-produced signs in two respects. Whereas other response-produced signs are repeated over and over again, as Chomsky (1957, 1965) has always insisted, information-providing sentences are seldom repeated word for word, but are typically constructed anew on each occasion of utterance, albeit out of words, phrases and sentence frames that are repeated. The phenomenon of novel sentence construction arises from the fact that, whereas the units of which sentences are composed derive their behaviour-orientating function by generalisation from or repeated association with the natural signs of the presence or impending presence of the kind of action or object they stand for, sentences, provided they are constructed in accordance with the syntactic conventions accepted within the verbal community, have the ability to orient the behaviour of the listener towards the potential or actual presence beyond her current stimulus environment of a contingency the like of which she need never have experienced in her own case.
12. There can be little doubt that in the evolution of language, as in the linguistic development of the child, the earliest sentences to which listeners responded and which speakers produced were of the type which Skinner (1957) calls a mand (a command, request or question). In emitting a mand the speaker specifies an action to be performed by the listener, the subsequent and consequent emission of which by the listener reinforces the speaker's mand and secures reinforcement from the speaker in the form of either an expression of gratitude or the withdrawal of a threat. According to Skinner (1957, p.85), behaviour in the form of the mand operates primarily for the benefit of the speaker, whereas what he calls behaviour in the form of the tact, an information-providing sentence in other words, works for the benefit of the listener by extending his contact with the environment (unfortunately, as I point out in Place , Skinner uses the term 'tact' in three different senses of which tacts as information-providing sentences or statements are the least common). Tacts in this sense are typically emitted in response to an interrogative mand or question. They are reinforced, not, as in the case of the mand, by the behaviour they call for from the listener, but by a variety of specialised reinforcers, responses such as gratitude for information supplied, agreement with opinions given, sympathy for troubles told, surprise at and interest in news reported, or laughter at jokes. The tact is thus a more sophisticated form of utterance than the mand. In the evolution of language it must have developed later, as it does in the child. Moreover, since interrogative mands presuppose the availability of the tacts they solicit from the listener, it follows that the first sentences must all have been imperatives.
13. Sentences perform their function of acting as a discriminative stimulus for the message-receiver or listener by depicting what Barwise and Perry (1983) call a situation. A situation is either a state of affairs whereby a property of an entity or a relation between two or more entities remains constant over a period of time or an event whereby a property or relation changes at a moment of time (instantaneous event) or over a period of time (process). In the case of a mand, the situation depicted by the sentence is either an event, a change which the listener is being asked to bring about, or a state of affairs she is being asked to preserve. In the case of a tact the situation depicted is typically one whose occurrence or existence outside the listener's current stimulus environment is being reported or predicted. In either case the sentence consists of a multi-place predicate or verb phrase and as many arguments or noun phrases as are needed to depict the situation. The function of the predicate is to depict an action which, in the case of a mand, is to be performed by the listener or, in the case of a tact, either has been or is being performed by some other person or object (the agent). The arguments represent the various objects involved in the situation depicted, of which the agent (the listener in the case of a mand) is one. The object to be or being manipulated, the destination to which it is to be or is being conveyed, and the consequence for the sake of which the action is to be or is being performed are others. Their nature and number are determined by the nature of the action represented by the predicate. Where a predicate represents a change in or persistence of a property, it is monadic. It has only one argument place to be filled, that of the property-bearer. Other predicates depict a change in or persistence of a relation and are polyadic. They need two or more argument places to be filled corresponding to each of the various objects between which the relation holds.
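The predicate-argument structure just described can be pictured as a small data structure. The following is only an illustrative sketch in Python, not part of the original argument: the class names `Predicate` and `Sentence`, the role labels, and the example predicates `take` and `ripen` are all hypothetical choices made for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Predicate:
    """An action or property word together with the argument places it requires."""
    name: str
    roles: Tuple[str, ...]  # hypothetical role labels, e.g. agent, object, destination

    @property
    def arity(self) -> int:
        return len(self.roles)

    @property
    def kind(self) -> str:
        # One argument place (the property-bearer) is monadic; two or more is polyadic.
        return "monadic" if self.arity == 1 else "polyadic"


@dataclass(frozen=True)
class Sentence:
    """A predicate with exactly as many arguments as it has argument places."""
    predicate: Predicate
    arguments: Tuple[str, ...]

    def __post_init__(self) -> None:
        if len(self.arguments) != self.predicate.arity:
            raise ValueError(
                f"{self.predicate.name!r} needs {self.predicate.arity} argument(s), "
                f"got {len(self.arguments)}"
            )

    def bindings(self) -> Dict[str, str]:
        # Pair each argument place with the object that fills it.
        return dict(zip(self.predicate.roles, self.arguments))


# Hypothetical examples: a relational (polyadic) and a property (monadic) predicate.
TAKE = Predicate("take", ("agent", "object", "destination"))
RIPEN = Predicate("ripen", ("property_bearer",))

mand = Sentence(TAKE, ("listener", "ball", "basket"))
tact = Sentence(RIPEN, ("fruit",))
```

On this sketch the number of argument places is fixed by the predicate, so a sentence with too few or too many arguments is rejected when it is constructed, which mirrors the point that the nature and number of the arguments are determined by the action the predicate represents.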
14. It is clear from work that has been done with a number of infra-human species that the ability both to respond to and generate sequences of stimuli which conform to the argument structure characteristic of a sentence is not something that is exclusive to the human species. Clear evidence of the ability to respond to simple imperative sentences consisting of a predicate, an action verb, and up to three arguments, the name of the agent/addressee, the object to be manipulated and its destination, has been provided in the case of suitably trained bottlenose dolphins (Tursiops truncatus) by Herman and his colleagues (Herman 1987; Herman et al. 1984; Herman, Kuczaj & Holder 1993) and in the case of similarly trained sea lions (Zalophus californianus) by Schusterman and his colleagues (Schusterman & Krieger 1984; Schusterman & Gisiner 1988).
15. The evidence for the ability of infra-humans to produce such sentences is less clear cut. It consists, both in the case of Pepperberg's (1987) African Grey Parrot (Psittacus erithacus) Alex, and in the case of the bonobo chimpanzee Kanzi studied by Savage-Rumbaugh and her co-workers (Greenfield & Savage-Rumbaugh 1990), in combinations of no more than two elements, one a predicate and the other an argument. Evidence that these combinations are ones that the organism, the parrot or chimpanzee, has put together for himself rather than ones he has imitated from the utterances of his human caregivers is much stronger in the case of Kanzi than it is in the case of Alex. This is partly because Kanzi is using artificial lexigrams combined in some cases with gestures rather than the vocal imitations of English used by Alex. But it is also because Greenfield and Savage-Rumbaugh (1990) took great care to exclude cases where a particular combination could have been imitated from the behaviour of the caregivers, and because, in many cases, the utterance combines the predicate either with another lexigram or with a gesture which is used to indicate the agent who is to perform the action specified by the predicate, a practice which is common in the utterances of young children whose sentences conform to what Bickerton (1990) calls protolanguage, as in the child's sentence cited by Horne and Lowe (1996): 'Daddy push car'. That said, it is clear from the absence of reports of combinations with more than one argument that Kanzi's ability to construct novel sentences is very much more restricted than his ability to understand such sentences when formulated in either Yerkish or English and much more restricted than the structures understood by Herman's dolphins and Schusterman's sea lions.
16. This suggests that what a theory of the evolution of human language needs to explain is not how our ancestors acquired the ability to decipher complex combinations of abstract symbols when organised in the form of the argument structure characteristic of the sentence. That is an ability that they and their mammalian predecessors had long possessed. What needs to be explained is how they acquired the ability, which their ape ancestors, it would seem, did not possess, to use abstract symbols to construct sentences of equal complexity to those they could understand and respond to, and how these two components, sentence comprehension and sentence construction, came to be welded together in such a way as to satisfy Liberman's (1993) requirement (quoted by Rizzolatti and Arbib, 1998) that in all communication the processes of production and perception must somehow be linked; their representation must, at some point, be the same.
17. Once we begin to view semantics as the handmaiden of pragmatics and syntactics as the handmaiden of semantics, it becomes increasingly difficult to endorse Chomsky's (1957, 1965) belief that, in order to explain the human ability to construct and construe complex sentences, we must postulate an innate language faculty which appeared deus ex machina in a single gigantic mutation at the dawn of human prehistory. Mutations there must have been. How else can we explain the fact that we talk and nonhumans, even with the best human instruction, barely do so? But what we must look for is not just one mutation, but a number of mutations spread out over millions of years, each one building on what has gone before, each one providing a selective advantage to the group in which it occurs which has enabled its members to survive and pass on their genes, when those who lacked that mutation went to the wall. Nor should we expect to find that the selective advantages which have promoted the survival of the groups in which such mutations have occurred have always been advantages conferred by improvements in interpersonal communication.
18. It seems likely that the earliest mutations whose establishment made the development of language possible were selected in the first instance, not so much for their utility in relation to the process of interpersonal communication as for their utility in relation to a Stone Age hunting-and-gathering technology. Two mutations which may well have been selected in this way are that which made referential pointing possible and that or those which produced the changes in the mouth and larynx which made possible the production of vocal speech.
19. An impressive body of evidence to be reviewed in Section III below makes it tolerably certain that the earliest form of human linguistic communication took the form of a language of gesture (Piaget 1926/1932) in which imperatives are constructed by combining a predicate in the form of a mimed action, something of which, as we know from Koehler's (1921/1927 pp.307-8) observations, chimpanzees are already capable, with one or more arguments (agent, manipulandum, destination) indicated by referential pointing at the objects in question. In addition to their ability to use miming to indicate the action to be performed, there is evidence (Savage-Rumbaugh 1986) that chimpanzees can learn to use pointing to indicate a particular food which they wish the human trainer to give them, and that the bonobo Kanzi (Greenfield & Savage-Rumbaugh 1990) has learnt to use referential gestures to indicate who is to do what, what object is to be manipulated or where it is to be placed, but in both cases only when the utterance is directed to a human caregiver. That, it would seem, is because, although they can learn to point, the evidence (Savage-Rumbaugh 1986, pp. 13, 180-2; Povinelli & Davis 1994) suggests that they cannot respond to such gestures when made by others. If so, referential pointing directed to a conspecific would be of no avail, a circumstance which would both confirm and explain Wundt's (1900) contention that referential pointing is something that apes in their natural state do not do.
20. Savage-Rumbaugh (1986, p.315) reports food-sharing sessions with Sherman and Austin, the two chimpanzees (Pan troglodytes) she had trained before she began working with Kanzi, the bonobo (Pan paniscus), in which when the table was between them, they could communicate which food to give simply by pointing to it, and at times they did, though pointing tended to be used to clarify symbolic [i.e., using lexigram] requests. The fact that this behaviour seems to have occurred only when the two chimpanzees were sitting opposite one another suggests to me that it was the direction of gaze rather than the pointing to which they were responding. As Louis Herman (1998) points out, this inability of chimpanzees to respond to referential pointing is in marked contrast to the bottlenosed dolphins he has trained who, as described in Herman, Pack and Morrel-Samuels (1993), readily learn to take an object to a place indicated by referential pointing on the part of the human trainer.
21. Herman suggests that the ability to respond to referential pointing may be a generalisation from the directionality of their searchlight beam echolocation signal which illuminates remote objects sonically, a behaviour also associated with a linear posture of the rostrum pointing at the echolocated object. He also cites some recent evidence (Xitco & Roitblat, 1996) that one dolphin can identify the target being sonically illuminated by another dolphin. If this is correct, it would add weight to the suggestion of Noble and Davidson (1996) that in the human case referential pointing at objects may have evolved from the ability to aim weapons at the same target which a group of hunters would need to be able to do in order to dispatch a large prey.
22. One can tell a similar story with respect to the other component of the language of gesture, the mimed movement used to indicate an action to be performed by the sign-recipient. This ability, as we have seen, is already present in the repertoire of chimpanzees. It follows that this use of gesture must have evolved before the bloodline leading to the chimpanzee diverged from that leading to the hominids and ultimately to Homo sapiens, and thus before the emergence of any substantial technological development. Nevertheless, it is still possible to argue that the relevant mutation was selected in the first instance by virtue of the contribution which the ability to imitate the movements of others makes to the acquisition of a manipulative skill of the type in which chimpanzees readily become proficient.
23. A similar argument applies to the selection of the mutation or series of mutations that have altered the conformation of the human mouth and larynx in such a way as to make vocal speech possible. In this case it seems entirely plausible that this was selected in the first instance, not by virtue of its utility in interpersonal communication, but by virtue of allowing humans to imitate the calls of animals, and thus lure the prey into the traps that the emerging technology has provided (Hewes 1973b, p.19). (The ability to imitate the calls of wild animals seems to have fallen into disuse, at least in those cultures affected by the neolithic revolution and the domestication of animals. That it must once have existed is shown by the ease with which children in our culture learn to make a passable imitation of a dog barking, a cat miaowing, a lamb bleating, a cow mooing, a pig grunting, a hen clucking and a cock crowing.)
24. As the Biblical story of the Tower of Babel implies, there is an intimate connection between the initial evolution of human language and subsequent development of mutually unintelligible natural languages on the one hand and, on the other, the human reliance on technology, rather than the development of physical characteristics, in adapting to a new environment. The precise role of language in the exploitation, communication and development of a new technology is debatable. But, as the Tower of Babel story suggests, the utility which led to its selection was probably connected to technological projects, such as hunting, building traps for large animals or providing shelter where no natural cave was available, which would require the coordinated activity of a number of individuals.
25. The communication systems found in pre-linguistic organisms which have been analysed by the ethologists depend on the propensity to produce and respond to innate releasing stimuli (Tinbergen 1948; 1951). In so far as there is learning involved, it is narrowly constrained to particular contexts and to particular stages in the organism's development or life cycle, as in the well known phenomenon of imprinting (Lorenz, 1935/1957). Unlike these pre-linguistic communication systems which have their human counterparts in our 'instinctive' emotional responses to such things as other people's facial expressions, tone of voice and 'body-language', language proper is a form of learned behaviour. In language, arbitrary stimuli acquire their stimulus function by virtue of social conventions that vary from one natural language to another. Moreover, the evidence of phenomena such as the so-called back channels and the conventions of politeness shows that conformity to these social conventions is acquired, maintained and modified by the same process of operant or instrumental learning which we observe in the acquisition, maintenance and modification of the motor skills of all free-moving living organisms. Were it not so, language would not be able to adapt in the way that it evidently does to new technologies and new environments. Nor could it support the kind of creative thought process that is needed to create and disseminate a new technology. But if, as this evidence suggests, the acquisition of linguistic competence follows the same principles as those found in the learning of animals, we need to explain why it is that they have not developed language also, and why, when they are taught language by humans, even the anthropoid apes seem unable to progress much beyond the level of a human two-year-old. In order to explain that, we have to suppose that mutations have been selected in the course of human evolution which make the acquisition of the kinds of associative links involved in the interpretation and production of speech very much easier for humans than it is for members of other species.
26. In order to adapt effectively to its environment, any free-moving living organism with a complex behavioural repertoire must be able to recognise objects and situations of the different kinds which it regularly encounters in its environment, if it is to select from the various behavioural strategies available to it those which suit the particular context and the needs of the moment. An organism which has at its disposal an armoury of behavioural dispositions appropriate to a variety of different circumstances under which an object or situation of a particular kind may be encountered can be said to have a pre-linguistic concept of things of that kind. The stimuli which, by virtue of their regular association with objects and situations of that kind, acquire the ability to bring such a concept into play (if they do not already possess it by virtue of the innate constitution of the nervous system) constitute the natural signs of the presence of things of that kind.
27. The concepts formed by organisms of the complexity of a bird or a mammal which has a rich variety of behavioural strategies at its disposal derive in part from the genetic constitution of the species and are partly acquired by learning from the past experience of the individual concerned. The effect of both processes is to ensure that the various features of the environment, both those that are common to all environments that members of the species are likely to encounter and those that are peculiar to the current circumstances of the individual are classified in a way that yields reliable predictions of the consequences of behaving in one way rather than another. It follows that, despite differences arising from differences in the ecological niche occupied by the species, all living organisms who depend for their survival on being able to conceptualise any problematic stimulus the individual encounters will tend to share a similar conceptual scheme, one which carves nature at its joints. Without such a shared conceptual scheme at its foundation, linguistic communication in which the same arbitrary symbol becomes attached to the same concept for all members of the linguistic community in question could never have developed.
28. Evidence from the study of homesigning in congenitally deaf children who have no access to a regular sign language in their early years (Tervoort 1961; Morford et al. 1993; Morford 1996), from the history of American Sign Language (Frishberg, 1975) and from the history of the development of Chinese pictograms shows that in the development of a system of linguistic communication which is independent of vocal speech, the earliest signs are invariably iconic in the sense that they imitate the visual appearance of the object they represent. We have also had reason to think that the same may have been true of the earliest vocal signs, in this case imitating the sound made by the object referred to. In all these cases the evidence shows a tendency for the sign system as it develops to move away from the iconic towards the use of arbitrary symbols which have no iconic resemblance to what they stand for. The use of arbitrary symbols not only allows for the representation of objects whose features cannot easily be imitated or depicted; it also eliminates the possibility of misunderstandings due to focusing on aspects of the image presented which are irrelevant to the communicator's purpose. In the face of this evidence, the conclusion that a similar progression from the iconic to the symbolic must have taken place in the evolution of language as a whole, is irresistible. However, it has been suggested to me by Heng-syung Jeng and, indirectly, by Robbins Burling (personal communications) that symbolic representation could only have developed in the course of evolution with the emergence of a form of vocalisation which included consonants as well as vowels. It is argued that with vowels alone vocalisation is restricted in its representation of objects and events to an analogue imitation of the sounds that they make. With consonants it becomes possible to generate a string of discrete digital units (syllables and words) to which any arbitrary significance can be attached.
29. Recent research within the behaviour-analytic tradition on the formation of stimulus equivalence classes (Sidman 1971; Sidman & Tailby 1982; Sidman 1986; 1990), though difficult to interpret clearly and uncontroversially, is evidently exploring a fundamental process whereby arbitrary stimuli acquire symbolic function. Suppose that a three-year-old child is taught, when presented with an arbitrarily selected stimulus A (the sample), to select from a group of comparable stimuli (the comparisons) another arbitrarily selected stimulus B (the target). If, after this initial training, the child is presented with stimulus B and asked to select from a group of stimuli which includes A, the child will spontaneously pick A. A child who spontaneously generalises in this symmetrical way is said to have formed a stimulus equivalence class whose members in this case are A and B (strictly speaking, equivalence also entails reflexivity or identity matching, as when A is matched to A, and transitivity, as when, if A is matched to B and B is matched to C, A is then matched to C). The schematic below, adapted from Sidman (1990, Figure 1), illustrates the formation of a stimulus equivalence class using the matching to sample procedure. In both the top and the bottom diagrams, the sample is in the centre of the cross and the comparisons are in the arms:
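Figure 1. Formation of a stimulus equivalence class using the matching to sample procedure (Sidman 1990, Figure 1) (shown schematically below and at the URL above).
Upper diagram: dog(pic) [comparison]    car(word) [sample]    car(pic) [comparison, target]
Lower diagram: dog(word) [comparison]    car(pic) [sample]    car(word) [comparison, target]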
30. It is proposed that in previous linguistic training the behaviour of picking (nota bene) the drawing of a car (the target in the case of the upper figure), when presented with the written word CAR as sample, and the behaviour of picking the written word CAR (the target in the case of the lower figure), when presented with the drawing of a car as sample, have both been reinforced. As a consequence these two stimuli are treated as equivalent and are said to have become members of the same stimulus equivalence class. Although in this particular example the equivalence class has been formed by linguistic training outside the laboratory, there is a wealth of experimental evidence showing that in human children and adults any two arbitrarily selected stimuli can be formed into an equivalence class by this procedure. Despite the fact that the experimental technology tends to restrict research to the investigation of arbitrary associative links between static visual shapes (but see Sidman Figure 4.2, p.95, for an experiment in which the sample is a spoken word), it is generally agreed by workers in this field that the ability to form stimulus equivalence classes in this sense is intimately associated with the early stages of language development in the human infant.
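The three defining properties of an equivalence class mentioned in paragraph 29 (reflexivity, symmetry and transitivity) can be made concrete with a small sketch. It models only the logical structure of the relation, not the behavioural process by which a child comes to respond in this way. The function names `equivalence_closure` and `emergent_pairs`, and the representation of a trained relation as a (sample, target) pair, are illustrative assumptions and are not taken from Sidman's work.

```python
from itertools import product


def equivalence_closure(stimuli, trained_pairs):
    """Return the smallest equivalence relation containing trained_pairs.

    Each (sample, target) pair stands for one directly reinforced
    matching-to-sample relation (an assumption made for this sketch).
    The result is a set of ordered pairs.
    """
    relation = set(trained_pairs)
    # Reflexivity (identity matching): every stimulus is matched to itself.
    relation |= {(s, s) for s in stimuli}
    # Symmetry: if A is matched to B, then B is matched to A.
    relation |= {(b, a) for (a, b) in relation}
    # Transitivity: keep deriving pairs until nothing new appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(relation), repeat=2):
            if b == c and (a, d) not in relation:
                relation.add((a, d))
                changed = True
    return relation


def emergent_pairs(stimuli, trained_pairs):
    """Relations between distinct stimuli that follow from the trained
    pairs without having been trained directly."""
    closure = equivalence_closure(stimuli, trained_pairs)
    return {(a, b) for (a, b) in closure if a != b} - set(trained_pairs)


# Training A -> B and B -> C, as in the abstract example of paragraph 29.
print(sorted(emergent_pairs(["A", "B", "C"], [("A", "B"), ("B", "C")])))
# prints [('A', 'C'), ('B', 'A'), ('C', 'A'), ('C', 'B')]
```

On this reading, the pairs returned by `emergent_pairs` correspond to the matches a child is said to make spontaneously, that is, without separate reinforcement of each combination of sample and comparison.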
31. According to the view endorsed here (Place 1995/6), an arbitrary response-produced stimulus becomes a symbol for or name of some individual object or kind of object, property, relation or event when, as illustrated in the schematic, it becomes a member of a stimulus equivalence class which includes amongst its members one or more natural signs of the presence of the individual or kind which it thereby symbolises. The propensity of the child that is developing language to form such stimulus equivalence classes is seen as a result of having repeatedly learned both, as speaker, to produce the symbol or name in the presence of a natural sign of the thing it `stands for' and, as listener, to pick out the natural sign when presented with the symbol or name.
32. Despite many attempts to demonstrate it, there is no convincing evidence that any animal species, including apes who have been taught to use sign-language or other symbols, spontaneously develops a stimulus equivalence class in the way human children invariably do, that is, without the individual having been specifically trained to respond to each of the possible combinations of sample and comparison. There is evidence, moreover (Beasty 1987; Dugdale & Lowe 1990; Horne & Lowe 1996), which links the emergence of spontaneous stimulus equivalence class formation with the use of names to distinguish the stimuli the child is learning to associate. Animals who have been taught to use symbols not only fail to show the spontaneous formation of stimulus equivalence classes; they also fail to show the exponential increase in vocabulary size which has been referred to as the naming explosion and which would seem to begin in the child at about the same time (around the age of two). It seems that a mutation has been selected which gives human beings the ability to form the kind of associations involved in giving significance to arbitrary symbols far more readily than any other species.
33. Because the work that has been done on the formation of stimulus equivalence classes has focused on static visual stimuli, it is directly relevant only to the acquisition of object-names. How action-names are acquired has not been studied from this perspective. However, it seems likely that this ability grew out of the ancient practice of representing actions by mimed movement, just as the ability to acquire object-names appears to have grown out of the practice of pointing at objects in order to establish reference to them. Once object-names and action-names have been acquired it becomes possible to construct sentences in what Bickerton (1990) has called proto-language in which sentences consist of an object-name or noun specifying the agent, an action-name or verb specifying the action to be performed, and a second object-name or noun specifying the manipulandum. Horne and Lowe's (1996) 'Daddy push car' is a typical example of just such a sentence. Apart from the distinction between verb and noun and the order in which the different components of the argument structure occur, such sentences are devoid of syntax. Nevertheless, within their limitations, they provide the rudiments of a working symbolic language.
34. As language develops in the child and as it presumably developed in the species, reference is initially restricted to objects in the current common stimulus environment of sign-producer and sign-receiver to which the sign producer refers by pointing at them. With the introduction of iconic representation reference is extended to objects which are absent from the common stimulus environment of both speaker and listener, but only in so far as either their shape can be depicted by means of a mimed movement or their sound can be vocally imitated. With the introduction of symbolic representation reference is extended to absent objects, both individuals and kinds, to which a name has been assigned by the conventions of the language. With the introduction of syntax, particularly with the introduction of embedded clauses, it becomes possible to refer to absent objects by description.
35. Before proceeding to detailed reconstruction of the evolution of language based on these principles, it will be helpful to review some of the evidence which supports the view that the freeing of the human forelimb from its locomotor functions, and the consequent development of manipulative skills, is as important for the evolution of language as it clearly is for the evolution of technology. The following pieces of evidence are relevant in this connection:
36. Many birds have a vocal apparatus as good as that of humans; yet they have not developed language. This suggests that the crucial difference between birds and humans in this respect may be that, while both are bipedal, the forelimbs of birds are still specialised for locomotion, rather than, as in the human case, for manipulation.
37. The occurrence of gesticulation as an invariable accompaniment of speech strongly suggests that gesticulation had a much more important role in the early stages of language evolution.
38. Whenever vocal communication is blocked, either because it cannot be heard or, if heard, cannot be understood, human beings of every culture invariably fall back on gesticulation.
39. The ease with which the deaf learn sign-language, particularly if brought up in an environment in which signing is in constant use by others, and the spontaneous development of homesigning by those who are not, suggests that the ability to use and respond to manual signs is an integral part of our human linguistic heritage.
40. The practice of pointing with the index finger as a way of establishing reference to objects in the common stimulus environment of speaker and listener is a linguistic universal which by common consent plays an essential role in the acquisition of word-meanings.
41. The earliest form of sentence seems to have been one in which the function (action) is indicated by means of a mimed movement and the arguments by pointing at the objects concerned. Communication which relies exclusively on sentences of this type constitutes a language of gesture (Piaget 1926/1932; Hewes 1973a; 1973b; 1976) on which human beings invariably fall back when vocal communication is blocked.
42. The concentration of areas specialised for language in the same hemisphere of the cerebral cortex as that which controls the hand which is preferred for precise manipulative tasks demonstrates the intimate connection between the two functions (Cf. Hewes 1973b, p.9).
43. There is a part of the human cerebral cortex, the angular gyrus on the dominant (usually left) hemisphere, which is specialised for deciphering linguistic stimuli in the visual modality (Thompson 1993, pp.399-402). Since writing and reading have developed far too recently and are still far from universal human accomplishments, the need to decipher a written text cannot explain the development of this ability to process visually presented linguistic signs. It must have been selected, probably before the development of speech, to facilitate the interpretation of a language of gesture.
44. In a recent paper entitled 'Language within our grasp', Rizzolatti and Arbib (1998) have reached a similar conclusion in the light of evidence that Broca's area in the human left frontal cortex, long known as the area involved in the production and interpretation of syntactically articulated sentences, is homologous with an area in the monkey's brain (F5) where neurons (mirror neurons) have been found which respond both to the production of visually-controlled hand-movements and to the visual perception of the corresponding movements when made by others. Although Rizzolatti and Arbib do not make these points, it is evident that this link between the execution of a voluntary hand-movement and the visual perception of similar movements made by others is (a) a by-product of the visual feedback-control of voluntary movement, (b) the foundation for the ability to imitate the hand-movements of others without which a human technology based on the manufacture and use of tools would have been impossible, and (c) the foundation of the ability found, as we have seen, in chimpanzees to communicate by miming the action to be performed by the sign-receiver.
45. No one would seriously dispute the claim that the earliest form of counting consisted in the practice which is found in every human culture of counting up to ten on the fingers of the two hands, and displaying the result to others by holding up the relevant number of fingers. This practice can, perhaps, be seen as an outgrowth of the ability to refer to objects by pointing at them. But since what is pointed at are the fingers rather than the objects being counted, this form of counting is an iconic representation of the number of the objects. Furthermore since you can only count things of a kind, counting presupposes a pre-existing ability to classify objects into kinds and, in the case of communicating the results of a count, a pre-existing ability to indicate the kind of object being counted. (I am indebted to Professor Robbins Burling of the University of Michigan [personal communication April 1998] for convincing me that, unlike vocal counting which is inevitably symbolic from the outset, digital counting, together with some written number systems such as the Roman before the practice of writing IV instead of IIII was introduced, is a form of iconic rather than, as I had previously thought, a form of symbolic representation.)
46. The recent work on the process whereby arbitrary response-produced stimuli become symbols for (names of) objects, described in section II.XI above, argues for a key role in this process for the manual responses of pointing at and picking out the relevant stimuli.
47. Little appears to be known about the process whereby action-names are learned. What is known (Koehler 1921/1927) is that chimpanzees communicate what they want a conspecific to do by miming the action in question, and such miming is a conspicuous feature both of the gesticulation that invariably accompanies speech, unless the hands are otherwise engaged, and of sign-languages, whether officially recognised or devised by the individual. This suggests that miming of the action by the caregiver and its imitation by the child must play a key role in the acquisition of action-names.
48. In the light of this evidence and the principles outlined in sections I and II above, I would propose the following scenario for what I am suggesting is the sequence of stages involved in the evolution of language:
49. The first stage in the evolution of language appears to have occurred at a time when chimpanzees and humans had a common ancestor. Three interconnected abilities would seem to have developed at this stage: (a) the ability to use sticks and stones as tools and weapons, (b) the ability to imitate the movements of others in the context of learning to perform the manipulations involved in the effective use of tools and weapons, and (c) the ability to communicate what one wants someone else to do by miming the action required. There is some reason to think that the concentration of the manipulative and communicatory functions in one hemisphere of the cerebral cortex (the left in those who are right-handed) may have begun at this stage, perhaps with the specialisation of such structures as the angular gyrus and the pre-motor cortex in the dominant hemisphere for the visual interpretation of hand-movements in general and gesture in particular.
50. The second stage culminates in the emergence of the first true sentences formulated in the language of gesture. It begins with the emergence of the practice of pointing referentially at objects, at the individual who is to do something, at an object to be manipulated, at a destination or location to which the individual is to move or to which the object is to be moved. As we have seen (Section II.IV above), this ability is lacking in chimpanzees, not because it is something they cannot learn to do, but because referential pointing is something to which, unlike dolphins, they cannot learn to respond. We have also seen reason to agree with Noble and Davidson's (1996) suggestion that this ability may have evolved with the development of the ability of a group of hunters to aim their weapons at the same target. Given the ability to use pointing to distinguish (a) who is to perform the action, (b) the object to be manipulated and (c) the individual to whom the object is to be transferred, it becomes possible for the first time in the history of communication between living organisms to construct novel sentences in what may be justly described as the language of gesture in which different mimed actions are combined with different combinations of argument (agent, object and recipient) identified by pointing at them.
51. (Dr. Marina Sbisà of the Department of Philosophy, University of Trieste [personal communication, June 1998], has drawn my attention to the fact that human infants frequently indicate the object to be manipulated by an adult, in the case of a small portable object such as a bowl, by bringing it to the adult or, in the case of a larger object, by dragging the adult towards it. It is not clear to me whether this behaviour is part of the miming of the action to be performed which is already present in the behaviour of chimpanzees or whether it is a separate development, possibly connected to the technology of using containers to collect, store and distribute liquids such as water and milk and solids such as fruit and grain.)
52. But significant though it is, the practice of referring to objects by pointing at them is severely limited in its scope. Whereas mimed action allows the communicator to refer to what has not yet occurred, the action to be performed by the respondent, referring to objects by pointing at them allows the communicator to refer only to conspicuous objects in the stimulus environment of both parties. The effect of subsequent developments is to increase that scope beyond what is indexically present.
53. In Stage 3 vocalisation is added to the language of gesture. It depends on changes to the conformation of the mouth and larynx which are selected in the first instance by their effect in allowing human beings to imitate the sounds made, for example, by the male or female of a prey species to attract a potential mate, thereby enticing the latter into the traps which the technology provides. Once established, such calls are introduced into otherwise gestural sentences as an alternative to pointing at instances of the object where no such instance is present. Since there is no obvious trace of the kind of iconic gestures used by homesigners to represent objects (Morford et al. 1993) in the gesticulations of those without auditory impairment, I am inclined to think that vocal imitations of the sounds made by animals were the first iconic representations of objects, as distinct from the iconic representations of actions by means of mimed actions which have been used since the days of our ape ancestors to represent the action to be performed by the sign-recipient. They make it possible for the first time to talk about absent objects as well as actions not yet performed.
54. The position of this fourth stage in the sequence of evolutionary events leading to fully developed language is unclear. It is placed here because it can be plausibly seen as the first step in the move away from the iconic towards a symbolic system of representation. It is the development of the ability to count up to ten on the fingers of the two hands and communicate the result by holding up the appropriate number of fingers. Considered as a representation of the number of objects in a group, holding up the corresponding number of fingers may be considered iconic. But, once they progress beyond the number of fingers on the two hands, counting systems inevitably become symbolic. Vocal counting is invariably symbolic from the outset.
55. In Stage 5 the first representations of objects using arbitrary symbols (names) begin to appear. Once the use of symbols is well established in the repertoire of a human child, all that is required for the child to learn a new name or other lexical word is for the instructor to point to one or two instances of the kind of object the word is used to refer to while uttering the word in question. However, the evidence reviewed in Section II.XI above suggests that in its early stages learning the names of things is a much more complex process, one in which there is reinforcement both of the response of producing the name in the presence of an instance of the kind in question and the response of picking an instance of the kind in the presence of the name. Although apes, and possibly members of other animal species, can be taught to use symbols, they never progress to the point where there is spontaneous generalisation in both directions between the word or symbol and the natural signs of the presence of the object for which it stands. To be able to learn word-meanings as easily as a human child does from about the age of two requires a mutation which has occurred and been selected only in the human species.
56. Apes who have been taught sign language or some other form of symbolic communication can construct sentences in what Bickerton (1990) calls proto-language. But without the rapidly expanding vocabulary that seems to develop only with the spontaneous emergence of stimulus equivalence classes, language can never take off as it does in the human child. Even so, consisting as they do entirely of names (lexical words), proto-language sentences have no syntax other than the verb/noun distinction. That, and perhaps some of the other distinctions that are later drawn by means of syntax, are indicated by gesture which, at this stage, still forms an integral part of the process of linguistic communication. This is the first stage in the evolution of language where the increased efficiency of language as a medium for interpersonal and intrapersonal communication is unquestionably what determines the selection of the mutation that provides it, rather than its utility in relation to some purely technological adaptation. It is at this stage presumably that Wernicke's area evolves as a centre for the interpretation and production of names. With the development of symbols (proper names) referring to particular persons and places, unambiguous reference to individuals in their absence becomes possible for the first time.
57. As argued in Section II.III above, the developmental evidence suggests that the first sentences produced and responded to by our ancestors in the course of language evolution were all imperatives. It also seems likely that the earliest declarative sentences were answers to questions and that questions and answers evolved simultaneously as part of a single practice. As in the case of counting, it is unclear at what stage in the evolution of language this development took place. The best guess is that it was associated, as it seems to be in children, with the so-called naming explosion which occurs around the age of two or three and consists in a rapid increase in the child's vocabulary, particularly the names of kinds of object. This event appears to coincide with the child's discovery of the practice of asking questions of the caregiver, particularly questions about the names of things, a practice which, like the naming explosion it triggers, seems to be absent from the behaviour of the most intelligent of those apes who have been taught a form of sign-language.
58. The development of syntax is the final stage in the evolution of language. It is selected by virtue of its effect in releasing linguistic communication from dependence on the listener's paying attention to the context of utterance and the gestures of the speaker in order to disambiguate what a speaker is saying. It thus allows speakers to talk intelligibly about situations which are not part of the current stimulus environment of either speaker or listener, whether in the past, in the future or at some place geographically remote from the context of utterance. Once it is fully developed, gesture, though still a valuable aid to the speaker's eloquence, ceases to perform any essential communicatory function as far as the listener is concerned. But, if gesture itself has been made redundant for all but the deaf by the introduction of syntax, it seems that the connection between language and manual and other forms of motor skill still survives in the remarkable parallel to which Horgan and Tienson (1996) have drawn attention between the syntactic organisation of sentences and the syntactic (no metaphor) organisation of a motor skill such as basket-ball playing.
59. It is an open question whether syntax evolved, as Chomsky would have us believe, through a single mutation, or whether the emergence of each class of syntactic operator required the selection of a separate mutation. In favour of the former view is the existence of a single area in the human cerebral cortex, Broca's area, which is specialised for its interpretation and production, damage to which appears to affect all types of syntactic operator more or less equally (Thompson 1993, p.398). In favour of the latter view is the observation that the order in which the different classes of syntactic operator are acquired by the child is a linguistic universal (Slobin 1985; Aitchison 1989). With the introduction of syntax, particularly the definite article and the relative clause, it becomes possible for the first time to refer to absent objects by description as well as by proper name.
I am indebted for their stimulating comments and for additional references to Bernard Bichakjian, Paul Bloom, Rob Burling, Annabel Cormack, Tom Dickins, Heng-syung Jeng, Harry Jerison and Jill Morford. Since I have not otherwise cited his work, I should also express my indebtedness to Lev Vygotsky's (1934/1986) Thought and Language to which, among other things, I owe the crucial references to Koehler, Piaget and Wundt.
Aitchison, J. (1989) The Articulate Mammal: An Introduction to Psycholinguistics, Routledge.
Barwise, J. & Perry, J. (1983) Situations and Attitudes, MIT Press.
Beasty, A. (1987) The role of language in the emergence of equivalence relations: A developmental study. Unpublished Ph.D. thesis, University of Wales, Bangor, U.K.
Bickerton, D. (1990) Language and Species. University of Chicago Press.
Chomsky, N. (1957) Syntactic Structures. Mouton.
Chomsky, N. (1965) Aspects of the Theory of Syntax. MIT Press.
Condillac, B. de (1746/1947) Essai sur l'origine des connaissances humaines, ouvrage où l'on réduit à un seul principe tout ce qui concerne l'entendement. In: Oeuvres Philosophiques de Condillac. Paris: Georges LeRoy.
Dugdale, N. & Lowe, C.F. (1990) Naming and stimulus equivalence. In: Behaviour Analysis In Theory and Practice: Contributions and Controversies, ed. D. E. Blackman & H. Lejeune. Erlbaum.
Fodor, J. (1975) The Language of Thought. MIT Press.
Frishberg, N. (1975) Arbitrariness and iconicity: Historical change in American Sign Language. Language 51:696-719.
Greenfield, P. M. & Savage-Rumbaugh, E. S. (1990) Grammatical combinations in Pan paniscus: processes of learning and invention in the evolution and development of language. In: Language and Intelligence in Monkeys and Apes: Comparative Developmental Perspectives, ed. S. T. Parker & K. R. Gibson. Cambridge University Press.
Harzem, P. & Miles, T. R. (1978) Conceptual Issues in Operant Psychology. Wiley.
Herman, L. M. (1987) Receptive competencies of language-trained animals. In: Advances in the Study of Behaviour, ed. J. S. Rosenblatt, C. Beer, M. C. Busnel & P. J. B. Slater. Academic Press.
Herman, L. M. (1998) The dolphin's grammatical competency: Comments on Elements of Syntax in the Systems of Three Language-Trained Animals, E. Kako. Animal Learning and Behaviour.
Herman, L. M., Kuczaj, S. A. & Holder, M. D. (1993) Responses to anomalous gestural sequences by a language-trained dolphin: Evidence for processing of semantic relations and syntactic information. Journal of Experimental Psychology: General 122(2):184-194.
Herman, L. M., Pack, A. A. & Morrel-Samuels, P. (1993) Representational and conceptual skills of dolphins. In: Language and Communication: Comparative Perspectives, ed. H. R. Roitblat, L. M. Herman & P. Nachtigall. Erlbaum.
Herman, L. M., Richards, D. G. & Wolz, J. P. (1984) Comprehension of sentences by bottlenosed dolphins. Cognition 16:129-219.
Hewes, G. W. (1973a) An explicit formulation of the relationship between tool-using, tool-making and the emergence of language. Visible Language 7:101-127.
Hewes, G. W. (1973b) Primate communication and the gestural origin of language. Current Anthropology 14:5-24.
Hewes, G. W. (1976) The current status of the gestural theory of language origins. In: Origins and Evolution of Language and Speech, ed. S. R. Harnad, H. D. Steklis, & J. Lancaster. New York Academy of Science.
Horgan, T. & Tienson, J. (1996) Connectionism and the Philosophy of Psychology. MIT Press.
Horne, P. J. & Lowe, C. F. (1996) On the origins of naming and other symbolic behaviour. Journal of the Experimental Analysis of Behaviour 65:185-241.
Jóhannesson, A. (1949) Origins of Language: Four Essays. Leiftur.
Jóhannesson, A. (1950) The gestural origins of language. Nature 166:60-61.
Koehler, W. (1921/1927) Intelligenzprüfungen an Menschenaffen. Springer. English translation by E. Winter as The Mentality of Apes, 2nd Ed. Routledge & Kegan Paul.
Liberman, A. M. (1993) Haskins Laboratories Status Report on Speech Research 113:1-32.
Lorenz, K. (1935/1957) Der Kumpan in der Umwelt des Vogels; der Artgenosse als auslösendes Moment sozialer Verhaltungsweisen. Journal of Ornithology 83:137-213 & 289-413. English translation as `Companionship in Bird Life: Fellow Members of the Species as Releasers of Social Behaviour' In: Instinctive Behaviour, ed. C. H. Schiller. International University Press.
Morford, J. P. (1996) Insights into language from the study of gesture: A review of research on the gestural communication of non-signing deaf people. Language and Communication 16:165-178.
Morford, J. P., Singleton, J. L. & Goldin-Meadow, S. (1993) The role of iconicity in manual communication. In: K. Beals, G. Cooke, D. Kathman, S. Kita, K.E. McCullough & D. Testen, Papers from the Chicago Linguistic Society 29, Vol 2: The Parasession:243-253.
Morgan, L. H. (1877) Ancient Society. Holt.
Noble, W. & Davidson, I. (1996) Human Evolution, Language and Mind: A Psychological and Archaeological Inquiry. Cambridge University Press.
Pepperberg, I. M. (1987). Interspecies communication: A tool for assessing conceptual abilities in the African Grey parrot. In: Cognition, Language and Consciousness: Interactive Levels, ed. G. Greenberg & E. Tobach. Erlbaum.
Piaget, J. (1926/1932) The Language and Thought of the Child, 2nd Ed. Routledge & Kegan Paul.
Place, U. T. (1995/6) Symbolic processes and stimulus equivalence. Behaviour and Philosophy, 23/24:13-30.
Povinelli, D. J. & Davis, D. R. (1994). Differences between chimpanzees (Pan troglodytes) and humans (Homo sapiens) in the resting state of the finger: implications for pointing. Journal of Comparative Psychology, 108:134-139.
Rizzolatti, G. & Arbib, M. A. (1998) Language within our grasp. Trends in Neurosciences 21:188-194.
Romanes, G. J. (1888) Mental Evolution in Man: Origin of Human Faculty, Kegan Paul.
Savage-Rumbaugh, E. S. (1986). Ape Language: From Conditioned Response to Symbol, Columbia University Press.
Schusterman, R. J. & Gisiner, R. C. (1988) Artificial language comprehension in dolphins and sea lions: The essential cognitive skills. The Psychological Record 38:311-348.
Schusterman, R. J. & Krieger, K. (1984) California sea lions are capable of semantic comprehension. The Psychological Record 34:3-23.
Sidman, M. (1971) Reading and audio-visual equivalences. Journal of Speech and Hearing Research, 14:5-13.
Sidman, M. (1986). Functional analysis of emergent verbal classes. In: Analysis and Integration of Behavioural Units, ed. T. Thompson & M. D. Zeiler. Erlbaum.
Sidman, M. (1990). Equivalence relations: Where do they come from? In: Behaviour Analysis in Theory and Practice: Contributions and Controversies, ed. D. E. Blackman & H. Lejeune. Erlbaum.
Sidman, M. & Tailby, W. (1982) Conditional discrimination vs. matching to sample: an expansion of the testing paradigm. Journal of the Experimental Analysis of Behaviour, 37:5-22.
Skinner, B. F. (1938) The Behaviour of Organisms. Appleton-Century.
Skinner, B. F. (1957) Verbal Behaviour. Appleton-Century-Crofts.
Slobin, D. I., ed. (1985) The Crosslinguistic Study of Language Acquisition, 2 vols. Erlbaum.
Tervoort, B. T. (1961) Esoteric symbolism in the communication behaviour of young deaf children. American Annals of the Deaf, 106:436-480.
Thompson, R. F. (1993) The Brain: A Neuroscience Primer, 2nd Ed. Freeman.
Tinbergen, N. (1948) Social releasers and the experimental method required for their study. Wilson Bulletin 60:6-52.
Tinbergen, N. (1951) A Study of Instinct, Clarendon Press.
Tylor, E. B. (1868) On the origin of language. Fortnightly Review, 1:22.
Tylor, E. B. (1871) Primitive Culture. John Murray.
Vygotsky, L. (1934/1986) Thought and Language. English translation by A. Kozulin. MIT Press.
Wallace, A. R. (1881) Review of Anthropology by Edward B. Tylor. Nature 24:242-245.
Wallace, A. R. (1895) Expressiveness of speech, the mouth gesture as a factor in the origin of language. Fortnightly Review 64:528-543.
Wundt, W. (1900) Völkerpsychologie, Vol. I: Die Sprache. Engelmann.
Xitco, M. J. & Roitblat, H. R. (1996). Object recognition through eavesdropping: passive echolocation in bottlenose dolphins. Animal Learning and Behaviour 24:355-365.