H. M. Collins (1997) The Editing Test for the Deep Problem of AI. Psycoloquy: 8(01) Turing Test (8)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).

THE EDITING TEST FOR THE DEEP PROBLEM OF AI
Commentary on Watt on Turing-Test

H. M. Collins
Centre for the Study of Knowledge, Expertise & Science
Department of Sociology and Social Policy
University of Southampton
Southampton SO17 1BJ UK
+44 (0)1703 592578 FAX 593859

h.m.Collins@soton.ac.uk

Abstract

All the problems of AI are surface transformations of one deep problem: how to make a computer that can learn from its surroundings, including social surroundings, in the same way that humans learn. The Turing Test can be adapted to check whether or not the deep problem has been solved by looking at one of its surface transformations -- the problem of "interpretative asymmetry." Interpretative asymmetry refers to the skillful way in which humans "repair" deficiencies in speech, written texts, handwriting, etc., and the failure of computers to achieve the same interpretative competence. Short passages of typed text are quite enough to reveal interpretative asymmetry, and therefore a Turing-like test, turning on the differential ability to sub-edit such short passages, is sufficient to reveal whether the deep problem of AI has been solved.

Keywords

False belief tests, folk psychology, naive psychology, the "other minds" problem, theory of mind, the Turing test.

1. I will respond to only one point made in the target article (Watt, 1996) -- the question of interpretative asymmetry -- and my response will be self-indulgent: I think there is a much easier way to approach these problems, and I will explain what it is.

2. Interpretative asymmetry is a way of describing the difference between what computers can do and what humans can do during an interaction. Humans can "repair" deficiencies or variations in the output of computers, whereas current computers cannot do anything like as good a repair job. The computer at which I am typing this response can communicate with me in all kinds of symbolic forms, such as short warnings or questions printed to the screen (usually amounting to less than a full grammatical sentence), icons, or menus with abbreviated two- or three-word phrases. If a new application uses a different phrase or icon for the same operation -- e.g. saving a file -- it will not cause me any trouble; indeed, I may not even notice it. Worse things can happen too: sometimes, for example, when I am writing an email, something goes wrong and the screen presents me with a mixture of what I want to write and a load of meaningless hieroglyphs. Nevertheless, I can extract the sense from the mixture without too much effort and continue to type my message with only a little strain.

3. One of the things that screen icons or menus do is reduce the amount of interpretation that the computer has to do. The icons or menus digitise my input: they restrict what I can input to a small number of fixed and unambiguous possibilities. This makes the computer's interpretative incompetence invisible. Keyboard input does something of the same thing but at a lower level; it restricts my input to strings in which each place can take only around a couple of hundred unambiguous symbols.

4. Keyboard input does allow the possibility of mistakes, however, and when I give instructions to my computer I have to restrict what I type. Suppose I am working in DOS: I have to type my instructions with exact spelling and syntax; I cannot decide to type "copy" one day and "transfer" the next; I cannot type "cpoy" or "please copy the following file." If the keyboard were not there to digitise input as far as it does, the possibility and probability of mistakes would be even greater. Handwriting input devices allow far greater freedom of variation in input, but they do not work very well. Even optical character readers, which work on typed text, continue to make mistakes.
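The exactness demanded by the DOS example above can be sketched as an interpreter that accepts a command only when its first word matches an entry in a fixed table. This is an illustrative sketch, not how DOS was implemented: the command set and the function name `interpret` are hypothetical, and the error string merely imitates the message MS-DOS printed for unrecognised input.

```python
# Hypothetical command table in the spirit of a DOS-style shell.
COMMANDS = {"copy", "del", "dir", "type"}
ERROR = "Bad command or file name"

def interpret(line: str) -> str:
    """Return the recognised command word, or the error string for anything else."""
    words = line.split()
    if not words:
        return ERROR
    verb = words[0].lower()  # case is forgiven, but nothing else is
    return verb if verb in COMMANDS else ERROR

print(interpret("copy a.txt b.txt"))                # copy
print(interpret("cpoy a.txt b.txt"))                # Bad command or file name
print(interpret("please copy the following file"))  # Bad command or file name
```

A synonym ("transfer"), a transposition ("cpoy"), and a polite paraphrase all fail in the same way, because the table offers no means of repairing input it does not already contain.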

5. To repeat, I routinely deal with these kinds of variation when different computers or packages output them to me. I can also deal faultlessly with typed text, and almost faultlessly with handwritten output, from other humans, so long as they are meant to represent my own language. This asymmetry in the relationship between humans and computers is what I call "interpretative asymmetry".

6. Now it may be said that spell checkers are improving, that optical character readers are getting better all the time, and that even handwriting interfaces are being perfected, so that the problem of interpretative asymmetry is being solved. But that is the question, not the answer: is interpretative asymmetry going away?

7. I suggest that interpretative asymmetry is a very deep problem rather than "just" a matter of designing better interfaces. This is because the resources humans bring to resolving mistakes or ambiguities include the full range of capabilities we have acquired through being embedded in societies and linguistic communities. The problems of artificial intelligence, I would argue, all amount to one problem -- how to make a computer that can learn from its surroundings in the way that we humans learn from ours, including our social surroundings; the ability to learn language is just one surface transformation of this foundational problem (Collins, 1990). Solve any one of these transformations and you have solved the deep problem -- the rest will be research and development.

8. The directions that modern AI research has taken -- neural nets, and robots intended to learn from their surroundings -- are the beginnings of a research program that might lead to computers as capable as we are of learning from our surroundings. It is useful to have in mind a simple test which, if passed, would represent successful achievement of this goal. Such a test will help researchers avoid the sort of premature announcements of "breakthroughs" that have been made in the past, and this in turn should make developers of commercial and other applications less likely to misdirect their efforts, as in the premature development of handwriting interfaces and so forth.

9. The Turing Test can be used to examine one of the surface transformations of the deep problem -- linguistic interpretative asymmetry. As it is normally thought about, the Turing Test is set up to look not for interpretative asymmetry but for some looser notion of "language understanding". But many of the problems of "language understanding" can be circumvented by "tricks", and this makes it hard to say what passing the Turing Test means. Setting up the Turing Test to look for interpretative asymmetry is, however, very easy -- one just makes it a test of sub-editing ability. My claim is that a computer that can pass the sub-editing test is one whose designers have cracked the deep problem of AI -- there is no need to extend the test to robotic abilities (Harnad, 1989) or to design a complicated protocol such as that suggested by Watt (1996). All one needs to set up the editing test is a suitable control and an ingenious interrogator. To be fair to the humans, control and interrogator must have cultural contiguity; to be fair to the computer, it must be expressly designed to pretend to be a member of the social group shared by the two humans in the test -- we cannot expect any computer to have shared all cultural milieux, just as we do not expect this of humans (see Collins, 1990, for the development of this argument). We then check which of computer and control does a better job of sub-editing passages of incorrect English, presented by the interrogator, that none of control, computer, or program designer has seen before. The interface technology for the test is already available -- one could use keyboard input to both parties or give the computer an optical character reader and printed text. Output could be to a screen or a printer. There is no need to develop any new transducers to make the test viable.
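The bookkeeping of the editing test can be sketched as a small harness. Everything here is hypothetical: the names `run_editing_test`, `computer`, `control` and `judge` are illustrative, and `judge` merely stands in for the human assessment of which sub-edit is better, which no code can supply. The harness enforces only the two conditions the protocol above states explicitly: every passage must be unseen, and the two sub-edits are compared with each other (here shown anonymously, in random order).

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical interfaces: an editor maps a faulty passage to an edited one; a judge
# sees the passage and two anonymous edits and returns 0 or 1 for the better edit.
Editor = Callable[[str], str]
Judge = Callable[[str, str, str], int]

def run_editing_test(passages: list[str], seen: set[str], computer: Editor,
                     control: Editor, judge: Judge, seed: int = 0) -> Counter:
    """Count how often each entrant's sub-edit is judged the better one."""
    rng = random.Random(seed)
    wins: Counter = Counter()
    for passage in passages:
        # Passages must be unseen by control, computer and program designer.
        if passage in seen:
            raise ValueError(f"passage already seen: {passage!r}")
        entrants = [("computer", computer(passage)), ("control", control(passage))]
        rng.shuffle(entrants)  # hide which edit came from which entrant
        better = judge(passage, entrants[0][1], entrants[1][1])
        wins[entrants[better][0]] += 1
    return wins

# Illustration only: stand-in entrants and judge for one passage adapted from Example 2.
faulty = "Thus there will always be a problem with weird processors."
fixed = "Thus there will always be a problem with word processors."
wins = run_editing_test(
    [faulty], seen=set(),
    computer=lambda p: p,                           # leaves the passage untouched
    control=lambda p: p.replace("weird", "word"),   # makes the intended repair
    judge=lambda p, a, b: 0 if a == fixed else 1,   # stand-in for human judgement
)
print(wins["control"], wins["computer"])  # 1 0
```

The shuffle is a design choice rather than something the text prescribes; it simply keeps whoever does the judging from knowing which entrant produced which edit.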

10. The interrogator -- perhaps aided by collaborators -- would have to make up appropriate and novel texts for sub-editing that would test computer and control to the full. There is nothing interesting about having undemanding texts sub-edited. Though the terms "test to the full" and "undemanding" are somewhat loose, the idea is straightforward: one must think up texts which one would not expect an entity to be able to sub-edit unless it had the linguistic competence of a full member of the linguistic community. Here are a couple of examples (presenting them here rules them out for use in a real-life Turing Test, which must use unseen examples):

  EXAMPLE 1.

     Mary: The next thing I want you to do is correctly
           spell a word that means a religious ceremony.

     John: You mean rite.  Do you want me to spell it out
           loud?

     Mary: No, I want you to write it.

     John: I'm tired. All you ever want me to do is write,
           write, write.

     Mary: That's unfair, I just want you to write, write,
           write.

     John: OK, I'll write, write.

     Mary: Write.

  EXAMPLE 2.

     My spell-checker will correct weerd processor but won't
     correct weird processor or world processor.

     Thus there will always be a problem with weird
     processors when it comes to spelling correctly.

     To tell you the truth, sometimes when you see how they
     correct spellings one does not know how to think of
     them. Are they really weird processors or weird
     processors?

Notice that in this example nothing should be changed in the first paragraph; "weird" should be changed to "word" in the second paragraph; and in the third paragraph one or other of the "weirds" has to be changed, but not both of them.
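The joke in the first paragraph of Example 2 can be reproduced with a context-free spell checker that looks each word up in a dictionary and proposes close matches for words it cannot find. This is a sketch under stated assumptions: the word list is hypothetical, and Python's `difflib.get_close_matches` stands in for whatever suggestion method a real checker uses.

```python
import re
from difflib import get_close_matches

# Hypothetical word list: just enough vocabulary for the sentences of Example 2.
DICTIONARY = {
    "my", "spell", "checker", "will", "correct", "weird", "word", "world",
    "processor", "processors", "but", "won't", "or", "thus", "there",
    "always", "be", "a", "problem", "with",
}

def check(text: str) -> list[tuple[str, list[str]]]:
    """Flag each token missing from the dictionary, with up to three close matches."""
    flagged = []
    for token in re.findall(r"[a-z']+", text.lower()):
        if token not in DICTIONARY:
            flagged.append((token, get_close_matches(token, DICTIONARY, n=3)))
    return flagged

print(check("My spell-checker will correct weerd processor but won't "
            "correct weird processor or world processor."))
print(check("Thus there will always be a problem with weird processors."))  # []
```

Only "weerd" is flagged, and its best suggestion is "weird". Because "weird" and "world" are real words, the checker passes them whatever the context, so the error in the second paragraph goes unnoticed.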

11. It is hard to foresee how a computer could be programmed to edit unseen passages such as these. Let us look at a few reasons why. First, suppose we approached the problem with a look-up table in mind (such as is found in Searle's Chinese Room). Suppose, for argument's sake, that each passage were around 14 lines long -- that is, that each contained about 1,000 symbols. Each symbol, if written in English, could take perhaps 100 values (there are slightly fewer on my typewriter keyboard, but we might as well keep the arithmetic easy). It follows that the number of distinct 1,000-symbol strings I could type is 100 raised to the power of 1,000, or 10 raised to the power of 2,000. How many of these strings would be suitable candidates for a sub-editing test is not so easy to work out, but it might be quite a sizeable subset, as can be seen by considering the following potential passage:

     Jim, when he was a child, invented what he called a new
     "language" consisting entirely of strings of five x's.
     For example, a passage of this language might look as
     follows: xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx

     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xyxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
     xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx.  
     Some language! or xyxxxx xxxxz! as Jim would say.

Thus the possibility of self-reference seems to allow almost any string of symbols to count as meaningful, and this makes the set of possible passages for sub-editing almost as large as the set of all possible symbol strings -- 10 raised to the power of 2,000. Given that the number of particles in the universe is around 10 raised to the power of 120, such a table could not be stored even at one entry per particle, and this seems to rule out a look-up-table approach.
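The arithmetic of paragraph 11 can be checked exactly with Python's arbitrary-precision integers. The passage length, alphabet size and particle count are the text's own round figures. The number of atoms in the observable universe is more usually put at around 10^80, which only widens the gap.

```python
# The text's round figures, treated as exact for the arithmetic.
SYMBOLS_PER_PASSAGE = 1_000
VALUES_PER_SYMBOL = 100
PARTICLES_IN_UNIVERSE = 10**120

# Each position independently takes any of the values, so the count is a power.
possible_passages = VALUES_PER_SYMBOL ** SYMBOLS_PER_PASSAGE
assert possible_passages == 10**2000  # (10^2)^1000 = 10^2000

# Even storing one table entry per particle leaves this many passages per particle.
shortfall = possible_passages // PARTICLES_IN_UNIVERSE
print(f"10^{len(str(shortfall)) - 1} passages per particle")  # 10^1880 passages per particle
```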

12. It may be, of course, that all variations can be subsumed under a relatively small set of rules, so that the look-up table is beside the point; but even if this were the case, there is still the problem that legitimate usages in a language change over time. This means that for any isolated sub-editing program to remain current, it would have to be continually updated by a programmer connected to the social group that was the source of the changes in the language which the program must reflect. But in that case the program would amount to a very elaborate conduit for the programmer's understanding rather than an autonomous sub-editor. To be autonomous, the program would have to be able to learn about social and linguistic changes by itself, and that is why it would have to have solved the deep problem of AI.
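The rule-based alternative raised in paragraph 12 can be sketched, purely for illustration, as a list of regular-expression rewrites. The text does not say what form such rules would take; the names `sub_edit` and `rules` are hypothetical. Every rule has to be written by a programmer, which is the dependence the paragraph describes. A blanket rule built for Example 2 also shows how context defeats simple rules: it repairs the second paragraph but changes both "weirds" in the third.

```python
import re

# Hypothetical rule format: (regex pattern, replacement), written by a programmer.
Rule = tuple[str, str]

def sub_edit(text: str, rules: list[Rule]) -> str:
    """Apply each rule in order; the editor knows nothing its rules do not encode."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# The programmer's current rule set; any new usage must be appended here by hand.
rules: list[Rule] = [
    (r"\bcpoy\b", "copy"),
    (r"\bweird processors\b", "word processors"),  # blanket fix aimed at Example 2
]

second = "Thus there will always be a problem with weird processors"
third = "Are they really weird processors or weird processors?"
print(sub_edit(second, rules))  # the intended correction
print(sub_edit(third, rules))   # both "weirds" changed, though only one should be
```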

13. To sum up, interpretative asymmetry is one manifestation of the shortcomings of computers when compared with human abilities. It is one of the surface transformations of the deep problem of AI. Solving the problem of interpretative asymmetry would therefore reveal that the deep problem of AI had been solved; the rest would be a matter of research and development, for the breakthrough, whatever it is, would have been made. I have set out a very simple test, based on the Turing Test, for finding out whether or not the problem has been solved.

REFERENCES

Collins, H. M. (1990) Artificial Experts: Social Knowledge and Intelligent Machines. Cambridge, Mass.: MIT Press.

Harnad, S. (1989) Minds, Machines and Searle. Journal of Theoretical and Experimental Artificial Intelligence 1: 5-25.

Watt, S. (1996) Naive Psychology and the Inverted Turing Test. PSYCOLOQUY 7(14) turing-test.1.watt.
