1. This commentary attempts to show that Watt's (1996) inverted Turing Test could be simulated by a standard Turing Test; indeed, a very simple program with no intelligence whatsoever could be written that would pass the inverted Turing Test. For this reason, the inverted Turing Test must be rejected.
2. Remarkably, the same is true for the Turing Test. All variations of the original Turing Test, at least all of those of which I am currently aware, that attempt to make it more powerful, more subtle, or more sensitive can, in fact, be done within the framework of the original Turing Test. (This is analogous to the Turing Machine: variants of the standard machine designed to be more powerful invariably turn out to be simulable by, and hence no more powerful than, the original.) Watt (1996) touches briefly on this fact in recognizing the criticism that "the inverted Turing Test is redundant because all of its power of discrimination is available in the standard Turing Test". But rather than focusing on this crucial criticism, he comments only that "... a critically evaluated standard Turing Test without a time limit would be sufficient to detect the presence of naive psychology. However, given that humans have all these psychological biases in their ascription of mental states, I doubt whether a truly critical version of the Turing Test is psychologically possible without some variation in the test." But what does he mean by "some variation in the test"? He clearly seems to be arguing for a more subtle, more reliable version of the test -- namely, his inverted Turing Test. In what follows I will attempt to show that this "inverted" Turing Test could not only be simulated by the standard Turing Test but, most importantly, would ascribe intelligence to programs that are unquestionably not intelligent.
3. In thinking about the Turing Test, people often tend to overlook the completely unfettered nature of the questions that may be asked by the interrogator. It cannot be overemphasized that any question is fair game. Questions can, for example, be declarative ("What is the capital of Senegal?"), they may be procedural ("Please describe how you would tie your shoes"), or -- and this is crucial -- they may be indirect or subcognitive (French, 1990). Watt, however, says, "It might be possible, with the current state of the art, to use a simple set of linguistic metrics that would unambiguously distinguish between people and computer systems. I would regard this as cheating." But what if this "simple set of linguistic metrics" could be elucidated by the answers to a number of perfectly reasonable questions, as described, for example, in French (1990)? Is this still "cheating"? And how could one ever determine the point at which a technique that took advantage of some underlying "simple set of linguistic metrics" was cheating? Stipulating unambiguously what would or would not constitute cheating would prove to be as impossible as stipulating that questions must be limited to particular domains. The boundaries between real world domains are so overlapping and so fuzzy as to make adherence to any such stipulation impossible.
4. The underlying idea of Watt's inverted Turing Test is as follows.
"Instead of evaluating a system's ability to deceive people, we should test to see if a system ascribes intelligence to others in the same way that people do ... by building a test that puts the system in the role of the observer ... [A] system passes [this inverted Turing Test] if it is itself unable to distinguish between two humans, or between a human and a machine that can pass the normal Turing Test, but which can discriminate between a human and a machine that can be told apart by a normal Turing Test with a human observer."
5. Now, if we accept the conclusions of French (1990, 1995) -- as Watt apparently does -- the issue of a machine passing the standard Turing Test will never come up, so we can reduce his description of the inverted Turing Test to the following: Two separate rooms can contain either two human beings or, alternatively, a machine and a human. There are two types of interrogators: I(H), a human, and I(M), a machine. If the abilities of I(H) and I(M) to say when the rooms contain two humans or a human and a machine are indistinguishable, we will say that I(M) (the machine) has passed the inverted Turing Test.
6. This does indeed look like an inversion of the standard Turing Test. But to see how the standard version of the test can accommodate this new version, assume that the human-interrogator I(H) and the machine-interrogator I(M) are put in separate rooms. There is a Super-Interrogator, who does not know which room contains the machine and which contains the human interrogator. The Super-Interrogator then says to each interrogator in turn: "From now on, you may assume that you are the interrogator of a Turing Test. Your terminal will be connected to two rooms, which may contain either two humans, or a machine and a human. Your job is to tell me if the pair of rooms contains two humans or a human and a machine." Both interrogators judge the entities (human or machine) in the first pair of rooms to which they are connected. Then they are given a new pair of rooms and make a new judgment, and so on. There is some number of pairs of rooms and both interrogators judge all of them. Since the Super-Interrogator knows the real contents (human-human or machine-human) of each pair of rooms, he can compare the respective performances of the human-interrogator and the machine-interrogator. If their answer profiles are indistinguishable to him, the machine-interrogator will be judged to have passed the inverted Turing Test. Our first conclusion is therefore that the inverted Turing Test can be simulated, with almost no overhead, by a standard Turing Test. (Again, this is analogous to the Turing Machine. The way that variants of the standard TM are shown to be no more powerful than the standard TM is by demonstrating that the new machines can be simulated by the standard TM.)
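The Super-Interrogator's bookkeeping can itself be made mechanical. The following is a minimal sketch, under the simplifying assumption (mine, not Watt's) that "indistinguishable answer profiles" means the two interrogators' verdicts agree pair for pair; the "HH"/"HM" labels are likewise my own shorthand.

```python
def inverted_test_passed(ground_truth, verdicts_human, verdicts_machine):
    """Decide whether the machine-interrogator passes the inverted Turing Test.

    All three arguments are parallel lists with one entry per pair of rooms,
    each entry either "HH" (two humans) or "HM" (a human and a machine).
    The Super-Interrogator knows ground_truth, but the pass criterion is
    indistinguishability: the machine-interrogator passes when its verdict
    profile matches the human-interrogator's on every pair of rooms.
    """
    assert len(ground_truth) == len(verdicts_human) == len(verdicts_machine)
    return verdicts_human == verdicts_machine

rooms = ["HH", "HM", "HH"]   # true contents of each pair of rooms
i_h   = ["HH", "HM", "HH"]   # human-interrogator's verdicts
i_m   = ["HH", "HM", "HH"]   # machine-interrogator's verdicts
passed = inverted_test_passed(rooms, i_h, i_m)
```

One could instead compare the two interrogators' accuracy against the ground truth; the exact-agreement criterion used here is simply the strictest reading of "indistinguishable answer profiles".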
7. The next question is whether the inverted Turing Test is sufficiently powerful to prevent obviously unintelligent programs from passing it. The answer is that it is not. But before this argument can be made, it is necessary to become familiar with a special type of question -- subcognitive questions -- that can be used by interrogators in Turing Tests and that will allow infallible unmasking of computers as non-humans, unless those computers have lived life as we humans have. A complete discussion of this technique can be found in French (1990) or, in succinct form, in French (1995). Furthermore, Watt in his present article seems to accept the arguments presented in these papers.
8. The most important point of these two articles is the immense (and generally unappreciated) difficulty that anything not having lived life as a human being would have in actually passing the Turing Test. We humans respond very consistently to "subcognitive" questions (i.e., questions that draw on the subconscious structure of our minds), such as, "Would Flugblogs be a good name for a start-up computer company?" -- Of course not! -- or "Would Flugblogs be a good name for air-filled bags that you could tie on your feet to walk across swamps with?" -- Sure, not bad! Humans' answers emerge from a vast set of learned, associative, and largely unconscious influences involving sounds (Which word is prettier, "farfalletta" or "blutch"? Why, exactly?), connotations (Would you like it if someone called you a "trubhead"? Why, exactly? How could this be explicitly programmed into a machine?), pictures, smells, past events, and so on ad infinitum. These influences are produced by our continual interaction with our environment. And subcognitive questions tap into the results of this lifetime of human-environment interaction. In other words, these questions subtly probe our vast, complex and intricately interconnected associative concept networks that have been learned by experiencing the world. They are precisely the kinds of questions that would unfailingly unmask any computer that had not lived life as we had (French, 1995).
9. So, for the inverted Turing Test, here is what we do. We independently prepare a long list of subcognitive questions. Then we go out into the same population from which the humans participating in the inverted Turing Test will be chosen. We ask them a large sample of questions like these:
"On a scale of 0 (completely implausible) to 10 (completely plausible):
- Rate 'Flugblogs' as a name Kellogg's would give to a new breakfast cereal.
- Rate 'Flugblogs' as the name of a new computer company.
- Rate 'Flugblogs' as the name of big, air-filled bags worn on the feet and used to walk across swamps.
- Rate 'Flugly' as the name a child might give to a favorite teddy bear.
- Rate 'Flugly' as the surname of a bank accountant in a W. C. Fields movie.
- Rate 'Flugly' as the surname of a glamorous female movie star.
- Rate 'banana splits' as 'medicine'.
- Rate 'purses' as 'weapons'.
- Rate 'pens' as 'weapons'.
- Rate 'dry leaves' as 'hiding places'.
- etc."
10. So, for example, no dictionary definition of 'dry leaves' will ever include the fact that piles of dry autumn leaves are wonderful places for children to hide in and, yet, few among us would not make that association upon seeing the juxtaposition of those two concepts. (Is this tapping into some simple linguistic metric? Certainly, but it is a metric born of human experience with the world. Is this cheating? Surely not.) In any event, by surveying the population at large with an extensive set of these questions, we draw up a Human Subcognitive Profile for the population. It is precisely this subcognitive profile that could not be reproduced by a machine that had not experienced the world as the members of the sampled human population had. The list of questions, the Subcognitive Question List, that was used to produce the Human Subcognitive Profile gives an interrogator -- any interrogator -- a sure-fire way to eliminate machines from a Turing Test in which humans are also participating.
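Computationally, drawing up the Human Subcognitive Profile is nothing more than descriptive statistics over survey data. A minimal sketch, in which the survey responses are entirely hypothetical numbers invented for illustration:

```python
from statistics import mean, stdev

def build_profile(survey_responses):
    """Build a Human Subcognitive Profile from survey data.

    survey_responses maps each question on the Subcognitive Question List
    to the list of 0-10 ratings collected from the sampled human
    population; the profile records the mean rating and its spread
    (standard deviation) for every question.
    """
    return {
        question: (mean(ratings), stdev(ratings))
        for question, ratings in survey_responses.items()
    }

# Hypothetical ratings for two questions from the list above.
responses = {
    "Rate 'Flugly' as the name a child might give to a teddy bear.":
        [7, 8, 6, 9, 7, 8],
    "Rate 'Flugly' as the surname of a glamorous female movie star.":
        [1, 2, 0, 1, 3, 1],
}
profile = build_profile(responses)
```

Each profile entry summarizes how the sampled population, as a whole, answers that question; it is this summary, not any individual's answers, that the machine-interrogator will later carry around.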
11. Now, let us return to our two interrogators, I(M), the machine-interrogator, and I(H), the human-interrogator. First, consider I(H). He will be able to eliminate, using some form or another of his own subcognitive question list, all machines from the running. In other words, he will be able to infallibly determine those pairs of rooms in which there is a machine and a person. Can a completely unintelligent machine-interrogator, I(M), do just as well? Yes, all that is required is for human programmers to equip I(M), the dumbest of programs, with the previously established Subcognitive Question List, the corresponding Human Subcognitive Profile and a small statistics routine. The program will then test each candidate (i.e., one machine and one person) with its Subcognitive Question List and use its statistics routine to compare the results with its Human Subcognitive Profile. The inevitable divergences that any machine being questioned will have with this Profile will infallibly unmask it. But in this case the unmasking has been done not by an intelligent human interrogator but by the stupidest of all machine-interrogators, one containing little more than the canned Subcognitive Question List, the corresponding Human Subcognitive Profile and a statistical analyzer.
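The "small statistics routine" really is small. One way to realize it, purely as an illustrative assumption (the z-score threshold, the outlier tolerance, and the profile numbers are mine, not Watt's):

```python
def looks_human(candidate_ratings, profile, z_threshold=2.0, max_outliers=0.1):
    """Compare a candidate's ratings against the Human Subcognitive Profile.

    profile maps each question to (population mean, standard deviation).
    A rating counts as an outlier when it lies more than z_threshold
    standard deviations from the population mean for that question.
    The candidate is judged human if at most a fraction max_outliers of
    its answers are outliers; a machine that has not lived a human life
    is expected to diverge on far more questions than that.
    """
    outliers = 0
    for question, rating in candidate_ratings.items():
        mu, sigma = profile[question]
        if sigma > 0 and abs(rating - mu) / sigma > z_threshold:
            outliers += 1
    return outliers <= max_outliers * len(candidate_ratings)

# Hypothetical profile for three questions: (mean, standard deviation).
profile = {
    "teddy bear": (7.5, 1.0),
    "movie star": (1.3, 1.0),
    "dry leaves as hiding places": (8.0, 1.2),
}

human_like   = {"teddy bear": 8, "movie star": 1, "dry leaves as hiding places": 9}
machine_like = {"teddy bear": 2, "movie star": 9, "dry leaves as hiding places": 1}
```

Nothing in this routine understands the questions it asks; it merely measures statistical distance from the canned Profile, which is exactly the point of the argument.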
12. The same holds for the pairs of rooms that each contain a human. Presumably, I(H) will not be able to tell them apart. And, as before, the machine-interrogator, upon presenting both humans with its canned Subcognitive Question List and comparing the results with its Human Subcognitive Profile, will find that neither individual differs significantly from its Profile, and will, like the human-interrogator, conclude, rightly, that two humans are involved. The performance of the human-interrogator and the (clearly unintelligent) machine-interrogator will be, at least with respect to Watt's inverted Turing Test, identical.
13. Of course, subcognitive question lists represent the Turing Test at its very hardest. But we must assume that Turing Test interrogators are always trying to pose the hardest, most revealing questions possible in their quest to unmask the computer. It is important to note that there is not a unique Subcognitive Question List and corresponding Human Subcognitive Profile; rather there are infinitely many such lists and profiles. But the key assumption is not that these subcognitive question lists will be able to sample subcognitive space exhaustively (an impossible task) but rather that they will sample it in a random, representative manner. Thus, even though I(H) and I(M) will certainly not be using identical subcognitive question lists, discriminations based on their respective question lists will very likely be the same, insofar as each list samples subcognitive space representatively. As a result, any entity (machine or human from, say, a radically different culture) that shows a significant statistical departure from one profile will show a similar departure from the other. In short, whenever I(H) detects a significant difference from his subcognitive profile (thereby presumably judging the deviant entity to be a machine), so will I(M).
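This sampling claim can be checked with a toy simulation. The sketch below is entirely my own construction: it models "subcognitive space" as 1000 numbered questions with population-mean ratings, a human as a small deviation from those means, and an unworldly machine as a random answerer, then shows that two interrogators drawing *different* random question lists reach the same discrimination.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def divergence(candidate, population_means, sample):
    """Mean absolute deviation of the candidate's answers from the
    population mean, over a sampled subset of questions."""
    return sum(abs(candidate[q] - population_means[q]) for q in sample) / len(sample)

# A toy subcognitive space of 1000 questions, each with a
# population-mean rating on the 0-10 scale.
space = list(range(1000))
population_means = {q: random.uniform(0, 10) for q in space}

# A human deviates only slightly from the population; a machine that
# has not lived a human life answers essentially at random.
human   = {q: population_means[q] + random.gauss(0, 0.5) for q in space}
machine = {q: random.uniform(0, 10) for q in space}

# I(H) and I(M) draw different 100-question lists from the same space...
list_IH = random.sample(space, 100)
list_IM = random.sample(space, 100)

# ...yet on either list the machine diverges from the population far
# more than the human does, so both interrogators make the same call.
```

The point of the toy model is only that representative sampling, not identical question lists, is what makes the two interrogators' discriminations agree.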
14. In conclusion, the inverted Turing Test could be passed by an obviously unintelligent program armed only with a rather lengthy but completely canned list of subcognitive questions, the corresponding human response profile to those questions, and a means of doing statistical comparisons. Consequently, the idea of an inverted Turing test must, at least in its present form, be rejected.
French, R.M. (1990) Subcognition and the Limits of the Turing Test. Mind, 99(393), 53-65.
French, R.M. (1995) Refocusing the Debate on the Turing Test. Behavior and Philosophy, 23(1), 61-62.
Turing, A. (1936) On computable numbers with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42, 230-265.
Turing, A. (1950) Computing machinery and intelligence. Mind, 59(236), 433-460.
Watt, S. (1996) Naive Psychology and the Inverted Turing Test. PSYCOLOQUY 7(14) turing-test.1.watt.