KR Conference 2012 Conference Paper
- Hector Levesque
- Ernest Davis
- Leora Morgenstern
ing the presence of thinking (or understanding, or intelligence, or whatever appropriate mental attribute), we assume that typed English text, despite its limitations, will be a rich enough medium.

In this paper, we present an alternative to the Turing Test that has some conceptual and practical advantages. A Winograd schema is a pair of sentences that differ only in one or two words and that contain a referential ambiguity that is resolved in opposite directions in the two sentences. We have compiled a collection of Winograd schemas, designed so that the correct answer is obvious to the human reader, but cannot easily be found using selectional restrictions or statistical techniques over text corpora. A contestant in the Winograd Schema Challenge is presented with a collection of one sentence from each pair, and required to achieve human-level accuracy in choosing the correct disambiguation.

2 The trouble with Turing

The Turing Test does have some troubling aspects, however. First, note the central role of deception. Consider the case of a future intelligent machine trying to pass the test. It must converse with an interrogator and not just show its stuff, but fool her into thinking she is dealing with a person. This is just a game, of course, so it’s not really lying. But to imitate a person well without being evasive, the machine will need to assume a false identity (to answer “How tall are you?” or “Tell me about your parents.”). All other things being equal, we should much prefer a test that did not depend on chicanery of this sort. Or to put it differently, a machine should be able to show us that it is thinking without having to pretend to be somebody or to have some property (like being tall) that it does not have.

We might also question whether a conversation in English is the right sort of test.
Free-form conversations are no doubt the best way to get to know someone, to find out what they think about something, and therefore that they are thinking about something. But conversations are so adaptable and can be so wide-ranging that they facilitate deception and trickery. Consider, for example, ELIZA (Weizenbaum 1966), a program (usually included as part of the normal Emacs distribution) that, using very simple means, was able to fool some people into believing they were conversing with a psychiatrist. The deception works at least in part because we are extremely forgiving in terms of what we will accept as legitimate conversation. A Rogerian psychiatrist may say very little except to encourage a patient to keep on talking, but it may be enough, at least for a while.

Consider also the Loebner competition (Shieber 1994), a restricted version of the Turing Test that has attracted considerable publicity. In this case, we have a more balanced conversation taking place than with ELIZA. What is striking about transcripts of these conversations is the fluidity of the responses from the subjects: elaborate wordplay, puns, jokes, quotations, clever asides, emotional outbursts, points of order. Everything, it would seem, except clear and direct