The Turing Test: What Is It, What Can Pass It, and Limitations

What Is the Turing Test?

The Turing Test is a deceptively simple method of determining whether a machine can demonstrate human intelligence: If a machine can engage in a conversation with a human without being detected as a machine, it has demonstrated human intelligence.

The Turing Test was proposed in a paper published in 1950 by mathematician and computing pioneer Alan Turing. It has become a fundamental motivator in the theory and development of artificial intelligence (AI).

Key Takeaways

The Turing Test measures the intelligence of a test subject to determine whether a machine can demonstrate intelligence.
According to the test, a computer program can think if its responses can fool a human into believing it, too, is human.
Not everyone accepts the validity of the Turing Test, but passing it remains a major challenge to developers of artificial intelligence.
There are variations to the Turing test as well as modifications to the approach of asking questions in different AI tests.
The Turing test has several limitations including requiring a controlled environment, not having a dedicated definition of intelligence, and needing to adapt to evolving technological advancements.

Understanding the Turing Test

Rapid advances in computing are now visible in many aspects of our lives. We have programs that translate one language to another in the blink of an eye, robots that clean an entire home in minutes, finance robots that create personalized retirement portfolios, and wearable devices that track our health and fitness levels.

At the forefront of disruptive technology is the development of artificial intelligence and the question of what limits computers face. For this reason, the Turing test was designed to evaluate whether a computer could be "smart" enough to be mistaken for a human. Critics of the Turing Test argue that a computer can be built with the ability to think, but not with a mind of its own. They believe that the complexity of the human thought process cannot be coded.

The test is conducted in an interrogation room run by a judge. The test subjects, a person and a computer program, are hidden from view. The judge has a conversation with both parties and attempts to identify which is the human and which is the computer, based on the quality of their conversation. Turing concludes that if the judge can't tell the difference, the computer has succeeded in demonstrating human intelligence. That is, it can think.

History of the Turing Test

Alan Turing developed some of the basic concepts of computer science while searching for a more efficient method of breaking coded German messages during World War II. After the war, he began thinking about artificial intelligence. In his 1950 paper, Turing began by posing the question, “Can machines think?” He then proposed a test that is meant to help humans answer the question.

Several early programs have been claimed to fool humans in very basic situations. In 1966, Joseph Weizenbaum created ELIZA, a program that picked out specific words in a user's input and transformed them into full sentences. ELIZA was one of the earliest programs to fool human testers into thinking it was human.
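The keyword-to-sentence idea described above can be sketched in a few lines. This is only an illustration of the general technique: the rules, the reply templates, and the function name eliza_reply are hypothetical and are not taken from Weizenbaum's original script.

```python
import re

# Hypothetical rules for illustration only; not Weizenbaum's original script.
# Each rule pairs a keyword pattern with a sentence template.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def eliza_reply(text: str) -> str:
    """Turn a matched keyword phrase into a full sentence, or fall back."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            # Reuse the user's own words inside the canned template.
            return template.format(match.group(1))
    return DEFAULT_REPLY
```

For instance, eliza_reply("I am tired.") reuses the word "tired" in its question, while input that matches no rule gets the generic fallback. The program has no understanding of the words it echoes, which is why critics do not treat this kind of exchange as evidence of intelligence.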

Less than a decade later, a chatbot named PARRY was modeled to imitate the behavior of a person with paranoid schizophrenia. A group of psychiatrists was asked to analyze transcripts of conversations with real patients and with PARRY. When asked to identify which transcripts came from the computer program, the group was able to identify the machine only 48% of the time. Critics of both ELIZA and PARRY state that the full rules of the Turing test were not met and that the results do not indicate full machine intelligence.

A chatbot named Eugene Goostman is accepted by some as the first to pass the Turing Test, in 2014.

The Turing Test Today

The Turing Test has its detractors, but it remains a measure of the success of artificial intelligence projects. An updated version of the Turing Test has more than one human judge interrogating and chatting with both subjects. The project is considered a success if more than 30% of the judges, after five minutes of conversation, conclude that the computer is a human.
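The pass criterion described above comes down to simple arithmetic on the judges' votes. The following is a minimal sketch under that reading; the function name passes_threshold and the vote counts in the usage note are hypothetical.

```python
from typing import Sequence


def passes_threshold(judge_votes: Sequence[bool], threshold: float = 0.30) -> bool:
    """Return True if more than `threshold` of the judges voted "human".

    Each entry in judge_votes is True when that judge concluded the
    computer was a human. The 0.30 default follows the figure in the text.
    """
    if not judge_votes:
        raise ValueError("at least one judge vote is required")
    fooled_share = sum(judge_votes) / len(judge_votes)
    return fooled_share > threshold
```

With 30 hypothetical judges, 10 "human" votes give a share of about 33% and count as a pass, while exactly 3 of 10 judges (30%) would not, because the share must exceed the threshold.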

The Loebner Prize is an annual Turing Test competition that was launched in 1991 by Hugh Loebner, an American inventor and activist. Loebner created additional rules requiring the human and the computer program to have 25-minute conversations with each of four judges. The winner is the computer program that receives the most votes and the highest ranking from the judges.

In 2014, Kevin Warwick of the University of Reading organized a Turing Test competition to mark the 60th anniversary of Alan Turing’s death. A computer chatbot called Eugene Goostman, which had the persona of a 13-year-old boy, technically passed the Turing Test in that event. It secured the votes of 33% of the judges, who were convinced that it was human.

In 2018, Google Duplex demonstrated the capability to perform tasks via the telephone. In various demonstrations, Duplex scheduled a hair appointment and called a restaurant, with the human on the other end of the line not realizing they were interacting with a machine. However, critics point out that the interaction does not conform to the actual Turing test and claim the test has yet to be beaten by a machine.

Turing Test Versions

There are several variations of Turing tests, all with the same intention of detecting whether a respondent is a human or a machine. Each variation takes a different approach in asking the respondent different questions and evaluating the responses.

Imitation Game

One of the earlier applications of the Turing test, the imitation game version often involves three parties. The first person is male, the second person is female, and the third person is responsible for determining the gender of the first two people. The first person is often tasked with trying to trick the third person, while the second person is often tasked with trying to help the third person correctly identify each gender.

Later iterations of the imitation game have both parties attempting to trick the third person into incorrectly identifying the genders. In any case, the objective of the imitation game is to determine whether an interrogator can be fooled.

Standard Interpretation

Another common version of the Turing test does not strive to see whether an interrogator can be fooled by another person but rather to see whether a computer can imitate a human. In the standard interpretation variation of a Turing test, the first party is a computer and the second party is a human of either sex.

In this variation, the third person attempts to discover which of the first two parties is a human and which is a computer. The interrogator is not the subject being tested; instead, it is the computer that is trying to fool the human. For example, the computer may be asked a series of personal finance questions to determine whether its responses are what would reasonably be expected of a human, given what is known about behavioral finance.

The fictitious Voight-Kampff test in the dystopian science fiction franchise Blade Runner is a play on the idea of testing a machine for intelligent behavior.

Modern Approaches to the Turing Test

Since the creation of the Turing test, more modern approaches have evolved in an attempt to better distinguish humans from machines. These variations of the Turing test continue to evolve to stay relevant as technology advances.

The Reverse Turing Test aims to have a human trick a computer into believing it is not interrogating a human.
The Total Turing Test incorporates perceptual abilities and the test subject's ability to manipulate objects.
The Marcus Test has test subjects view media and respond to questions about the content consumed.
The Lovelace Test 2.0 has test subjects create art and examines their ability to do so.
The Minimum Intelligent Signal test asks test subjects only binary questions (i.e. only true/false or yes/no answers are allowed).
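Because the Minimum Intelligent Signal test allows only yes/no answers, its scoring is easy to sketch. The example below assumes that a subject is scored by how often its answers match the answers most people would give; the questions, reference answers, and the function name mist_score are hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical yes/no questions paired with the answer most people would give.
REFERENCE_ANSWERS: Dict[str, bool] = {
    "Is the sun bigger than a shoe?": True,
    "Can a fish ride a bicycle?": False,
    "Is ice colder than steam?": True,
    "Is a mouse heavier than an elephant?": False,
}


def mist_score(subject_answers: Dict[str, bool]) -> float:
    """Return the share of questions answered the way the reference does.

    Unanswered questions count as mismatches.
    """
    matches = sum(
        subject_answers.get(question) == expected
        for question, expected in REFERENCE_ANSWERS.items()
    )
    return matches / len(REFERENCE_ANSWERS)
```

A subject that answers "yes" to everything agrees with the reference on only half of these questions, which is no better than guessing. A binary format removes conversational tricks from the evaluation, since there is no free text in which to deflect or change the subject.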

Limitations of the Turing Test

There are many critics of the Turing test, and the variations above attempt to mitigate some of the limitations of the original Turing test. Still, it is important to be mindful of the downsides of the Turing test and where its analysis may fall short.

The Turing test requires a very controlled environment to be performed. Test participants must be hidden from view of each other during the entirety of the test, though the parties must have a reliable means of communication.
The Turing test may not be suitable to test for intelligence as different computing systems are structured differently. Therefore, there may be inherent, natural limits to what a computer is capable of performing.
The Turing test is evolving; however, technological advancements are evolving even faster. Consider Moore's Law, the observation that the number of transistors on a chip doubles roughly every two years, which has brought rapid growth in processing ability alongside a rapid decline in cost. As computers gain more human-like capabilities, historical testing methods may no longer be suitable.
The Turing test assesses intelligence, though it may not be an appropriate gauge of all types of intelligence. For example, a computer may successfully fool an interrogator based on its ability to process responses similarly to a human. However, this may not truly indicate emotional intelligence or awareness; it may simply mean the computer was running highly relevant and competent code.

How Does a Turing Test Work?

In a Turing test, an interrogator asks a test subject a series of questions. Each party is kept in a separate area, so no physical contact is allowed. The test subject's responses are evaluated on whether they are the kind of answers a human would give.

Has Any Machine Passed the Turing Test?

In 2018, Google Duplex was introduced at the annual Google I/O developer conference. The machine scheduled a hair salon appointment by speaking with a salon assistant over the phone. Though some critics view the outcome differently, some believe Google Duplex passed the Turing test.

Can a Human Fail the Turing Test?

Yes. Although a Turing test is based on knowledge and intelligence, it also evaluates how responses are given and whether the answers come across as deceptive.

For example, imagine being asked to provide the sum of 43,219 and 87,878. Whether or not you can provide the correct answer is only part of the exam; the Turing test also evaluates how long it takes you to provide an answer, any clarifying questions you ask in response, and whether you understand that you should add and not subtract the two figures. Based on any human's responses, it is possible to be mistaken for a computer (e.g., if you accidentally subtracted the figures instead of adding them, that may be incriminating evidence).
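The arithmetic example above can be made concrete. The correct sum is 43,219 + 87,878 = 131,097. The sketch below shows how an evaluator might weigh both the answer and the response time. The function name machine_like_signals and the two-second cutoff are assumptions made for illustration, not part of any formal test, and the subtraction check simply mirrors the article's example.

```python
from typing import List


def machine_like_signals(
    answer: int,
    seconds_taken: float,
    a: int = 43_219,
    b: int = 87_878,
    fast_cutoff: float = 2.0,  # hypothetical cutoff, in seconds
) -> List[str]:
    """List reasons an interrogator might suspect a machine.

    The heuristics are illustrative only: a correct six-digit sum given
    almost instantly, or a subtraction given in place of the sum.
    """
    signals = []
    if answer == a + b and seconds_taken < fast_cutoff:
        signals.append("correct answer given implausibly fast")
    if answer in (a - b, b - a):
        signals.append("subtracted instead of added")
    return signals
```

Under these assumptions, the correct answer returned in half a second raises a flag, while the same answer after 20 seconds does not. A human who answers unusually fast could be flagged in the same way, which is how a person can fail the test.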

What Are Examples of Turing Test Questions?

An interesting example of a potential Turing test question is one based on language and wordplay. For example, a question may ask, "What is the difference between time flying and an airplane flying?" Though this type of question may be unfair to participants not familiar with the English language, it tests the ability to make logical distinctions where a single word (here, "fly") means different things in different contexts.

Another type of Turing test question is the nonsensical question. A question such as "Is the difference between football that the batter wears a helmet?" is grammatically incorrect, and a human can easily tell that it does not make sense. However, a machine may still try to produce a response.

The Bottom Line

The Turing test is an assessment to determine whether a machine is able to exhibit the same intelligence as a human. There are now many variations of the Turing test. As technology continues to advance with AI at the forefront, new lines of thinking are emerging about how to determine intelligence. That thinking brings many nuances with it, and more work remains to be done in this area.