If AI is making the Turing test obsolete, what might be better?

Greenpepper@beehaw.org · 11 months ago

tal@lemmy.today · 11 months ago

The Turing Test isn’t really intended to identify a computer – Turing’s problem wasn’t that we needed a way to identify computers.

At the time (and, to some extent, today), some people firmly believed that a computer could not actually think, and that thinking is something “special” that only humans can do.

It’s intended to support Turing’s argument for a behavioral approach to thinking: if a computer can behave indistinguishably from a human who we agree thinks, then that behavior should be the bar for what we mean when we talk about thinking.

Since then, some people have actually worked toward building such a chatbot, but for Turing, it was just a hypothetical to support his argument.

https://en.wikipedia.org/wiki/Turing_test

The test was introduced by Turing in his 1950 paper “Computing Machinery and Intelligence” while working at the University of Manchester.[5] It opens with the words: “I propose to consider the question, ‘Can machines think?’” Because “thinking” is difficult to define, Turing chooses to “replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.”[6]

Turing did not intend for his idea to be used to test the intelligence of programs—he wanted to provide a clear and understandable example to aid in the discussion of the philosophy of artificial intelligence.[82] John McCarthy argues that we should not be surprised that a philosophical idea turns out to be useless for practical applications. He observes that the philosophy of AI is “unlikely to have any more effect on the practice of AI research than philosophy of science generally has on the practice of science.”[83][84]

Cass.Forest@beehaw.org · 11 months ago

There is, however, still the concept of the Chinese Room thought experiment, and I don’t think AI will topple that one for a while.

For those who don’t know and don’t wish to browse off the site: the thought experiment posits a man who does not understand Chinese sitting in a room and being told to respond to sets of Chinese characters that are passed into the room. He has a little rulebook that tells him which Chinese characters to send back out in response, so he can produce sensible-looking replies without understanding any of them. The thought experiment asks whether the Chinese Room as a system, or even the man himself, can be said to understand Chinese.

With the Turing Test getting all of the media spotlight in AI, machine learning, and cognitive science, I think the Chinese Room should enter the conversation as the field of AI looks toward AGI.

jarfil@beehaw.org · 11 months ago

The Chinese Room has already been surpassed by LLMs, which have been shown to contain neurons whose activations correlate so strongly with abstract concepts like “formal text” or “positive sentiment” that tweaking them is one of the options LLM-based chatbots are presenting to the user.
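As a rough illustration of what “tweaking” such a neuron could mean, here is a minimal sketch of activation steering on toy vectors: a hidden activation is shifted along a concept direction, which raises or lowers how strongly that concept is expressed. The 4-dimensional vectors and the `formality_direction` are made-up assumptions for illustration; real models have thousands of dimensions, and this is not a description of how any particular chatbot implements its options.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer(hidden: Vector, direction: Vector, strength: float) -> Vector:
    """Shift a hidden activation along a concept direction.

    Positive strength pushes the activation toward the concept,
    negative strength pushes it away.
    """
    return [h + strength * d for h, d in zip(hidden, direction)]


# Hypothetical example: dimension 1 plays the role of a "formal text" neuron.
formality_direction: Vector = [0.0, 1.0, 0.0, 0.0]
hidden_state: Vector = [0.5, 0.2, -0.1, 0.3]

# How strongly the concept is expressed before and after steering.
before = dot(hidden_state, formality_direction)
after = dot(steer(hidden_state, formality_direction, 2.0), formality_direction)
print(before, after)
```

Because the direction here is a single axis, steering it amounts to turning one neuron's activation up; with a general direction the same addition spreads across many neurons at once.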

Analysis of the activation space has also shown that LLMs categorize and cluster sequences of text representing similar concepts closer to each other, which allows them to present reasonably accurate zero-shot responses that were never in the training set (that “weren’t in the book”, in Chinese Room terms).
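A minimal sketch of what “clustering similar concepts closer to each other” means geometrically, assuming each text snippet has already been mapped to an activation vector. The 3-dimensional vectors below are invented for illustration; real activations come from a model's hidden layers and are far larger. Cosine similarity is used here as one common way to measure closeness, not necessarily the measure used in the analyses mentioned above.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: Vector, known: Dict[str, Vector]) -> str:
    """Return the label of the stored vector most similar to the query."""
    return max(known, key=lambda label: cosine_similarity(query, known[label]))


# Made-up 3-dimensional "activations" for a few text snippets.
known_snippets: Dict[str, Vector] = {
    "the cat sat on the mat": [0.9, 0.1, 0.0],
    "a kitten napped on the rug": [0.8, 0.2, 0.1],
    "quarterly revenue rose 5%": [0.0, 0.1, 0.95],
}

# An input that never appeared above; its (assumed) activation lands
# near the cat/kitten cluster, so its nearest neighbour is related.
unseen: Vector = [0.85, 0.15, 0.05]
print(nearest(unseen, known_snippets))
```

The point of the toy is that an input which is not “in the book” still lands in a meaningful neighbourhood, so whatever is done with its neighbours is likely to be relevant.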

howrar@lemmy.ca · 11 months ago

I don’t understand what you mean by “The Chinese Room has already been surpassed by LLMs”. It’s not a test that can be surpassed. It’s just a thought experiment.

In any case, you do bring up a good point. Perhaps understanding lies in how the information is organized. If you have a Chinese room where all the query-response pairs are in arbitrary order, then maybe you wouldn’t consider that to be understanding. But if the data is organized so that similar queries and responses sit close to each other, and the person in the room can make mistakes, such as accidentally copying out the response next to the correct one, and still produce something that makes sense, then maybe we can consider this system to have a better understanding.

jarfil@beehaw.org · 11 months ago

The Chinese Room is really a thought experiment about the inner workings of a partner in a Turing test. Externally they have the same pitfalls, but the Chinese Room also reveals itself completely if one can observe in detail the inner workings of the room/partner.

LLMs are still mostly black boxes, but we can get enough of a glimpse inside to see that they aren’t just “following some rails” like a simple algorithm.

“make mistakes such as accidentally copying out the response next to the correct response and still make sense”

Precisely. This is another part that we can see with LLMs: at runtime, a “temperature” parameter is applied to the model’s output. It controls how much randomness goes into picking each next token, which intentionally introduces a certain level of “mistakes”. With “temperature = 0”, the model always picks the single most likely token; the output is a deterministic “parrot” that tends to become repetitive and can get stuck in loops. With a high temperature, the randomness increases and the output becomes a total mess. But setting it just right, to a sweet spot that is not zero but not too high, turns out to produce the outputs that we see in ChatGPT and similar chatbots.
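A minimal sketch of how a temperature parameter is commonly applied during sampling: the model’s raw scores (logits) for each candidate token are divided by the temperature before a softmax, so low values sharpen the distribution toward the top token and high values flatten it toward uniform. Temperature 0 is treated as the usual special case of greedy decoding, since dividing by zero is undefined. The logits and function names are hypothetical; real inference code typically adds further steps such as top-k or top-p filtering.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding for 0")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits: List[float], temperature: float,
                      rng: Optional[random.Random] = None) -> int:
    """Pick the index of the next token.

    temperature == 0 means greedy decoding: always the single most likely token.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for four candidate next tokens.
logits = [2.0, 1.5, 0.3, -1.0]
for t in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
print(sample_next_token(logits, 0))  # greedy: always index 0
```

Running the loop shows the trade-off described above: at 0.2 nearly all probability sits on the top token, while at 5.0 even the least likely token gets a substantial share.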

Since similar concepts are clustered together in an LLM’s concept space, it makes sense that these errors would sometimes push the model to make on-the-fly associations between nearby concepts, associations it was never trained on, which “derail” it into a close, but not identical, train of thought.

This behavior also seems to be what we call “intelligence” in humans: the ability to solve problems not seen before (zero-shot).

A further extension would be the ability to learn continuously from every interaction. Right now, LLMs have a “context” of limited length that changes dynamically during a conversation but has no influence on the weights of the pre-trained network.
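A toy sketch of the distinction drawn above, using a hypothetical `ToyChatModel` class: the pre-trained weights are fixed and exposed read-only, while the context is a fixed-size buffer that drops its oldest tokens as new ones arrive. Whitespace-split words stand in for tokens, and the two weights are placeholders; real models use subword tokenizers, much larger windows, and billions of parameters.

```python
from collections import deque
from types import MappingProxyType
from typing import Deque, Dict, Mapping


class ToyChatModel:
    """Hypothetical stand-in for an LLM session, for illustration only.

    The weights are fixed at construction time and exposed read-only;
    the context is a bounded buffer that forgets its oldest tokens.
    """

    def __init__(self, weights: Dict[str, float], context_length: int) -> None:
        # Read-only view: the conversation cannot modify the "trained" weights.
        self.weights: Mapping[str, float] = MappingProxyType(dict(weights))
        # deque with maxlen silently discards the oldest item when full.
        self.context: Deque[str] = deque(maxlen=context_length)

    def chat(self, message: str) -> None:
        # Each word of the conversation enters the context window...
        for token in message.split():
            self.context.append(token)
        # ...but nothing here updates self.weights: no learning happens.


model = ToyChatModel({"w0": 0.1, "w1": -0.4}, context_length=5)
model.chat("hello there how are you")
model.chat("fine thanks")
print(list(model.context))  # the oldest words have fallen out of the window
print(dict(model.weights))  # unchanged by the conversation
```

Continuous learning, in this picture, would mean letting `chat` write back into the weights instead of only into the bounded context.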

Interestingly, this has a parallel in “crystallized intelligence” vs. “fluid intelligence” in humans.

So… maybe LLMs are not full AGIs yet, but they are showing many of the behaviors that we would expect from an AGI, while at the same time giving or confirming insights into the workings of the human mind itself.

jonsnothere@beehaw.org · 11 months ago

The problem with the Turing test and current AI is that we didn’t teach computers to think; we taught them to talk.