Screenshot of this question was making the rounds last week. But this article covers testing against all the well-known models out there.
Also includes outtakes on the ‘reasoning’ models.
I do think it’s interesting, but I think there are implicit assumptions in such a short prompt.
Is it a self-service car wash? If not, walking to the attendant and handing them your keys makes more sense.
If it is self-service without queuing, there may be no available spaces/the bay may not be open, requiring some awkward maneuvering.
If you reword the prompt to make those assumptions explicit, you're more likely to get correct responses.
You have to have the car there no matter what type of car wash it is.
If the car wash is some distance away, then neither you nor the car is there. No attendant is going to walk off-property to retrieve your car, especially since most car washes require you to drive up for service. Which is rather the point.
You shouldn't have to. If you ask a person that question, they'll respond, "what good is walking to the car wash, dumbass?" If AI can't figure that out, it's trash.
A person would look at you like you are an idiot if you asked this question.
The AI tool I asked said walking saves money, gets exercise, etc.
Asked about the car and it said the car is at the car wash, otherwise why would you ask how to get there?
Missing the point. Any person would know walking to the car wash isn't reasonable. You shouldn't have to craft a perfectly tailored prompt for AI to realize that. If you think this is a gotcha, then whoa boy, I've got a bridge to sell ya!
You are missing the point. Any reasonable person would wonder why you're asking such a stupid question.
Which is why, when asked, the AI said of course the car is there; you must be asking either a trick question or for some other reason.
It could be that. Or it could be that the AI gives the illusion of reasoning and this is an example of the illusion breaking. But no, it was probably that it knew it was a trick question and decided to answer wrongly because it is very, very smart. Yeah.
Part of a properly functioning LLM is absolutely understanding implicit instructions. A huge aspect of data annotation work in making LLMs better tools is grading them on whether or not they understand implicit instructions. I would say more than half of the work I have done in that arena has focused on training them to understand implicit instructions more clearly.
So sure, if you explain it like the LLM is five, you'll get a better response. But if we're dumping so much money and so many resources into these tools, and destroying the environment along the way, the whole point is that you shouldn't have to explain it like it's five.