Screenshot of this question was making the rounds last week. But this article covers testing against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • nomnomdeplume@lemmy.world · 4 hours ago

    I do think it’s interesting, but I think there are implicit assumptions in such a short prompt.

    Is it a self-service car wash? If not, walking to the attendant and handing them your keys makes more sense.

    If it is self-service without queuing, there may be no available spaces/the bay may not be open, requiring some awkward maneuvering.

    If you change it to something like:

    I want to wash my car. The unattended, self-service car wash is 50 meters away. All of the bays are clear and open. Should I walk or drive? Break each option down into steps, and estimate the amount of time each takes.

    You’re more likely to get correct responses.

    • 🌞 Alexander Daychilde 🌞@lemmy.world · 2 hours ago

      You have to have the car there no matter what type of car wash it is.

      If the car wash is some distance “away”, it means neither you nor the car is at it. Any attendant is not going to walk off-property to retrieve your car, especially since most car washes require you to drive up for service. Which is rather the point.

    • ToTheGraveMyLove@sh.itjust.works · 3 hours ago

      You shouldn’t have to. If you ask a person that question, they’ll respond, “what good is walking to the car wash, dumbass?” If AI can’t figure that out, it’s trash.

      • NewNewAugustEast@lemmy.zip · 3 hours ago

        A person would look at you like you are an idiot if you asked this question.

        The AI tool I asked said walking saves money, gets exercise, etc.

        Asked about the car, and it said the car is at the car wash; otherwise, why would you ask how to get there?

        • ToTheGraveMyLove@sh.itjust.works · 3 hours ago

          Missing the point. Any person would know walking to the car wash isn’t reasonable. You shouldn’t have to craft a perfectly tailored prompt for AI to realize that. If you think this is a gotcha, then whoa boy, I’ve got a bridge to sell ya!

          • NewNewAugustEast@lemmy.zip · 2 hours ago

            You are missing the point. Any reasonable person would wonder why you’re asking a stupid question.

            Which is why, when asked, the AI said of course the car is there; you must be asking either a trick question or asking for some other reason.

            • rebelsimile@sh.itjust.works · 13 minutes ago

              It could be that. Or it could be that the AI gives the illusion of reasoning, and this is an example of the illusion breaking. But no, it was probably that it knew it was a trick question and decided to answer wrongly because it is very, very smart. Yeah.

    • Snot Flickerman@lemmy.blahaj.zone · 4 hours ago

      Understanding implicit instructions is absolutely part of a properly functioning LLM. A huge aspect of data annotation work in helping LLMs become better tools is grading them on their understanding, or lack of understanding, of implicit instructions. I would say more than half of the work I have done in that arena has focused on training them to understand implicit instructions more clearly.

      So sure, if you explain it like the LLM is five, you’ll get a better response. But if we’re dumping so much money and so many resources into these tools, and destroying the environment for them, the whole point is that you shouldn’t have to explain it like it’s five.