A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • Snot Flickerman@lemmy.blahaj.zone
    4 hours ago

    Part of a properly functioning LLM is absolutely its understanding of implicit instructions. A huge aspect of data annotation work in helping LLMs become better tools is grading them on whether or not they understand implicit instructions. I would say more than half of the work I have done in that arena has focused on training them to more clearly understand implicit instructions.

    So sure, if you explain it like the LLM is five, you’ll get a better response, but the whole point is if we’re dumping so much money and resources and destroying the environment for these tools, you shouldn’t have to explain it like it’s five.