Screenshot of this question was making the rounds last week. But this article covers testing against all the well-known models out there.
Also includes outtakes on the ‘reasoning’ models.
Torture can be a useful way of extracting information if you have a way to instantly verify it, which actually makes it a good analogy to LLMs. If I want the password to your laptop, torture you until you give me a password, and then log in with it, that works.
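The verification point is the crux of the analogy: answers from an untrusted source (a coerced person, an LLM, a brute-force enumerator) are only worth anything if you can check them cheaply and instantly. A minimal sketch of that generate-and-verify pattern, with a toy secret and hypothetical names standing in for the real check:

```python
import itertools
import string

# Toy stand-in for the real password (assumption for this demo only).
SECRET = "cab"

def verify(candidate: str) -> bool:
    """Stand-in for an instant, trusted check (e.g. a login attempt)."""
    return candidate == SECRET

def first_verified(candidates):
    """Take answers from ANY untrusted source; only the verifier is trusted."""
    for guess in candidates:
        if verify(guess):
            return guess
    return None

# Brute-force enumeration as one possible "untrusted source" of candidates:
space = ("".join(p) for p in
         itertools.product(string.ascii_lowercase[:3], repeat=3))
print(first_verified(space))  # prints "cab"
```

The source of candidates is interchangeable here, which is the point being argued either way below: whether the guesses come from coercion, a model, or enumeration, the verifier is doing all the epistemic work.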
If you can instantly verify it, then you don't need the torture.
In fact, it can never be a useful way of extracting information. Even randomly guessing is a better way to get the information you want than torture.
I'm not saying it's anything other than morally repugnant, obviously. But in the example of a password with billions or trillions of combinations, where you can check the answers given, torture is pretty obviously better than guessing.
That's not a scenario that is ever likely to come up, and it wouldn't be justifiable even if it did, but pretending it wouldn't be effective is ridiculous.