Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

fubarx@lemmy.world · 5 days ago

Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

melfie@lemy.lol · 3 days ago

Context engineering is one way to shift that balance. When you provide a model with structured examples, domain patterns, and relevant context at inference time, you give it information that can help override generic heuristics with task-specific reasoning.

So the chat bots getting it right consistently probably have it in their system prompt temporarily until they can be retrained with it incorporated into the training data. 😆

Schadrach@lemmy.sdf.org · 3 days ago

There are models with open weights, and you can run those locally on your GPU. It can be a bit slower depending on model and GPU. For example, GLM has an open version, both full and pruned, but it’s not the newest version. A bunch of image generation models have local versions too.

Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

Opper