There is a lot more that goes into it than just being correct. 18000 waters may have been the actual order, because somebody decided to screw with the machine. A human would reliably interpret that as a joke and would simply refuse to punch it in. The LLM will likely do whatever a human tells it to do, since it has no contextual awareness: it only has the system prompt and whatever interaction with the user it has had so far.
They do, my concern is more about if that JSON is correct, not just well-formed.
Also, 18000 waters might be correct JSON, but makes an AI a bad cashier.
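That gap between "well-formed" and "correct" can be sketched with a plain sanity check layered on top of JSON parsing. This is a minimal illustration assuming a hypothetical order schema with an `items` list; the field names and the `MAX_QTY_PER_ITEM` limit are made up for the example:

```python
import json

# Hypothetical plausibility limit -- a well-formed order can still be absurd.
MAX_QTY_PER_ITEM = 20

def validate_order(raw: str) -> list:
    """Return a list of problems; an empty list means the order looks plausible."""
    problems = []
    try:
        order = json.loads(raw)  # step 1: is it even well-formed JSON?
    except json.JSONDecodeError as e:
        return ["not valid JSON: %s" % e]
    # step 2: is it a believable order, not just valid syntax?
    for item in order.get("items", []):
        name = item.get("name", "?")
        qty = item.get("quantity", 0)
        if not isinstance(qty, int) or qty < 1:
            problems.append("bad quantity for %s: %r" % (name, qty))
        elif qty > MAX_QTY_PER_ITEM:
            problems.append("implausible quantity for %s: %d" % (name, qty))
    return problems

# Parses fine, but fails the plausibility check:
print(validate_order('{"items": [{"name": "water", "quantity": 18000}]}'))
```

The point being: the parser is happy with 18000 waters, and only the second, domain-specific layer catches it.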
That's part of correctness to me; delivering an order that Taco Bell would actually make is important.
Semantics aside, though, we agree. That’s very important.
So they just tighten the instructions so it doesn't take joke orders and can make more reasonable decisions, like:
“May I take your order?”
“Two double whoppers with extra mayo and a chocolate cherry banana sundae”
“Oh you’ve GOTTA be joking!”
It's trivial to get LLMs to act against their instructions.