• TheLeadenSea@sh.itjust.works · 53 points · 1 day ago

    They have RLHF (reinforcement learning from human feedback), so negative, biased, or rude responses get penalized during training. That’s the idea, anyway; obviously no system is perfect.
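    A minimal sketch of the reward-modelling step behind that feedback loop (toy scores, not a real model; `preference_loss` is a hypothetical helper implementing the standard Bradley-Terry objective used to train RLHF reward models):

```python
import math

# Toy sketch: human labelers rank two candidate responses, and the
# reward model is trained so the preferred (e.g. polite) response
# scores higher than the rejected (e.g. rude) one.

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the chosen response already out-scores the rejected
    one; large when the model ranks them the wrong way around.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Hypothetical reward scores for a polite vs. a rude answer.
polite_score, rude_score = 2.0, -1.0

loss_good = preference_loss(polite_score, rude_score)  # model agrees with labelers
loss_bad = preference_loss(rude_score, polite_score)   # model prefers the rude answer
print(loss_good < loss_bad)  # → True
```

    Minimizing this loss over many labeled pairs is what pushes rude outputs down in reward, which the policy model then learns to avoid.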

      • SkyNTP@lemmy.ml · 22 points · edited · 1 day ago

        That’s what was said: LLMs have been reinforced to respond exactly the way they do. In other words, that “smarmy asshole” attitude you describe was a deliberate choice. Why? Maybe that’s what the creators wanted, or maybe that’s what focus groups liked most.