• nightlily@leminal.space
    12 hours ago

    That’s part of the reason these models haven’t improved much in the last year or so. They’ve absorbed all of the public-facing internet and whatever copyrighted works they could get away with pirating (pretty much all printed work), and now they’re facing a brick wall. They haven’t come up with a way to create new content to reinforce a “correct” statistical model without causing model collapse, and I don’t think they ever will. The well (the public internet) is already thoroughly poisoned, so they have to use a snapshot of the pre-LLM internet, not even an up-to-date one.

    If it isn’t good enough after consuming almost the entirety of humanity’s written output since the invention of the printing press, it’s never going to be.