So taking data without permission is bad, now?

I’m not here to say whether the R1 model is the product of distillation. What I can say is that it’s a little rich for OpenAI to suddenly be so very publicly concerned about the sanctity of proprietary data.

The company is currently involved in several high-profile copyright infringement lawsuits, including one filed by The New York Times alleging that OpenAI and its partner Microsoft infringed its copyrights and that the companies provide the Times’ content to ChatGPT users “without The Times’s permission or authorization.” Other authors and artists have suits working their way through the legal system as well.

Collectively, the contributions from copyrighted sources are significant enough that OpenAI has said it would be “impossible” to build its large language models without them. The implication being that copyrighted material had already been used to build these models long before these publisher deals were ever struck.

The filing argues, among other things, that AI model training isn’t copyright infringement because it “is in service of a non-exploitive purpose: to extract information from the works and put that information to use, thereby ‘expand[ing] [the works’] utility.’”

This kind of hypocrisy makes it difficult for me to muster much sympathy for an AI industry that has treated the swiping of other humans’ work as a completely legal and necessary sacrifice, a victimless crime that provides benefits that are so significant and self-evident that it wasn’t even worth having a conversation about it beforehand.

A last bit of irony in the Andreessen Horowitz comment: There’s some handwringing about the impact of a copyright infringement ruling on competition. Having to license copyrighted works at scale “would inure to the benefit of the largest tech companies—those with the deepest pockets and the greatest incentive to keep AI models closed off to competition.”

“A multi-billion-dollar company might be able to afford to license copyrighted training data, but smaller, more agile startups will be shut out of the development race entirely,” the comment continues. “The result will be far less competition, far less innovation, and very likely the loss of the United States’ position as the leader in global AI development.”

Some of the industry’s agita about DeepSeek is probably wrapped up in the last bit of that statement—that a Chinese company has apparently beaten an American company to the punch on something. Andreessen himself referred to DeepSeek’s model as a “Sputnik moment” for the AI business, implying that US companies need to catch up or risk being left behind. But regardless of geography, it feels an awful lot like OpenAI wants to benefit from unlimited access to others’ work while also restricting similar access to its own work.

  • maplebar@lemmy.world · 8 hours ago

    Even in a GPL or CC-BY-SA context you still retain copyright ownership over your work. I write GPL code for a modest living, and a copyright notice with my real name goes on everything I write. Likewise, you’re still asking to be credited in your CC-BY-SA music. Nothing wrong with that.

    The point is that we are making a conscious decision to license the things we create in a permissive way. Neither of us is anonymously dumping our work into the public domain, because clearly we do care about ownership and copyright. That’s well within our rights as creators.

    Generative AI is exploiting our work and not even doing the bare minimum of following the licenses that we shared it under.

    • dreadbeef@lemmy.dbzer0.com · 2 hours ago (edited)

      I only care about copyright while it exists. If copyright is abolished and there’s no such thing as intellectual property then I’m happy with that as well.

      No one should have the right to use police with guns to maintain the ownership of ideas and our culture. I don’t want this power. I don’t want you to have this power over my neighbors. I don’t want the Disney corporation having this power over my friends and family. But if capitalists wield that power against the common man, then I’m not against the common man wielding it against the capitalist.

      I’m okay with using copyright against only those that have more power than me. I don’t ever want to threaten a normal human being with it, only capitalists.

      I’m not against all cases of generative AI, code or visual. I’ve had legit use cases for them, and in a post-scarcity world we wouldn’t care about these things being made in a completely FLOSS way. If copyright were to last only 15 years after publication then I think the world would be much better and we wouldn’t be having this conversation. But I won’t argue AI stuff as it currently stands: it is a grift or a stockpiling.