Meta admits using pirated books to train AI, but won't pay for it

Lee Duna@lemmy.nz · 8 months ago

TWeaK@lemm.ee · 8 months ago

Fair use covers research, but creating a training database for your commercial product is distinctly different from research. They’re not publishing scientific papers, along with their data, which others can verify; they are developing a commercial product for profit. Even compared to traditional R&D this is markedly different, as they aren’t building a prototype - the test version will eventually become the finished product.

The way fair use works is that a judge first decides whether the use fits into one of the categories: news, education, research, criticism, or comment. This does not really fit into the category of “research”, because it isn’t research; it’s the final product in an interim stage. However, even if it were considered research, the next step in fair use is the nature of the use, in particular whether it is commercial. AI is highly commercial.

AI should not even be classified in a fair use category, but even if it were, it should not be granted any exemption because of how commercial it is.

They use other people’s work to profit. They should pay for it.

Facebook steals the data of individuals. They should pay for that, too. We don’t exchange our data for access to their website (or for access to some third-party site Facebook pays to put a pixel on). The website is provided free of charge, and they try to shoehorn another transaction into the fine print of the terms and conditions, where the user gives up their data free of charge. It is not proportionate, and the user’s data is taken without proper consideration (i.e. payment, in terms of the core principles of contract law).

Frankly, it is unsurprising that an entity like Facebook, which so egregiously breaks the law and abuses the rights of every human being who uses the internet, would try to abuse content creators in such a fashion. Their abuse needs to be stopped, in all forms, and they should be made to pay for all of it.

Syntha@sh.itjust.works · 8 months ago

“They’re not publishing scientific papers, along with their data, which others can verify;”

Not that I think this is really relevant here, but I’m pretty sure Meta has published scientific papers on Llama, and the Llama 1 & 2 models are open and accessible to anyone.

TWeaK@lemm.ee · 8 months ago

No, that is relevant. However, I would still argue that a paper without enough data to replicate their work (i.e. without releasing the code of their LLM) isn’t really anything that should qualify as research. The whole point of academia is that someone else verifies your work - or rather, they try to prove you wrong.

tinwhiskers@lemmy.world · edit-2 8 months ago

They have released it on GitHub. The code is only about 500 lines. But releasing the model is arguably more important, because that sort of compute is not affordable to any mortals.

TWeaK@lemm.ee · 8 months ago

Yeah, I mean, what they’ve released is essentially the design of the battery and starter system, without the design of the actual motor. You can’t replicate their product and prove their work with what they’ve published.