• Kissaki@feddit.org · 3 hours ago

    evolves robots.txt instructions by adding an automated licensing layer that’s designed to block bots that don’t fairly compensate creators for content

    robots.txt - the well known technology to block bad-intention bots /s

    What’s automated about the licensing layer? At some point, I started skimming the article. They didn’t seem clear about it. The AI can “automatically” parse it?

    # NOTICE: all crawlers and bots are strictly prohibited from using this 
    # content for AI training without complying with the terms of the RSL 
    # Collective AI royalty license. Any use of this content for AI training 
    # without a license is a violation of our intellectual property rights.
    
    License: https://rslcollective.org/royalty.xml
    

    Yeah, this is as useless as I thought it would be. Nothing here is actively blocking.

    I love that the XML then points to a text/html content website. I guess nothing for machine parsing, maybe for AI parsing.
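    For what it’s worth, pulling that License URL out of robots.txt is trivial, which is exactly why it blocks nothing; honoring it is entirely the crawler’s choice. A rough sketch (the function name is mine, not from the spec):

    ```python
    def find_license_directive(robots_txt: str) -> str | None:
        """Return the URL from an RSL-style 'License:' line, if present.

        Nothing forces a crawler to run code like this; the directive
        is purely advisory.
        """
        for raw in robots_txt.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comment lines
            if line.lower().startswith("license:"):
                return line.split(":", 1)[1].strip()
        return None
    ```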

    I don’t remember which AI company, but they argued they’re not crawlers but agents acting on the user’s behalf for a specific request/action, so robots.txt doesn’t apply to them. Who knows how they’ll react, but their incentives and their history point to ignoring robots.txt.

    Why is this comment so negative? Oh well.

  • billwashere@lemmy.world · 4 hours ago

    The issue is the line that says “compensate creators”. Reddit still thinks it’s the creator, not the individual users.

  • FaceDeer@fedia.io · 6 hours ago

    And suddenly the Internet is gung-ho in favor of EULAs being enforceable simply by reading the content the website has already provided.

    Recent major court cases have held that the training of an AI model is fair use and doesn’t involve copyright violation, so I don’t think licensing actually matters in this case. They’d have to put the content behind a paywall to stop the trainer from seeing it in the first place.

    • ccunning@lemmy.worldOP · 6 hours ago

      I guess that’s a different court case than the one where Anthropic offered to pay $1.5 billion?

      • FaceDeer@fedia.io · 2 hours ago

        Nope, this was one of them. The case had two parts, one about the training and one about the downloading of pirated books. The judge issued a preliminary judgment about the training part, that was declared fair use without any further need to address it in trial. The downloading was what was proceeding to trial and what the settlement offer was about.

      • NewNewAugustEast@lemmy.zip · 5 hours ago

        Totally different. Anthropic could have bought all the books and trained on them. Pirating is a different topic.

        • corsicanguppy@lemmy.ca · 4 hours ago

          Anthropic could have bought

          You think buying the books would let them plagiarize? That doesn’t seem to be part of the normal “book buying” process.

          • NewNewAugustEast@lemmy.zip · edited · 3 hours ago

            Doesn’t really matter what I think; it’s a different concept than pirating, hence a different thing than what was being ruled on.

            I mean, AI or not, look at it this way: if a company wanted to train its workers and pirated all the training manuals, the piracy is the issue, not the training.

  • underline960@sh.itjust.works · 7 hours ago

    Leeds told Ars that the RSL standard doesn’t just benefit publishers, though. It also solves a problem for AI companies, which have complained in litigation over AI scraping that there is no effective way to license content across the web.

    "If they’re using it, they pay for it, and if they’re not using it, they don’t pay for it."

    But AI companies know that they need a constant stream of fresh content to keep their tools relevant and to continually innovate, Leeds suggested. In that way, the RSL standard “supports what supports them,” Leeds said, “and it creates the appropriate incentive system” to create sustainable royalty streams for creators and ensure that human creativity doesn’t wane as AI evolves.

    This article tries to slip in the idea that creators will benefit from this arrangement. Just like with Spotify and Getty Images, it’s the publisher that’s getting paid.

    Then they decide how much they’ll let trickle down to creators.

    • ccunning@lemmy.worldOP · 7 hours ago

      I would assume creators and publishers would agree to those terms in advance (moving forward, of course).

  • zrst@lemmy.cif.su · 6 hours ago

    Does AI cost advertisers money?

    I’d be cool with it if that’s the case.

  • GissaMittJobb@lemmy.ml · 7 hours ago

    I have no idea what they think this will accomplish, to be honest. It has the legal value of posting on Facebook that you don’t allow them to use your photos.

    • ccunning@lemmy.worldOP · 7 hours ago

      I think the idea is that all parties would find it beneficial:

      Leeds told Ars that the RSL standard doesn’t just benefit publishers, though. It also solves a problem for AI companies, which have complained in litigation over AI scraping that there is no effective way to license content across the web.

      • ricecake@sh.itjust.works · 7 hours ago

        The thing is a robots.txt file doesn’t work as licensing. There’s no legal requirement to fetch the file, and no mechanism to consent or track consent.

        This is putting up a sign that says everyone must pay, and then giving it to anyone who asks for free.
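        To make that concrete: compliance is purely a client-side choice, and the server sees the same page request either way. A toy sketch (`fetch` is a stand-in for an HTTP GET):

        ```python
        def polite_fetch(fetch, path: str):
            """Checks robots.txt first, by convention only, before fetching."""
            robots = fetch("/robots.txt")
            if f"Disallow: {path}" in robots:
                return None  # voluntarily backs off
            return fetch(path)

        def rude_fetch(fetch, path: str):
            """Never requests robots.txt; the page is served all the same."""
            return fetch(path)
        ```

        Nothing in the protocol distinguishes the second client from the first when it asks for the page itself.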

        • ccunning@lemmy.worldOP · 6 hours ago

          The thing is if all parties find the terms agreeable it doesn’t matter if it’s legally binding.

          It’s more like putting a price on the shelf at the grocery store. Not everyone will think the price is fair, and you might still get shoplifters, but that doesn’t mean it’s a waste of time to list the price.

          • ricecake@sh.itjust.works · 5 hours ago

            It really does matter if it’s legally binding if you’re talking about content licensing. That’s the whole thing with a licensing agreement: it’s a legal agreement.

            The store analogy isn’t quite right. Leaving a store with something you haven’t purchased with the consent of the store is explicitly illegal.
            With a website, it’s more like if the “shoplifter” walked in, didn’t request a price sheet, picked up what they wanted and went to the cashier who explicitly gave it to them without payment.

            The crux of the issue is that the website is still providing the information even if the requester never agreed or was even presented with the terms.
            If your site wants to make access to something conditional then it needs to actually enforce that restriction.

            It’s why the current AI training situation is unlikely to be resolved without laws to address it explicitly.
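            In other words, enforcement has to happen before the bytes go out. A minimal sketch of the idea (the key store and status code here are made up for illustration):

            ```python
            LICENSED_KEYS = {"key-abc123"}  # hypothetical keys issued to paying licensees

            def serve(path: str, api_key) -> tuple[int, str]:
                """Refuse the request up front unless the condition is met."""
                if api_key not in LICENSED_KEYS:
                    return (402, "Payment Required: obtain a license first")
                return (200, f"contents of {path}")
            ```

            Anything short of this, like a notice in robots.txt, relies on the requester choosing to cooperate.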

            • Telorand@reddthat.com · 5 hours ago

              I think the analogy is apt. If you post a price on goods, and somebody walks into a store, picks up the item, and walks out without paying, they can’t simply say, “Well, I didn’t care to read the price, and nobody presented me with a contract, so I just took it,” as a valid defense. There’s sometimes an explicit agreement upon terms, sure, but there are times where that agreement is implicit: they put a price on a thing, I pay it, else it’s stealing. I don’t need to sign a contract every time I get groceries.

              I do, however, agree that this will only have teeth once it’s argued and upheld in court the first (few) time(s). If nothing else, it’s good to see people trying to solve the problem, rather than just throwing up their hands and letting billionaires run amok with virtual impunity. Maybe this won’t work to rein in AI tech bros, but maybe it will inspire the things that do.

              • ricecake@sh.itjust.works · 3 hours ago

                Except that with the website example it’s not that they’re ignoring the price or just walking out with the item. It’s that the item was not labeled with a price, nor were they informed of the price. Then, rather than just walking out, they requested the item and it was delivered to them with no attempt to collect payment.

                The key part of a website is that the user cannot take something. The site has to give it to them.
                A more apt retail analogy might be you go to a website. You see a scooter you like, so you click “I want it!”. The site then asks for your address and a few days later you get a scooter in the mail.
                That’s not theft, it’s a free scooter. If the site accused you of theft because you didn’t navigate to an unlinked page they didn’t tell you about to find the prices, or try to figure out payment before requesting, you’d rightly be pretty miffed.

                The shoplifting analogy doesn’t work because it’s not shoplifting if the vendor gives it to you knowingly and you never misrepresented the cost or tried to avoid paying. Additionally, taking someone’s property without their permission is explicitly illegal, and we have a subcategory that explicitly spells out how retail fraud works and is illegal.

                Under our current system the way to prevent someone from having your thing without paying or meeting some other criteria first is to collect payment or check that criteria before giving it to them.

                To allow people to have things on their website freely available to humans but to prevent grabbing and using it for training will require a new law of some sort.

  • trailee@sh.itjust.works · edited · 7 hours ago

    Neither the article nor the RSL website makes clear how pricing or payment works, which seems like a huge miss. It’s not obvious if a publisher can price-differentiate among content, or even choose their own prices at all.

    RSL makes an analogy:

    Collective licensing organizations like ASCAP and BMI have long helped musicians get paid fairly by working together and pooling rights into a single, indispensable offering.

    I’d like to get excited about this because AI companies suck, but if the best example they have is that ASCAP helps “musicians get paid fairly” I’m afraid this isn’t a solution that most content creators will celebrate.

  • BrianTheeBiscuiteer@lemmy.world · 7 hours ago

    Not a bad idea but the biggest challenge will probably be determining who needs to be sued for non-compliance. Google might not be hiding the origin of its bots now but that could easily change.
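    Today a claimed Googlebot can be checked the documented way: reverse-DNS the client IP, confirm the hostname is under googlebot.com or google.com, then forward-resolve the hostname and confirm it maps back to the same IP. A sketch with the resolvers passed in (so it works without live DNS; any bot that hides its origin defeats exactly this check):

    ```python
    def is_verified_googlebot(ip: str, reverse_dns, forward_dns) -> bool:
        """Reverse-then-forward confirm a client claiming to be Googlebot."""
        host = reverse_dns(ip)
        if not host or not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in forward_dns(host)
    ```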