Seems like it has potential for abuse, though, i.e., forcibly driving traffic. It's defeatable, but it's only meant to deter lazy human impulses.
I’ll make a complementary argument below in a sec, but “forcibly driving traffic” seems like a feature, not a bug.
For how testy people get about crawling copyrighted material for things like AI, everybody seems super chill about search engines and aggregators ripping off content at industrial scale with zero repercussions.
Tbh, I’d be less testy about bots scraping my sites for AI input IF they respected my robots.txt file and didn’t slam the server. They’re just rude and I don’t like it. Sometimes they’re so rude it’s effectively a DoS attack.
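For what it’s worth, the “polite crawler” behavior being asked for here is not hard to implement. A minimal sketch using Python’s standard-library robots.txt parser (the site URL, user-agent name, and crawl-delay values are hypothetical examples, not from any real site):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# Normally you'd call rp.set_url("https://example.com/robots.txt"); rp.read().
# Here we parse an inline example so the sketch is self-contained.
rp.parse("""
User-agent: *
Crawl-delay: 10
Disallow: /private/
""".splitlines())

def polite_fetch_allowed(agent, url):
    """Return True only if robots.txt permits this agent to fetch url."""
    return rp.can_fetch(agent, url)

# Respect the declared crawl delay between requests (fall back to 1s if absent),
# e.g. time.sleep(delay) between fetches instead of slamming the server.
delay = rp.crawl_delay("*") or 1.0

print(polite_fetch_allowed("MyBot", "https://example.com/private/page"))  # disallowed path
print(polite_fetch_allowed("MyBot", "https://example.com/public/page"))   # allowed path
print(delay)
```

A crawler that checks `can_fetch` before every request and sleeps for the crawl delay between them avoids both complaints above: ignoring robots.txt and hammering the server.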
Tbh, my sites exist to get information out there and I don’t care if someone mirrors my sites, as long as the information is still accurate.
I mean, that’s great and you’re well within your rights, but that’s not what people generally say when they express outrage about AI scraping. People straight up call it theft very often and seem to consider using online content for training the equivalent of copying or distributing it.
Which stands out to me, because that was not the reaction when the EU decided that Google News was effectively piracy after a whole bunch of news outlets complained. The consensus there seemed to be that losing the service was a bummer despite all the scraping.
Oh yeah, I get that there’s more than two reasons to be upset about AI scraping. I work in the academic library world and the vibe here is:
- bots are rude
- AI is not a reliable source of facts
We work with facts and information, and I have no expectation that my collection of facts is something to defend against replication.
On the other hand, I’d be pissed AF if someone stole my research paper on 1800s family drama and reprinted it without attribution, or AI-hallucinated new pseudo-facts that were not in the source materials.
I would get behind a “click through before you vote” rule.