hylobates@jlai.lu to Selfhosted@lemmy.world · 1 day ago

Based on this graph, and this graph alone, guess at what time I completely blocked OpenAI crawlers

[Image: the graph from the post, hosted on jlai.lu]
punrca@piefed.world · 17 hours ago

It’s best to use either Cloudflare (best IMO) or Anubis.

If you don’t want any AI bots, you can set up Anubis (open source; requires JavaScript to be enabled in the visitor’s browser): https://github.com/TecharoHQ/anubis

Cloudflare automatically sets up a robots.txt file to block “AI crawlers” (but you can choose to allow “AI search” for better SEO). E.g.: https://blog.cloudflare.com/control-content-use-for-ai-training/#putting-up-a-guardrail-with-cloudflares-managed-robots-txt

Cloudflare also has an “AI Labyrinth” option that serves a maze of fake data to AI bots that don’t respect the robots.txt file.
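Anubis needs JavaScript because it has the browser solve a proof-of-work challenge before the page is served. The sketch below is a generic hashcash-style illustration of that idea, not Anubis’s own code or wire format: the challenge format, the "leading hex zeros" difficulty rule and the function names (`make_challenge`, `solve`, `verify`) are assumptions made for this example.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def _digest(challenge: str, nonce: int) -> str:
    # SHA-256 of the challenge concatenated with the candidate nonce, as hex.
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the hash starts with `difficulty` hex zeros.

    In a real deployment this loop would run as JavaScript in the visitor's browser.
    """
    target = "0" * difficulty
    nonce = 0
    while not _digest(challenge, nonce).startswith(target):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking the answer costs a single hash."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=4)  # about 16**4 = 65,536 hashes on average
    print(nonce, verify(challenge, nonce, difficulty=4))  # nonce varies; second value is True
```

The point of the design is the asymmetry: each visitor pays roughly 16^difficulty hashes once, which is negligible for a person but adds up for a scraper fetching millions of pages, while the server verifies with one hash.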
shane@feddit.nl · 10 hours ago

If you’re relying on Cloudflare, are you even self-hosting?
CyberSeeker@discuss.tchncs.de · 7 hours ago (edited)

If you build a house but hire a guard for the front gate, do you even own the house?!
AHemlocksLie@lemmy.zip · 12 hours ago

Pretty sure I’ve repeatedly heard about crawlers completely ignoring robots.txt, so does Cloudflare really do that much?
Sv443@sh.itjust.works · 9 hours ago

Like a lock on a door, it stops the vast majority but can’t do shit about the actual professional bad guys.
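robots.txt only works if the crawler chooses to read and obey it; actually enforcing a block means refusing the request on the server. A minimal sketch, assuming a Python WSGI app: it answers 403 when the User-Agent contains one of the tokens OpenAI documents for its crawlers (GPTBot, ChatGPT-User, OAI-SearchBot). The token list, `block_ai_crawlers`, `hello_app` and `request` are illustrative names, not part of any product mentioned above. True to the lock analogy, it only stops crawlers that identify themselves honestly; one that sends a browser User-Agent walks straight through.

```python
from wsgiref.util import setup_testing_defaults

# Illustrative list: user-agent tokens OpenAI documents for its crawlers.
BLOCKED_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot")


def block_ai_crawlers(app):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked token."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token.lower() in user_agent for token in BLOCKED_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else, including spoofed browser User-Agents, is passed through.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    # Stand-in for the real site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_ai_crawlers(hello_app)


def request(user_agent):
    """Call the app in-process with a fake request and return (status, body)."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.append(status)

    body = b"".join(app(environ, start_response))
    return captured[0], body


if __name__ == "__main__":
    # A GPTBot-style User-Agent string.
    gptbot = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)"
    print(request(gptbot))  # ('403 Forbidden', b'Forbidden\n')
    print(request("Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"))  # ('200 OK', b'Hello\n')
```

To serve it for real, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough for a test; in practice the same User-Agent check is more often done in the reverse proxy in front of the app.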