Too lazy to create the meme. Insert the two astronauts looking at earth meme
Wait, there is no decentralized internet?
Always has been.
Apparently there is a decentralized internet out there. We're just not experiencing it right now. Skill issue, huh?
insert cursed wojak reaction


Always have an extra alt
Just one extra alt, I swear…
Running an instance without Cloudflare in front is hard work, because AI scrapers bring it to its knees. It's a never-ending battle to block them even with Cloudflare, but at least Cloudflare can help reduce the load, and even the free version comes with many tools to identify and block problematic bots.
Though if you turn on bot blocking you break federation, so you have to be a lot more refined in your security rules.
because AI scrapers bring it to its knees
There are (at least) three pieces of web software to protect against AI scrapers currently, so it should be more than possible without Cloudflare.
It's not even possible to do a good job of it with Cloudflare. What are the three you are referring to? The most commonly known one is Anubis, and Codeberg found that AI bots had learnt to solve its challenges.
Yeah, so Anubis is the bot-blocking one, already breached by bots.
Iocaine is an LLM maze and poisoner, intended to trap a bot, but your site still needs the resources to serve all the requests, and it's not clear what happens when a user is accidentally identified as a bot.
Why does turning off bots turn off federation?
Cloudflare’s bot detection triggers the blocking because federation looks a lot like a bot (well, it is a bot).
For example, Lemmy.world will send my instance hundreds of thousands if not millions of requests a day, in a near steady stream. It's telling my instance about every post, comment, or vote. AI scrapers also send hundreds of thousands or millions of requests each day, in a near steady stream.
For all intents and purposes, federation is bot traffic and looks just like it. Typically I block by identifying high-traffic ASNs (an ASN is a group of IPs run by the same entity, and blackhat AI scrapers use many IPs) and showing a Cloudflare challenge (which will typically have a 0% pass rate). If the traffic comes from a single IP then it's probably a federated instance, but what I typically see is many IPs from the same area, with requests spread evenly across them.
I also try to exclude federation/API endpoints, which can help stop false positives as scrapers are generally loading the web page.
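To make that strategy concrete, here is a rough Python sketch of the idea, not anything Cloudflare or Lemmy actually ships. It counts web-page requests per ASN, skips federation/API paths, and only flags ASNs that are both high volume and spread over many IPs. The function name, the thresholds, and the `/inbox` and `/api/` prefixes are all assumptions for illustration; check what your own instance software really uses.

```python
from collections import defaultdict

# ASSUMPTION: these prefixes stand in for federation/API endpoints.
# They are illustrative only; use the real paths for your software.
EXEMPT_PREFIXES = ("/inbox", "/api/")


def asns_to_challenge(requests, min_requests=1000, min_ips=5):
    """Return a sorted list of ASNs that look like distributed scrapers.

    requests: iterable of (asn, ip, path) tuples, e.g. parsed from logs.
    min_requests / min_ips: hypothetical thresholds, tune to your traffic.
    """
    counts = defaultdict(int)   # non-exempt requests per ASN
    ips = defaultdict(set)      # distinct source IPs per ASN

    for asn, ip, path in requests:
        # Federation and API calls are skipped so they never count
        # towards an ASN being flagged.
        if path.startswith(EXEMPT_PREFIXES):
            continue
        counts[asn] += 1
        ips[asn].add(ip)

    # High volume from a single IP is probably a federated instance,
    # so require many distinct IPs before flagging.
    return sorted(
        asn for asn in counts
        if counts[asn] >= min_requests and len(ips[asn]) >= min_ips
    )
```

You would feed it parsed access-log rows and then apply a challenge rule to whatever ASNs it returns. The two thresholds are the part that needs the most tuning per instance.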
This is something Lemmy (and PieFed, Mbin) admins try to help each other with by sharing strategies, because one day a bot will find you and suddenly your instance is down because it is hammering you too hard.
I bet if you are in China, Brazil, Singapore, Argentina, etc., then you will see a lot of blocked content on Lemmy, as this is often where the bot traffic comes from (Google, Facebook, OpenAI, Amazon, etc. will typically respect the robots.txt, so US traffic is less of an issue).






