- cross-posted to:
- hackernews
Some thoughts on how useful Anubis really is. Combined with comments I’ve read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will be outdated soon and we’ll need something else.
The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.
The purpose is to reduce the flood to a manageable level, not to block every single scraper request.
The problem is that Anubis’s purpose was to make crawling more computationally expensive, and crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile more required cycles on top of what’s currently asked, but it’s a balancing act before it starts to really be an annoyance for the meat popsicle users.
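For context on why it’s a blunt knob: Anubis-style challenges boil down to SHA-256 proof of work, where the browser grinds nonces until the hash of the challenge plus nonce has enough leading zero bits. A minimal sketch (illustrative only, not Anubis’s actual code; function names and parameters are made up here) shows the cost curve:

```python
import hashlib
import secrets

def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce so sha256(challenge + nonce) has `difficulty` leading zero bits."""
    target = 1 << (256 - difficulty)  # any hash value below this has enough leading zeros
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

challenge = secrets.token_hex(16)
# Expected work is roughly 2**difficulty hashes: 16 bits is near-instant on a laptop,
# but every extra bit doubles the cost for scraper farms and real visitors alike.
print(solve(challenge, 16))
```

There’s no dial setting that hurts a GPU-backed scraper farm without also hurting someone’s five-year-old phone; the two costs scale together.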
And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.
I feel like people who complain about Anubis have never had their server overheat and shut down almost daily because of AI scrapers 🤦
Is there any reason, other than avoiding infrastructure centralization, not to put a web server behind Cloudflare?
Unless you have a dirty heatsink, no amount of hammering should make the server overheat.
Are you explaining my own server to me? 🙄
What CPU made after 2004 doesn’t have automatic thermal management?
I don’t think there is one, unless you somehow managed to disable it.
Even a Raspberry Pi without a heatsink won’t overheat to the point of shutting down.
You’re right, it’s actually worse: it usually just overloads the CPU so badly that it starts to throttle, and then I can’t even access the server via SSH anymore. But sometimes it also crashes the server outright so that it reboots, and yes, that can happen on modern CPUs as well.
You need to set your HTTP-serving process to a priority below the administrative processes, in the place where you start it; assuming a Linux server, that would be your init script or systemd service unit.
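As a minimal sketch, assuming a systemd-managed web server with a hypothetical unit name `myserver.service`, a drop-in like this deprioritizes it relative to sshd and your shells:

```ini
# /etc/systemd/system/myserver.service.d/priority.conf  (hypothetical unit name)
[Service]
# Yield CPU to higher-priority processes under contention; the server
# still gets the whole machine when it is otherwise idle.
Nice=10
CPUWeight=50
# Same idea for disk I/O.
IOSchedulingClass=best-effort
IOSchedulingPriority=7
```

Apply it with `systemctl daemon-reload` and a restart of the service; under a scraper flood the kernel then lets your SSH session win the scheduling fights instead of the web server.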
An actual crash causing a reboot? Do you maybe have faulty RAM? That’s really never supposed to happen from anything happening in userland. That’s not AI; your stuff might be straight-up broken.
The only thing that isn’t broken and could still reboot a server like that is a watchdog timer.
Your server shouldn’t crash, reboot, or become unreachable from the admin interface even at 100% load, and it shouldn’t overheat either. Temperatures should never exceed 80°C no matter what you do; that’s supposed to be impossible with thermal management, which all processors have had for decades.
Yeah, I’m just wondering what comes next. I hope everything isn’t going to have to go behind an authwall.
The developer is working on upgrades and better tools. https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/
I’ll say the developer is also very responsive. They’re (ambiguous “they”; not sure of their pronouns) active in a libraries-fighting-bots Slack channel I’m on. Libraries have been hit hard by the bots: we have hoards of tasty archives, and we don’t have money to throw resources at the problem.
Cool, thanks for posting! The reasoning behind the image is neat, too.