Some thoughts on how useful Anubis really is. Taken together with comments I’ve read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will soon be outdated and we’ll need something else.

  • rtxn@lemmy.world · 102 up · edited · 7 hours ago

    The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.

    The purpose is to reduce the flood to a manageable level, not to block every single scraper request.

    • AnUnusualRelic@lemmy.world · 3 up · 1 hour ago

      The problem is that the purpose of Anubis was to make crawling more computationally expensive and that crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile some required cycles on top of what’s currently asked, but it’s a balancing act before it starts to really be an annoyance for the meat popsicle users.
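The cost trade-off described above can be made concrete with a generic hash-based proof of work. What follows is a minimal Python sketch, not Anubis's actual implementation: the challenge string, the `solve` and `verify` names, and the difficulty values are all assumptions chosen for illustration. Each extra bit of difficulty roughly doubles the expected number of hashes a client has to try, while verification stays at a single hash, so raising the difficulty costs human visitors and crawlers alike.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() of a non-zero byte tells us where its first 1 bit sits.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> tuple[int, int]:
    """Client side: brute-force nonces, returning (nonce, attempts)."""
    for attempts, nonce in enumerate(count(), start=1):
        if verify(challenge, nonce, difficulty_bits):
            return nonce, attempts


if __name__ == "__main__":
    # Hypothetical challenge and difficulties, for illustration only.
    for bits in (8, 12, 16):
        nonce, attempts = solve("example-challenge", bits)
        print(f"{bits} bits: nonce={nonce}, attempts={attempts}")
```

The attempt counts depend on the challenge string, but on average they grow as 2 to the power of the difficulty in bits.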

    • poVoq@slrpnk.net · 52 up, 1 down · edited · 7 hours ago

      And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.

      I feel that people who complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦

      • mobotsar@sh.itjust.works · 1 up · 17 minutes ago

        Is there a reason, other than avoiding infrastructure centralization, not to put a web server behind Cloudflare?

          • interdimensionalmeme@lemmy.ml · 1 up, 3 down · 1 hour ago

            What CPU made after 2004 do you have that doesn’t have automatic temperature control?
            I don’t think there is one, unless you somehow managed to disable it.
            Even a Raspberry Pi without a heatsink won’t overheat to the point of shutting down.

            • poVoq@slrpnk.net · 3 up · 1 hour ago

              You are right, it is actually worse: it usually just overloads the CPU so badly that it starts to throttle, and then I can’t even access the server via SSH anymore. But sometimes it also crashes the server so that it reboots, and yes, that can happen on modern CPUs as well.

              • interdimensionalmeme@lemmy.ml · 1 up, 2 down · 44 minutes ago

                You need to set your HTTP-serving process to a priority below that of the administrative processes. Do it wherever you start the process, so on a Linux server that would be your init script or systemd service unit.
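As a rough illustration of the priority suggestion above, here is a minimal Python sketch that starts a child process at a lower CPU priority (a higher niceness) than its parent. The helper name `start_with_lower_priority` and the increment of 10 are assumptions made for this example, and it only works on Unix-like systems. In a systemd service unit, the `Nice=` setting plays a similar role without any wrapper code.

```python
import os
import subprocess
import sys


def start_with_lower_priority(command, nice_increment=10, **popen_kwargs):
    """Start `command` with its niceness raised by `nice_increment` (Unix only).

    A higher niceness means the scheduler favors other processes, such as an
    SSH daemon, when the CPU is contended. The helper name and the default
    increment are hypothetical, chosen for illustration.
    """
    def lower_priority():
        # Runs in the child between fork and exec, so only the child is affected.
        os.nice(nice_increment)

    return subprocess.Popen(command, preexec_fn=lower_priority, **popen_kwargs)


if __name__ == "__main__":
    # Stand-in for a web server: a child that just reports its own niceness.
    child = start_with_lower_priority(
        [sys.executable, "-c", "import os; print(os.nice(0))"],
        stdout=subprocess.PIPE,
        text=True,
    )
    output, _ = child.communicate()
    print("parent niceness:", os.nice(0))
    print("child niceness:", output.strip())
```

Niceness only influences CPU scheduling, so it will not by itself keep a machine reachable if the bottleneck is memory or disk I/O.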

                An actual crash causing a reboot? Do you have faulty RAM, maybe? That’s really never supposed to happen because of anything happening in userland. That’s not AI; your stuff might be straight-up broken.

                The only thing that isn’t broken and could reboot a server is a watchdog timer.

                Your server shouldn’t crash, reboot, or become unreachable from the admin interface even at 100% load, and it shouldn’t overheat either. Temperatures should never exceed 80 °C no matter what you do; that’s supposed to be impossible with thermal management, which all processors have had for decades.

      • tofu@lemmy.nocturnal.garden (OP) · 12 up, 1 down · edited · 6 hours ago

        Yeah, I’m just wondering what’s going to follow. I just hope not everything ends up needing to go behind an authwall.