The one-liner:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

This is brilliant.
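To get a feel for the compression ratio without writing 10 GB of zeros, here is a minimal Python sketch of the same idea at a smaller scale. The function name `gzip_zeros` and the 1 MiB chunk size are assumptions for illustration, not part of the original one-liner. It streams zeros through gzip in small chunks, so it never holds a 1 GiB buffer the way `bs=1G` asks `dd` to.

```python
import gzip
import io


def gzip_zeros(total_bytes, chunk_size=1024 * 1024):
    """Return gzip-compressed bytes that decompress to `total_bytes` zero bytes.

    Hypothetical helper: it writes zeros in `chunk_size` pieces so memory use
    stays small regardless of how large the decompressed payload is.
    """
    buffer = io.BytesIO()
    chunk = bytes(chunk_size)  # a block of zero bytes
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        remaining = total_bytes
        while remaining > 0:
            n = min(chunk_size, remaining)
            gz.write(chunk[:n])
            remaining -= n
    return buffer.getvalue()


if __name__ == "__main__":
    size = 10 * 1024 * 1024  # 10 MiB here instead of 10 GB
    payload = gzip_zeros(size)
    print(f"{size} bytes of zeros compressed to {len(payload)} bytes")
```

DEFLATE tops out at roughly 1000:1 on input like this, so the 10 GB version from the one-liner should land on the order of 10 MB on disk.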

  • 👍Maximum Derek👍@discuss.tchncs.de
    15 hours ago

    Most often because they don’t download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded in a time series database. I used an ELK stack back in the day.
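The "never fetches CSS or JS" signal can be approximated straight from an access log, without a time series database. A rough sketch, assuming logs in the common/combined format; the names `find_suspected_scrapers` and `min_pages` and the threshold of 20 are made up for illustration, and anything that is not a `.css` or `.js` request is counted as a page, which is a simplification.

```python
import re
from collections import defaultdict

# Matches the start of a common/combined log format line (an assumption
# about the log layout; adjust for your server's format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)
ASSET_EXTENSIONS = (".css", ".js")


def find_suspected_scrapers(lines, min_pages=20):
    """Return sorted client IPs that requested pages but never CSS/JS.

    Hypothetical heuristic: a client with at least `min_pages` non-asset
    requests and zero stylesheet or script requests is flagged.
    """
    pages = defaultdict(int)
    assets = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        ip = match.group("ip")
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(ASSET_EXTENSIONS):
            assets[ip] += 1
        else:
            pages[ip] += 1
    return sorted(
        ip for ip, count in pages.items()
        if count >= min_pages and assets[ip] == 0
    )
```

Something like `find_suspected_scrapers(open("access.log"))` would give a first list of candidates. Expect false positives from clients with warm caches, RSS readers, and API consumers, so treat it as a starting point, not a verdict.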

    • sugar_in_your_tea@sh.itjust.works
      14 hours ago

      That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?

      • 👍Maximum Derek👍@discuss.tchncs.de
        14 hours ago

        My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).
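The in-house tools mentioned here are not public, so this is only a guess at the hand-off step: a small Python sketch that turns flagged addresses into a plain one-entry-per-line file, the kind of list haproxy can load into an ACL with `-f`. The function names and the temp-file prefix are hypothetical. The file is written to a temporary name and then renamed, so a reader never sees a half-written list.

```python
import ipaddress
import os
import tempfile


def normalize(addresses):
    """Return a sorted, de-duplicated list of valid IPs/CIDRs as strings."""
    networks = set()
    for raw in addresses:
        try:
            # strict=False accepts host bits, e.g. 10.0.0.1/8 -> 10.0.0.0/8
            networks.add(str(ipaddress.ip_network(raw.strip(), strict=False)))
        except ValueError:
            continue  # silently drop anything that is not an address
    return sorted(networks)


def write_offender_list(addresses, path):
    """Atomically write the normalized offender list to `path`."""
    entries = normalize(addresses)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".offenders-")
    with os.fdopen(fd, "w") as handle:
        handle.write("".join(entry + "\n" for entry in entries))
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
    return entries
```

On the haproxy side, the matching config would be something along the lines of an `acl` with `src -f <file>` plus an `http-request deny` rule. The proxy has to be reloaded or updated through its runtime API to pick up changes to the file.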

        • sugar_in_your_tea@sh.itjust.works
          14 hours ago

          Dang, I was hoping for a FOSS project that would do most of the heavy lifting for me. Maybe such a thing exists, idk, but it would be pretty cool to have a pluggable system that analyzes activity and tags connections w/ some kind of identifier so I could configure a web server to send nonsense (to poison AI scrapers), send zip bombs (to bots that aren’t respectful of resources), or redirect to a honeypot (for malicious actors).

          A quick search didn’t yield anything immediately, but I wasn’t that thorough. I’d be interested if anyone knows of such a project that’s pretty easy to play with.
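The tagging idea in the comment above can be sketched in a few lines of Python, to make the shape of such a system concrete. Everything here is hypothetical: the `ClientStats` fields, the `Action` names, and the 120 requests-per-minute limit are illustrative assumptions, not taken from any existing project.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SERVE = "serve"          # normal response
    POISON = "poison"        # nonsense content for AI scrapers
    ZIP_BOMB = "zip_bomb"    # for bots that ignore resource limits
    HONEYPOT = "honeypot"    # redirect for malicious actors


@dataclass
class ClientStats:
    """Per-client signals an analysis layer might produce (all assumed)."""
    requests_per_minute: float
    fetched_assets: bool
    declared_ai_crawler: bool
    probed_sensitive_paths: bool


def choose_action(stats, rate_limit=120.0):
    """Map a client's signals to the tag a web server could route on."""
    if stats.probed_sensitive_paths:
        return Action.HONEYPOT
    if stats.declared_ai_crawler:
        return Action.POISON
    if stats.requests_per_minute > rate_limit and not stats.fetched_assets:
        return Action.ZIP_BOMB
    return Action.SERVE
```

The resulting tag could then be passed to the web server as a header or a map lookup, so Caddy, nginx, or haproxy only has to route on one value.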

          • A Basil Plant@lemmy.world
            13 hours ago

            Not exactly what you asked, but do you know about ufw-blocklist?

            I’ve been using it on several VPSes for some time now, and the number of fail2ban failed/banned entries has dropped dramatically. Previously, I had 20k failed attempts after a few months and 30-50 currently banned IPs at all times; now it’s fewer than 1k failed attempts after a year and maybe 3 or so banned at any time.

            There was also that paid service where users share the IP addresses behind their spam attempts with a centralized network, which does some dynamic intelligence monitoring. I forgot the name, and search these days isn’t great. Something to do with “Sense”? It was paid, but well recommended as far as I remember.

            Edit: seems like the keyword is “threat intelligence platform”.