• FauxLiving@lemmy.world
    3 hours ago

    LLM-driven web scraping puts intense load on some sites, so their bot-detection software is tuned aggressively in a way that creates a lot of false positives.

    Obscuring your browser fingerprint, blocking JavaScript, or using an unusual user-agent string can trigger a CAPTCHA challenge.

    If you’re not doing any of that and a site suddenly starts giving you CAPTCHAs, it may be getting DDoS’d by scrapers and challenging all clients.
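    The kind of heuristic at play can be sketched roughly like this. This is only an illustration, not any site’s actual logic: every name, signal, and threshold here is made up, but it shows why blocked JavaScript or an odd user-agent raises your score, and why a site under scraper load may challenge everyone regardless of score.

    ```python
    # Hypothetical bot-scoring sketch; all names and thresholds are invented.
    COMMON_UA_TOKENS = ("Mozilla/", "Chrome/", "Safari/", "Firefox/")

    def suspicion_score(user_agent: str, has_js_fingerprint: bool,
                        requests_per_minute: int) -> int:
        """Score a request; higher means more bot-like."""
        score = 0
        # Empty or unusual user-agent strings stand out immediately.
        if not user_agent or not any(t in user_agent for t in COMMON_UA_TOKENS):
            score += 2
        # No JS fingerprint usually means JavaScript was blocked or absent.
        if not has_js_fingerprint:
            score += 2
        # Scraper-like request rates push the score up further.
        if requests_per_minute > 60:
            score += 3
        return score

    def should_challenge(score: int, under_attack: bool) -> bool:
        # Under heavy scraping load, a site may challenge every client,
        # which is where the false positives for ordinary users come from.
        if under_attack:
            return True
        return score >= 3
    ```

    A normal browser scores 0 and passes, while a headless scraper trips several signals at once; but once `under_attack` flips on, even the clean browser gets a CAPTCHA.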

    A site that archives content is especially vulnerable because it holds a lot of the data that is useful for AI training.

    It is incredibly annoying, but until we have a robust way of proving identity that can’t be gamed by bad actors, we’re stuck with challenging individual users.