• FauxLiving@lemmy.world
    3 hours ago

    LLM-driven web scraping puts intense load on some sites, so their bot-detection software is tuned aggressively in a way that creates a lot of false positives.

    Obscuring your browser fingerprint, blocking JavaScript, or using an unusual user-agent string can trigger a CAPTCHA challenge.

    If you’re not doing any of that and a site suddenly starts giving you CAPTCHAs, it may be getting DDoS’d by scrapers and challenging all clients.
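    The kind of heuristic at play can be sketched roughly like this. This is only an illustration, not any site’s actual logic: every name, signal, and threshold here is made up, but it shows why blocked JavaScript or an odd user-agent raises your score, and why a site under scraper load may challenge everyone regardless of score.

    ```python
    # Hypothetical bot-scoring sketch; all names and thresholds are invented.
    COMMON_UA_TOKENS = ("Mozilla/", "Chrome/", "Safari/", "Firefox/")

    def suspicion_score(user_agent: str, has_js_fingerprint: bool,
                        requests_per_minute: int) -> int:
        """Score a request; higher means more bot-like."""
        score = 0
        # Empty or unusual user-agent strings stand out immediately.
        if not user_agent or not any(t in user_agent for t in COMMON_UA_TOKENS):
            score += 2
        # No JS fingerprint usually means JavaScript was blocked or absent.
        if not has_js_fingerprint:
            score += 2
        # Scraper-like request rates push the score up further.
        if requests_per_minute > 60:
            score += 3
        return score

    def should_challenge(score: int, under_attack: bool) -> bool:
        # Under heavy scraping load, a site may challenge every client,
        # which is where the false positives for ordinary users come from.
        if under_attack:
            return True
        return score >= 3
    ```

    A normal browser scores 0 and passes, while a headless scraper trips several signals at once; but once `under_attack` flips on, even the clean browser gets a CAPTCHA.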

    A site that archives content is especially vulnerable because it holds a lot of the data that is useful for AI training.

    It is incredibly annoying, but until we have a robust way of proving identity that can’t be gamed by bad actors, we’re stuck with challenging individual users.