Anyone else facing captcha loops whenever they try to view an archive.is link? I haven’t been able to read subscriber-only articles for months now.
LLM-driven web scraping is intense for some sites, so their bot detection software is tuned in a way that creates a lot of false positives.
Obscuring your browser fingerprint, blocking JavaScript, or using an unusual user-agent string can trigger a captcha challenge.
If you’re not doing any of that and a site suddenly starts giving you captchas, then they may be getting DDoS’d by scrapers and are challenging all clients.
A site that archives content is especially vulnerable because they have a lot of the data that is useful for AI training.
It is incredibly annoying, but until we have a robust way of proving identity that can’t be gamed by bad actors we’re stuck with individual user challenges.
Not every time, but far too often. They don’t seem to care that they’re discriminating against people with AV impairment, plus locking out some secure browsers.
Dang, yeah it’s probably my strict browser settings. Thanks for the confirmation of shared experience.
Sometimes I’m able to get around it by tweaking some uBlock permissions, but once I was surprised to discover that changing my user-agent with User-Agent Switcher seemed to do the trick. It’s really strange. Cloudflare’s captcha loops are inscrutable.
I haven’t faced a captcha, but it just took a solid 2 minutes to resolve and load the article for me. Maybe they have something else happening behind the scenes that’s impacting performance, so they’re locking down certain routes?
No but I do get about three or four challenges. I can paste the article for you if it helps?
I don’t have this problem. You’re probably using Tor or a VPN and it triggered the captcha. If not, then it’s def strange; I’ve never seen this happen.
Nope