RSS Bot to Hacker News · English · 9 months ago

Foundry (YC F24) Hiring Founding Engineer to Build an Internet-Scale Web Crawler

www.ycombinator.com

Founding Engineer: Large-Scale Web Scraping & Crawling at Foundry | Y Combinator
About Us

We’re building the first end-to-end testing platform for web agents, including a Browser Gym for RL-driven optimization. Our platform helps teams evaluate, benchmark, and improve web agents before they go live, ensuring they can handle real-world, dynamic environments. With synthetic user simulations, automated evaluations, and large-scale benchmarking, we’re setting a new standard for web agent testing.

We’re a YC-backed team, and this is a founding engineering role: you’ll be one of the first hires defining how we crawl, structure, and analyze the open web at scale.

The Role

We need a Founding Web Scraping Engineer to build internet-scale web crawling infrastructure. That means not just scraping a single site, but handling millions of domains and evolving anti-bot defenses. You’ll be responsible for designing robust, distributed crawling systems that adapt dynamically to web changes, optimize for efficiency, and ensure reliable data extraction.

What You’ll Do

  • Build large-scale, distributed crawlers that intelligently prioritize, schedule, and optimize requests across millions of domains.
  • Develop adaptive web scraping systems that handle DOM changes, WebSockets, AJAX-heavy sites, and dynamically loaded content.
  • Optimize scraping performance and resilience, ensuring high-throughput data extraction with proxy/network optimizations and behavior-driven stealth tactics.
  • Solve captchas at scale, integrating third-party solvers, heuristic-based workarounds, and behavior-driven bypass techniques.
  • Manage proxy and identity rotation, implementing session-aware scraping, JA3/TLS fingerprint spoofing, and request signature control.
  • Structure and clean extracted data for downstream analytics, AI training, and benchmarking applications.

What We’re Looking For

  • Expert-level experience in large-scale web scraping & crawling (Selenium, Puppeteer, Playwright, Scrapy, undetected-chromedriver).
  • Deep knowledge of anti-bot detection strategies (TLS fingerprinting, JA3 signatures, request header anomalies, and bot behavior tracking).
  • Hands-on expertise with captcha-solving strategies, including leveraging APIs, OCR-based approaches, and behavior-driven evasion.
  • Proven experience building efficient proxy management systems, including rotating IPs across residential, datacenter, and mobile networks.
  • Proficiency in Python, Go, or JavaScript, with experience in high-performance, parallelized scraping frameworks.
  • Understanding of HTTP/2, HTTP/3, WebSockets, GraphQL, and browser-based fingerprinting.
  • Experience designing scalable, fault-tolerant scraping infrastructure that adapts to changes in real time.

Bonus Points

  • Experience with search engine-scale crawling.
  • Background in LLM-driven web extraction or RL-enhanced adaptive crawling.
  • Contributions to open-source scraping tools or web automation projects.

Why Join?

  • Founding role: you’ll define and own our web crawling infrastructure from day one.
  • Work at internet scale: building a system that dynamically adapts and scales across millions of domains.
  • YC-backed: we’re building something that doesn’t exist yet, and you’ll be part of the core team making it happen.
