Descriptions:
In this AI Engineer conference session, Rafael Levi from Bright Data demonstrates how to build self-maintaining web scraping pipelines using the Bright Data MCP server paired with Claude Code. The core argument: instead of asking an LLM to parse every HTML page directly — which burns enormous token budgets — you instruct the agent to write a reusable scraper once, then run that scraper against all subsequent pages.
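As a rough illustration of the "write the scraper once, then reuse it" idea, the sketch below uses only Python's standard library. The `product-title` class name and the `scrape` function are hypothetical placeholders, not anything shown in the talk; in the workflow described, the agent would generate the selectors for the real site.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects text inside elements whose class is exactly 'product-title'.

    The class name is a hypothetical selector chosen for illustration.
    Unclosed void tags (e.g. a bare <br>) inside a title are not handled.
    """

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside a matching element
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1                 # nested tag inside a title
        elif ("class", "product-title") in attrs:
            self._depth = 1
            self.titles.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.titles[-1] += data


def scrape(html: str) -> list[str]:
    """Deterministic parser: runs on every page with no LLM call."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles]
```

Once such a function exists, every later page costs CPU time rather than tokens: `scrape(page)` can be called in a loop over however many pages are fetched.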
The Bright Data MCP exposes 66 tools, including direct curl-to-any-URL with automatic CAPTCHA solving, a markdown-only fetch mode that strips HTML tags to cut token consumption, and around 500 pre-built structured data APIs for domains like Amazon. Levi demos the system building a working product search scraper for Walmart, a site whose aggressive bot detection blocks unauthenticated fetch calls entirely, from a single natural-language prompt in roughly three minutes. He quantifies the savings at roughly one million tokens per three pages compared to feeding raw HTML to an LLM.
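To see why stripping markup reduces token use, here is a minimal sketch that compares a raw HTML string with its tag-stripped text. It is a stand-in for illustration only, not the MCP's markdown mode, and the four-characters-per-token figure is a crude assumption rather than a real tokenizer.

```python
import re


def rough_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return len(text) // 4


def strip_tags(html: str) -> str:
    """Drop script/style blocks and all tags, then collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def estimated_saving(html: str) -> int:
    """Estimated tokens saved by sending stripped text instead of raw HTML."""
    return rough_tokens(html) - rough_tokens(strip_tags(html))
```

On real product pages most of the byte count is markup, inline scripts, and styling, so the gap between the two estimates tends to be large; the exact numbers depend on the page and the tokenizer.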
The talk also covers autonomous pipeline maintenance: a cron-style agent polls collected data every 30 minutes, validates completeness, and self-corrects when fields are missing — eliminating the on-call burden that traditionally comes with production scrapers. The MCP tier offers 5,000 free requests for new accounts.
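The completeness check at the heart of that maintenance loop can be sketched as follows. The field names in `REQUIRED_FIELDS` are a hypothetical schema, and the scheduling itself (every 30 minutes in the talk) is left to cron or a similar scheduler; a non-empty result is the signal that would prompt the agent to repair the scraper.

```python
REQUIRED_FIELDS = ("title", "price", "url")  # hypothetical schema for illustration


def missing_fields(record: dict) -> list[str]:
    """Names of required fields that are absent, None, or empty strings."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def validate_batch(records: list[dict]) -> dict[int, list[str]]:
    """Map record index -> missing fields, for incomplete records only."""
    return {i: m for i, r in enumerate(records) if (m := missing_fields(r))}
```

For example, a batch where the second record has an empty `price` yields `{1: ["price"]}`, while a fully populated batch yields an empty dict and needs no action.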
📺 Source: AI Engineer · Published June 07, 2026
🏷️ Format: Hands On Build







