Claude, Gemini, and ChatGPT Now Click Buttons For You

Descriptions:

Dylan Davis covers the current state of AI browser agents across four major platforms: OpenAI’s Atlas browser, Perplexity’s Comet, Google’s Gemini Auto Browse (recently launched), and Anthropic’s Claude browser extension. These tools can navigate websites, click through menus, and interact with interfaces autonomously — but all are in early-stage releases with significant reliability constraints.

The technical mechanism involves a continuous loop: the agent takes a screenshot, analyzes the page layout, takes an action, then repeats. This loop fills the model’s context window quickly, degrading performance at higher step counts. Davis identifies a practical sweet spot of 10–12 steps based on firsthand testing — at 15 steps reliability begins to slip, and at 25 steps agents typically go off the rails or halt mid-task.
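The loop described above can be sketched in a few lines. This is a minimal illustration, not how any of the four products is actually implemented: `run_agent`, `AgentRun`, and the three callables are hypothetical names, and the `history` list only stands in for the model's context window. The default cap of 12 steps reflects the upper end of the sweet spot Davis reports.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentRun:
    # Stand-in for the context window: every screenshot and action is appended,
    # so it grows with each step of the loop.
    history: List[str] = field(default_factory=list)
    finished: bool = False


def run_agent(
    take_screenshot: Callable[[], str],
    choose_action: Callable[[str, List[str]], Optional[str]],
    perform: Callable[[str], None],
    max_steps: int = 12,  # assumption: cap at the top of the reported 10-12 sweet spot
) -> AgentRun:
    """Screenshot -> analyze -> act -> repeat, stopping at a fixed step budget."""
    run = AgentRun()
    for _ in range(max_steps):
        screenshot = take_screenshot()                    # 1. capture the page
        action = choose_action(screenshot, run.history)   # 2. analyze and decide
        if action is None:                                # model signals the task is done
            run.finished = True
            break
        perform(action)                                   # 3. click, type, navigate
        run.history.append(screenshot)
        run.history.append(action)
    return run
```

Because each pass appends both a screenshot and an action, the history grows by two entries per step, which is one way to picture why long tasks degrade. Under this sketch, a task that needs more than the budget would be split into several shorter runs rather than given a higher cap.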

Davis highlights three use-case categories where browser agents reliably deliver value today. The first is admin and data retrieval: pulling financial reports from platforms like Stripe or Mercury using plain-language descriptions that the AI translates into exact navigation paths. The second is technical setup: configuring Google Cloud OAuth credentials and connecting them to tools like Supabase without any prior platform knowledge. The third is repetitive data entry: filling insurance forms, vendor applications, or patient intake forms from a source-of-truth document. Davis also demonstrates building plain-language “cheat sheets” that persist across sessions, letting the AI skip exploratory navigation on repeat tasks.
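One way to picture the cheat-sheet idea is a small file of plain-language navigation notes that gets prepended to the next request for the same task. The summary does not say how Davis stores his cheat sheets, so everything here is an assumption made for illustration: the JSON file, the function names `save_cheat_sheet` and `build_prompt`, and the prompt wording.

```python
import json
from pathlib import Path
from typing import Dict, List


def _load(path: Path) -> Dict[str, List[str]]:
    """Read all saved cheat sheets, or return an empty mapping if none exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_cheat_sheet(path: Path, task: str, steps: List[str]) -> None:
    """Persist the plain-language steps for a task so a later session can reuse them."""
    sheets = _load(path)
    sheets[task] = steps
    path.write_text(json.dumps(sheets, indent=2), encoding="utf-8")


def build_prompt(path: Path, task: str, request: str) -> str:
    """Attach the saved steps to the request; fall back to the bare request if none."""
    steps = _load(path).get(task)
    if not steps:
        return request
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    return f"{request}\n\nKnown navigation path for this task:\n{numbered}"
```

On a first run the agent would explore the site and the resulting path would be saved; on repeat runs `build_prompt` supplies that path up front, which is the mechanism by which exploratory steps could be skipped and the step count kept low.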


📺 Source: Dylan Davis · Published February 07, 2026
🏷️ Format: Tutorial Demo