How Google DeepMind Runs Agents at Scale — KP Sawhney & Ian Ballantyne, Google DeepMind

Descriptions:

Google DeepMind software engineer KP Sawhney and developer relations engineer Ian Ballantyne take the stage at the AI Engineer conference to walk through how DeepMind designs and scales agentic systems in production. The talk centers on Antigravity, DeepMind’s internal Visual Studio-style IDE that bundles a full agent manager framework, allowing developers to spawn and coordinate multiple agents across projects with built-in planning, browser control, DOM inspection, screenshot capture, and human-in-the-loop feedback at each step.

Sawhney, who previously built DeepMind’s Deep Research agent (now available via the Interactions API), details the platform team’s current engineering focus: scaling agentic workflows across DeepMind’s large monorepo and generalizing the Antigravity harness to broader use cases. He covers multi-model routing strategies, which use lightweight, quota-free models like Gemma 4 for cost-sensitive subtasks and reserve more capable models for critical reasoning steps. He also covers evaluation design for complex agentic pipelines, including mock-TPU environments that let teams test harness logic without burning real compute hours.
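As a rough illustration of the routing idea described above, the sketch below sends non-critical subtasks to a lightweight tier and critical reasoning steps to a more capable one. It is not DeepMind's implementation: the model labels, the `Subtask` type, and the `critical` flag are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical tier labels, not real model identifiers.
LIGHT_MODEL = "light-model"
CAPABLE_MODEL = "capable-model"


@dataclass
class Subtask:
    name: str
    critical: bool  # assumed flag: True when the step needs stronger reasoning


def route(task: Subtask) -> str:
    """Return the model tier for a subtask: capable if critical, else light."""
    return CAPABLE_MODEL if task.critical else LIGHT_MODEL


plan = [
    Subtask("summarize build logs", critical=False),
    Subtask("design schema migration", critical=True),
]
assignments = {task.name: route(task) for task in plan}
```

A real router would likely weigh more signals than a single flag (cost budgets, latency, task type), but the split between cheap and capable tiers stays the same.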

The conversation rounds out with the hard operational problem of resource fairness: how to prevent power users from starving shared infrastructure by spinning up large fleets of parallel agents. Sawhney acknowledges the current approach is essentially brute-force quota enforcement, and frames this as a bellwether for broader open questions about how token-hungry agentic systems will ultimately be priced — pointing to Anthropic’s recent moves around subscription limits as an early indicator of where the industry is heading.
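For readers who want a concrete picture of brute-force quota enforcement, here is a minimal sketch of a per-user cap on concurrently running agents. The talk summary does not describe the actual mechanism, so the `AgentQuota` class, its method names, and the limit value are assumptions made for illustration.

```python
from collections import defaultdict


class AgentQuota:
    """Hypothetical per-user cap on concurrently running agents."""

    def __init__(self, max_agents_per_user: int) -> None:
        self.max_agents_per_user = max_agents_per_user
        self.active = defaultdict(int)  # user -> number of running agents

    def try_spawn(self, user: str) -> bool:
        """Reserve a slot for a new agent; return False if the user is at the cap."""
        if self.active[user] >= self.max_agents_per_user:
            return False
        self.active[user] += 1
        return True

    def release(self, user: str) -> None:
        """Free a slot when one of the user's agents finishes."""
        if self.active[user] > 0:
            self.active[user] -= 1


quota = AgentQuota(max_agents_per_user=2)
results = [quota.try_spawn("power_user") for _ in range(3)]
quota.release("power_user")
after_release = quota.try_spawn("power_user")
```

A hard cap like this is simple and fair in the narrow sense, but it ignores how many tokens each agent consumes, which is the pricing question the speakers leave open.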


📺 Source: AI Engineer · Published May 24, 2026
🏷️ Format: Deep Dive
