AI Dev 26 x SF | Diamond Bishop: The Next 100 Agents. Building the Agent Native Office

Descriptions:

Diamond Bishop, a 15-year AI/ML veteran at Datadog, takes the stage at AI Dev 26 x SF to share what Datadog has learned scaling from its first AI agents to what she calls the “agent-native office” — the organizational and technical challenge of building not one or two agents but hundreds. The talk is grounded in Datadog’s own production deployments: an automated SRE agent that debugs infrastructure problems, a Bits AI dev agent that writes and ships code fixes based on observed errors, and a security analyst agent that investigates suspicious signals in Datadog’s SIEM products.

A central focus is evaluation. Bishop identifies the absence of a rigorous eval framework early in Datadog’s agent development as their single biggest mistake — the team shipped an agent that “seemed” to work but had no way to measure whether iterative changes actually improved performance. She outlines a three-stage solution: offline eval on small but representative datasets, online monitoring of production behavior using observability signals (clicks, outcomes, traces), and a living feedback loop that continuously pulls real-world data back into the offline test suite.
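The three stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Datadog's implementation: the names `Case`, `EvalSuite`, `acceptance_rate`, and `toy_agent`, the trace fields (`prompt`, `accepted`, `corrected`), and the exact-match scoring are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Case:
    prompt: str
    expected: str


@dataclass
class EvalSuite:
    cases: list = field(default_factory=list)

    def run_offline(self, agent: Callable[[str], str]) -> float:
        """Stage 1: score the agent on the curated dataset (exact match, for simplicity)."""
        if not self.cases:
            return 0.0
        passed = sum(agent(c.prompt) == c.expected for c in self.cases)
        return passed / len(self.cases)

    def ingest_production(self, traces: list) -> int:
        """Stage 3: fold rejected production runs that carry a corrected answer into the suite."""
        known = {c.prompt for c in self.cases}
        added = 0
        for t in traces:
            if not t["accepted"] and t.get("corrected") and t["prompt"] not in known:
                self.cases.append(Case(t["prompt"], t["corrected"]))
                known.add(t["prompt"])
                added += 1
        return added


def acceptance_rate(traces: list) -> float:
    """Stage 2: one online signal, the share of production runs the user accepted."""
    if not traces:
        return 0.0
    return sum(1 for t in traces if t["accepted"]) / len(traces)


def toy_agent(prompt: str) -> str:
    # Hypothetical agent that only knows one remediation.
    return {"disk full on host-a": "rotate logs"}.get(prompt, "unknown")


suite = EvalSuite([Case("disk full on host-a", "rotate logs")])
baseline = suite.run_offline(toy_agent)

# Hypothetical production traces; "accepted" stands in for a click or outcome signal.
traces = [
    {"prompt": "disk full on host-a", "accepted": True, "corrected": None},
    {"prompt": "oom in checkout service", "accepted": False, "corrected": "raise memory limit"},
]
online = acceptance_rate(traces)
added = suite.ingest_production(traces)
after = suite.run_offline(toy_agent)
```

In this toy run the offline score drops once the rejected production case joins the suite, which is the point of the loop: the test set keeps pace with what the agent actually encounters, so later changes can be measured against it.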

Bishop also covers practical infrastructure choices for production agents: using Temporal for durable, fault-tolerant agent execution; treating chat as one modality among many rather than the default trigger; proper sandboxing to limit blast radius; and the recursive challenge of building agents to evaluate other agents. The talk is one of the more concrete, practitioner-focused sessions on enterprise-scale agent deployment available from this conference.


📺 Source: DeepLearningAI · Published May 22, 2026
🏷️ Format: Workflow Case Study