AI Dev 26 x SF | Diamond Bishop: The Next 100 Agents. Building the Agent Native Office

Description:

Diamond Bishop, who has spent 15 years in AI/ML and now works at Datadog, takes the stage at AI Dev 26 x SF to share what Datadog has learned scaling from its first AI agents to what she calls the “agent-native office,” the organizational and technical challenge of building hundreds of agents rather than one or two. The talk is grounded in Datadog’s own production deployments: an automated SRE agent that debugs infrastructure problems, a Bits AI dev agent that writes and ships code fixes based on observed errors, and a security analyst agent that investigates suspicious signals in Datadog’s SIEM products.

A central focus is evaluation. Bishop names the lack of a rigorous eval framework early in Datadog’s agent development as the team’s single biggest mistake: they shipped an agent that “seemed” to work but had no way to measure whether iterative changes actually improved performance. She outlines a three-stage solution: offline eval on small but representative datasets; online monitoring of production behavior using observability signals (clicks, outcomes, traces); and a living feedback loop that continuously pulls real-world data back into the offline test suite.
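The three-stage loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Datadog’s implementation: the agent is assumed to be a plain function from input text to output text, exact-match scoring stands in for whatever graders a real suite would use, and every name here (`EvalCase`, `ProductionTrace`, `offline_score`, `online_acceptance`, `fold_back`, `no_regression`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical agent interface: takes an input string, returns an answer string.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    """One labeled offline example: an input and the answer we expect."""
    prompt: str
    expected: str


@dataclass(frozen=True)
class ProductionTrace:
    """Hypothetical record of one production interaction plus its online signal."""
    prompt: str
    agent_output: str
    accepted: bool                   # e.g. the user applied the fix / the error stopped
    corrected: Optional[str] = None  # the right answer, if someone supplied it later


def offline_score(agent: Agent, dataset: List[EvalCase]) -> float:
    """Stage 1: fraction of offline cases the agent answers exactly right."""
    if not dataset:
        return 0.0
    hits = sum(1 for case in dataset if agent(case.prompt) == case.expected)
    return hits / len(dataset)


def online_acceptance(traces: List[ProductionTrace]) -> float:
    """Stage 2: acceptance rate computed from production signals."""
    if not traces:
        return 0.0
    return sum(1 for t in traces if t.accepted) / len(traces)


def fold_back(dataset: List[EvalCase], traces: List[ProductionTrace]) -> List[EvalCase]:
    """Stage 3: add rejected-but-corrected production cases to the offline suite."""
    seen = {case.prompt for case in dataset}
    grown = list(dataset)
    for t in traces:
        if not t.accepted and t.corrected is not None and t.prompt not in seen:
            grown.append(EvalCase(t.prompt, t.corrected))
            seen.add(t.prompt)  # avoid duplicate cases from repeated failures
    return grown


def no_regression(candidate: Agent, baseline: Agent, dataset: List[EvalCase]) -> bool:
    """Gate an iterative change: True only if the candidate scores at least as well."""
    return offline_score(candidate, dataset) >= offline_score(baseline, dataset)
```

Note that `fold_back` only promotes production cases that were rejected and later corrected, so the offline suite grows toward the failures users actually hit, and `no_regression` can then gate each iterative change against that growing suite.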

Bishop also covers practical infrastructure choices for production agents: using Temporal for durable, fault-tolerant agent execution; treating chat as one modality among many rather than the default trigger; proper sandboxing to limit blast radius; and the recursive challenge of building agents to evaluate other agents. The talk is one of the more concrete, practitioner-focused sessions on enterprise-scale agent deployment available from this conference.
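Temporal’s role here is durable execution: if a worker crashes partway through a multi-step agent run, completed steps are not redone. The sketch below imitates that idea with the Python standard library only, assuming a local JSON journal file and simple immediate retries. It is not Temporal’s API (Temporal records an event history on its server and replays workflow code against it, with retry policies set per activity), and `DurableRun` and `step` are hypothetical names.

```python
import json
import os
from typing import Any, Callable, Dict, Optional


class DurableRun:
    """Journals each named step's result so a restarted run skips completed work.

    Hand-rolled illustration of durable execution, NOT Temporal's API.
    """

    def __init__(self, journal_path: str) -> None:
        self.journal_path = journal_path
        self.results: Dict[str, Any] = {}
        # On restart, reload the results of every step that already finished.
        if os.path.exists(journal_path):
            with open(journal_path) as f:
                self.results = json.load(f)

    def step(self, name: str, fn: Callable[[], Any], max_attempts: int = 3) -> Any:
        # Replay: a step recorded in the journal returns its saved result without running.
        if name in self.results:
            return self.results[name]
        last_error: Optional[Exception] = None
        for _ in range(max_attempts):
            try:
                result = fn()
                break
            except Exception as err:  # retry immediately (no backoff in this sketch)
                last_error = err
        else:
            raise RuntimeError(f"step {name!r} failed after {max_attempts} attempts") from last_error
        self.results[name] = result
        # Write a temp file, then atomically replace, so a crash mid-write
        # cannot leave a half-written journal behind.
        tmp = self.journal_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.results, f)
        os.replace(tmp, self.journal_path)
        return result
```

Step names must be unique within a run, and results must be JSON-serializable (a tuple comes back as a list after a restart). A production agent would also add backoff between retries and take care when retrying steps whose side effects are not idempotent.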

📺 Source: DeepLearningAI · Published May 22, 2026
🏷️ Format: Workflow Case Study

Channel: DeepLearningAI

Company: DataDog

Tags: A2A, Anthropic, DataDog, MCP, OpenAI, Pydantic AI, Temporal, Thinking Machines Labs
