Mind the Gap (In your Agent Observability) — Amy Boyd & Nitya Narasimhan, Microsoft

Description:

Amy Boyd and Nitya Narasimhan from Microsoft’s Azure AI Foundry developer relations team deliver a hands-on workshop at AI Engineer London titled “Mind the Gap,” using the London Underground safety announcement as a sustained analogy for the distance between what an agent is designed to do and what it actually does in production. Their central argument: observability must be built in from day one, not retrofitted, and trace-linked evaluations are the key mechanism for shortening the gap between detecting a problem and diagnosing its root cause.

Boyd opens by demonstrating Azure AI Foundry's no-code evaluation tooling: creating a project, attaching a model with web search, running the agent, selecting evaluation metrics such as task adherence and safety, and reviewing trace-linked results in the Foundry UI. A concrete example from the session, in which task adherence scores unexpectedly low, illustrates how early evals surface quality issues before an agent reaches production.

Narasimhan then shifts to the SDK layer, covering how to implement tracing programmatically, write custom prompt-based and code-based evaluators, and interpret evaluation results tied directly to specific trace steps. The presenters explain why this matters: when a model swap causes tool-call efficiency to drop, for example, evals flag the regression and the trace shows exactly where in the execution the behavior changed. All workshop assets are available in a maintained GitHub repository, with ongoing support through a Microsoft Foundry Discord channel.
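The workshop's own code lives in the linked GitHub repository; as a minimal sketch of the pattern Narasimhan describes, the example below implements a code-based custom evaluator as a plain Python callable and runs it over a small JSONL dataset through the azure-ai-evaluation package's evaluate() entry point, which is one way Foundry evaluations can be driven from code. The evaluator class, metric, dataset, and column names are illustrative assumptions, not the presenters' code.

```python
# Minimal sketch of a code-based custom evaluator: a callable that accepts
# named columns from the evaluation dataset and returns a dict of metrics.
from azure.ai.evaluation import evaluate  # pip install azure-ai-evaluation


class ToolCallEfficiencyEvaluator:
    """Illustrative (hypothetical) metric: did the agent stay within a
    tool-call budget for this run?"""

    def __init__(self, max_tool_calls: int = 3):
        self.max_tool_calls = max_tool_calls

    def __call__(self, *, tool_call_count: int, **kwargs):
        # Each key returned here becomes a column in the per-row results.
        efficient = int(tool_call_count) <= self.max_tool_calls
        return {
            "tool_call_efficiency": 1.0 if efficient else 0.0,
            "tool_call_count": int(tool_call_count),
        }


if __name__ == "__main__":
    # "agent_runs.jsonl" is a hypothetical dataset with one JSON object per
    # line, e.g. {"query": "...", "response": "...", "tool_call_count": 5}.
    results = evaluate(
        data="agent_runs.jsonl",
        evaluators={"efficiency": ToolCallEfficiencyEvaluator(max_tool_calls=3)},
        # Map dataset columns onto the evaluator's keyword arguments.
        evaluator_config={
            "efficiency": {
                "column_mapping": {"tool_call_count": "${data.tool_call_count}"}
            }
        },
    )
    print(results["metrics"])  # aggregate metrics across all rows
```

A prompt-based evaluator follows the same contract but delegates scoring to a model via a prompt template; when the underlying runs are traced, the per-row scores can then be reviewed alongside the corresponding trace steps in the Foundry portal, which is the trace-linked workflow the presenters emphasize.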


📺 Source: AI Engineer · Published May 14, 2026
🏷️ Format: Deep Dive
