LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize

Foundation Models · 2 months ago

Descriptions:

Dat Ngo, AI architect at Arize AI, presents a structured framework for making LLM systems observable, evaluable, and experimentally improvable — drawing on experience with large enterprise deployments that collectively process between 100 billion and 1 trillion tokens annually. The session is organized around three layers: observability, signal derivation, and experimentation.

On observability, Ngo explains that Arize AX is built on OpenTelemetry as its foundational telemetry standard. Its auto-instrumenters need only one added line of code to emit traces and spans from any supported framework or SDK. He distinguishes trace-level visibility (individual tool calls and agent steps), session-level visibility (multi-turn conversation state), and run-level visibility (batch pipeline outcomes), and cites the Anthropic managed agents paper, released two days earlier, as relevant context.
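
The three visibility levels can be pictured as three ways of grouping the same span records. The sketch below is plain Python, not the Arize AX or OpenTelemetry API; the `Span` fields, the `session_id` and `run_id` attribute names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record (not an OpenTelemetry type)."""
    span_id: str
    trace_id: str    # one request or agent invocation
    session_id: str  # one multi-turn conversation (assumed attribute)
    run_id: str      # one batch pipeline execution (assumed attribute)
    name: str        # e.g. a tool call or agent step
    ok: bool         # whether the step succeeded


def group_by(spans, key):
    """Group spans by the given attribute name."""
    groups = defaultdict(list)
    for span in spans:
        groups[getattr(span, key)].append(span)
    return dict(groups)


def failure_rate(spans):
    """Fraction of spans in the group that failed."""
    return sum(not s.ok for s in spans) / len(spans)


# Made-up sample data: three traces, two sessions, one batch run.
spans = [
    Span("s1", "t1", "sess-a", "run-1", "retrieve_docs", True),
    Span("s2", "t1", "sess-a", "run-1", "call_llm", True),
    Span("s3", "t2", "sess-a", "run-1", "call_tool", False),
    Span("s4", "t3", "sess-b", "run-1", "call_llm", True),
]

by_trace = group_by(spans, "trace_id")      # individual tool calls and steps
by_session = group_by(spans, "session_id")  # multi-turn conversation state
by_run = group_by(spans, "run_id")          # batch pipeline outcomes
```

The same failed tool call (`s3`) shows up as a failing trace, a degraded session, and a small dent in the run-level failure rate, which is why each level answers a different question.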

For evaluation, Ngo outlines five signal types: LLM-as-judge, human annotation, golden datasets, deterministic logic checks (e.g., schema validation), and business metrics. A key practical insight is that fixing one failure in a non-deterministic system frequently introduces two or three regressions elsewhere, making regression suites and continuous eval harnesses essential. The talk also addresses organizational dynamics: how to divide prompt engineering and eval definition between AI engineers and non-technical domain experts within enterprise teams.
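
A minimal sketch of how a deterministic check and a regression suite might fit together, assuming a golden dataset of prompts and a JSON output contract. The schema, the case IDs, and the two stub generators are hypothetical stand-ins for real prompt versions, not anything shown in the talk.

```python
import json

# Hypothetical output contract: field name -> required Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list}


def schema_check(raw_output: str) -> bool:
    """Deterministic check: output must be a JSON object with the required fields."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(isinstance(data.get(f), t) for f, t in REQUIRED_FIELDS.items())


def run_suite(generate, golden):
    """Run every golden case through a generator and record pass/fail."""
    return {case_id: schema_check(generate(prompt)) for case_id, prompt in golden.items()}


def diff_results(baseline, candidate):
    """Return (fixed, regressed) case IDs between two suite runs."""
    fixed = sorted(c for c in candidate if candidate[c] and not baseline[c])
    regressed = sorted(c for c in candidate if baseline[c] and not candidate[c])
    return fixed, regressed


# Made-up golden dataset.
golden = {"case-1": "refund policy", "case-2": "shipping time", "case-3": "warranty"}


def baseline_generate(prompt):
    """Stub for the current prompt version: fails only on the refund case."""
    if prompt == "refund policy":
        return "Refunds take 5 days."  # plain text, not JSON
    return json.dumps({"answer": "ok", "sources": ["kb"]})


def candidate_generate(prompt):
    """Stub for the 'fixed' prompt version: repairs refunds, breaks the rest."""
    if prompt == "refund policy":
        return json.dumps({"answer": "5 days", "sources": ["kb"]})
    return json.dumps({"answer": "ok", "sources": "kb"})  # sources is no longer a list


fixed, regressed = diff_results(
    run_suite(baseline_generate, golden),
    run_suite(candidate_generate, golden),
)
```

With these stubs the candidate fixes `case-1` but regresses `case-2` and `case-3`, the one-fix, several-regressions pattern the talk describes. A continuous eval harness would run such a diff on every prompt or model change and block the release when `regressed` is non-empty.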

📺 Source: AI Engineer · Published June 07, 2026
🏷️ Format: Deep Dive
