Description:
Lawrence Jones, founding engineer at incident.io, explains how his team uses AI tooling to manage the growing complexity of their AI-powered incident response platform — a system used by companies like Netflix, Etsy, and Skyscanner. Delivered at the AI Engineer conference, the talk focuses on what happens when your AI product grows beyond a handful of prompts into a web of dozens of agents, hundreds of tools, and thousands of daily LLM calls that no single human can tractably debug.
Jones covers three practical strategies his team developed in production. First, they built a small CLI that lets coding agents like Claude Code and Codex directly read, edit, and add eval test cases. This enables a fully agent-driven prompt-improvement loop: a coding agent identifies a failure, writes an eval to prove it, fixes the prompt, and verifies that no regressions were introduced. Second, in what he calls the team’s biggest unlock, they converted their internal debugging UIs into downloadable file systems, giving Claude Code and Codex direct access to system state for analysis. Third, they built repeatable analysis pipelines that use AI agents to systematically assess performance across all customer accounts.
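The talk summary does not show incident.io's actual CLI, so the following is only a hypothetical sketch of the first idea: a tiny eval-case tool that a coding agent could drive with ordinary shell commands. Everything here is an assumption for illustration, including the tool name `evals`, the `list`/`show`/`add` subcommands, and the layout of one JSON file per case under a per-prompt directory. Editing and running cases are left out; they would follow the same pattern.

```python
"""Hypothetical sketch of a small eval-case CLI a coding agent could drive.

Assumptions (not from the talk): cases are stored as one JSON file per case
under <root>/<prompt>/<case_id>.json; the tool name and subcommands are made up.
"""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path


def case_path(root: Path, prompt: str, case_id: str) -> Path:
    # One file per case keeps diffs small and easy for an agent to read.
    return root / prompt / f"{case_id}.json"


def add_case(root: Path, prompt: str, case_id: str, input_text: str, expected: str) -> Path:
    # Refuse to overwrite, so an agent cannot silently clobber an existing eval.
    path = case_path(root, prompt, case_id)
    if path.exists():
        raise FileExistsError(f"case {prompt}/{case_id} already exists")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"input": input_text, "expected": expected}, indent=2))
    return path


def list_cases(root: Path, prompt: str) -> list[str]:
    # Return case ids (file stems) for one prompt, sorted for stable output.
    folder = root / prompt
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.json"))


def show_case(root: Path, prompt: str, case_id: str) -> dict:
    return json.loads(case_path(root, prompt, case_id).read_text())


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="evals")  # hypothetical tool name
    parser.add_argument("--root", type=Path, default=Path("evals"))
    sub = parser.add_subparsers(dest="cmd", required=True)

    p_list = sub.add_parser("list", help="list case ids for a prompt")
    p_list.add_argument("prompt")

    p_show = sub.add_parser("show", help="print one case as JSON")
    p_show.add_argument("prompt")
    p_show.add_argument("case_id")

    p_add = sub.add_parser("add", help="add a new case")
    p_add.add_argument("prompt")
    p_add.add_argument("case_id")
    p_add.add_argument("--input", required=True)
    p_add.add_argument("--expected", required=True)

    args = parser.parse_args(argv)
    if args.cmd == "list":
        print("\n".join(list_cases(args.root, args.prompt)))
    elif args.cmd == "show":
        print(json.dumps(show_case(args.root, args.prompt, args.case_id), indent=2))
    else:
        print(add_case(args.root, args.prompt, args.case_id, args.input, args.expected))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

In this sketch an agent that spots a failure would run something like `python evals.py add triage missed-alert --input "..." --expected "..."`, then `list` and `show` to confirm the case before editing the prompt. Plain files and a plain CLI are chosen because coding agents already know how to call commands and read JSON, so no custom integration is needed.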
The talk is grounded in real incident.io infrastructure: their AI SRE product runs investigations that cross-reference hundreds of telemetry queries against logs, metrics, traces, and historical incident data. Jones’s approach of using AI to understand AI offers a practical, production-hardened guide for any engineering team building multi-agent systems that have outgrown human-only observability.
📺 Source: AI Engineer · Published May 17, 2026
🏷️ Format: Deep Dive