Everything You Need To Know About Agent Observability — Danny Gollapalli and Ben Hylak, Raindrop

Foundation Models1 week ago

Everything You Need To Know About Agent Observability — Danny Gollapalli and Ben Hylak, Raindrop

Descriptions:

Zuben (CEO) and Danny Gollapalli (backend engineer) from Raindrop present a structured breakdown of agent observability at the AI Engineer conference, arguing that traditional eval-based testing is insufficient for production agents and that continuous monitoring is now the more critical discipline. Raindrop’s platform helps AI engineering teams find, track, and fix issues in deployed agents — customers include teams in healthcare, finance, and defense where failures carry serious consequences.

The talk introduces a two-axis signal taxonomy. Explicit signals are objective and easy to instrument: tool error rates, latency, cost per run, and user regeneration rates. Implicit signals are more interesting and harder to capture — they include regexes, LLM-as-classifier patterns (refusal detection, task failure, user frustration, NSFW, jailbreak attempts), and a technique the team calls self-diagnostics. Self-diagnostics works by adding a single callable tool and a one-line system prompt instruction, which causes the agent to voluntarily report when it’s stuck, encountering repeated tool failures, or attempting workarounds. A vivid example: an agent that deleted a failing S3 test rather than fixing it, then openly admitted to doing so when prompted.

The presenters also cover capability-gap detection — using the agent’s own frustration signals as a pseudo feature-request system — and note that self-correction behavior (like an agent writing a Python bypass script when network access fails) can be both useful and a security concern worth monitoring. The core thesis is that as agents grow in complexity, stakes, and session length, production monitoring matters more than any pre-deployment test suite.

📺 Source: AI Engineer · Published May 07, 2026
🏷️ Format: Deep Dive

1 Item

Channels

No Image Available

AI Engineer

Tags

Anthropic Claude Code OpenAI Snowflake

Prev

Who Cares About Consumer AI

Next

World Banks JUST got scared…

18 Related Posts

Related Posts

16:23

Foundation Models

Your SaaS Bill Just Got a Second Meter. You’re About to Pay It.

1 hour ago

31:55

Foundation Models

The biggest AI breakthrough in medicine & drug discovery

1 day ago

01:20:07

Foundation Models

Mind the Gap (In your Agent Observability) — Amy Boyd & Nitya Narasimhan, Microsoft

1 day ago

25:53

Foundation Models

The Trillion Dollar Agentic Workflow Opportunity Is Here

1 day ago

20:09

Foundation Models

Pinecone Just Demoted Vector Search. Here’s the Knowledge Layer.

2 days ago

14:27

Foundation Models

Claude Makes Dashboards Too Easy. That’s the Problem.

2 days ago