Description:
Sandipan Bhaumik, an engineer at Databricks with 18 years of experience in distributed systems — including time at AWS — opens this AI Engineer 2026 talk with a striking production war story: a five-agent credit decisioning system that produced incorrect risk ratings for 20% of decisions within three days of launch. The root cause was not a bad model or prompt, but a cache invalidation failure between a PostgreSQL database and a shared caching layer, causing the risk assessment agent to read a stale credit score 500 milliseconds after the correct value had been written.
From there, the talk builds a systematic framework for multi-agent coordination. Bhaumik explains why complexity doesn’t scale linearly — five agents create at least ten coordination relationships, each a potential race condition — and introduces a decision matrix for choosing between choreography (event-driven, decentralized) and orchestration (centralized control) patterns based on workflow complexity and required agent autonomy. A hybrid choreography-plus-saga pattern is recommended for workflows that need both.
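The "five agents, at least ten relationships" figure is consistent with counting unordered agent pairs, n(n-1)/2. A minimal sketch of that count follows; the agent names are hypothetical placeholders, not the ones used in the talk.

```python
from itertools import combinations

def coordination_pairs(agents):
    """Return every unordered pair of agents; each pair is one
    potential coordination relationship (and one potential race)."""
    return list(combinations(agents, 2))

# Hypothetical agent names, for illustration only.
agents = ["intake", "credit_score", "risk_assessment", "compliance", "decision"]
pairs = coordination_pairs(agents)

print(len(pairs))  # 5 * 4 / 2 = 10
```

Pairwise counting is only a lower bound, which matches the "at least" in the summary: interactions involving three or more agents add further cases on top of it.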
The centerpiece of the session is a deep treatment of state management: why shared mutable state fails under concurrent agents even with modern databases, and how immutable state snapshots with append-only versioning eliminate entire classes of race conditions. Bhaumik shows how this pattern, combined with schema validation at each agent handoff, enables reliable rollback and full audit trails. The session closes with a walkthrough of a production architecture built on Databricks Agent Bricks.
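The snapshot pattern described above can be sketched roughly as follows. This is an in-memory, single-threaded illustration under stated assumptions, not the architecture shown in the talk: the names `Snapshot`, `SnapshotLog`, `REQUIRED_FIELDS`, and the agent and field names are all hypothetical, and a production system would need a durable store with an atomic append.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping

# Hypothetical handoff schema: field name -> required Python type.
REQUIRED_FIELDS = {"applicant_id": str, "credit_score": int}

@dataclass(frozen=True)
class Snapshot:
    version: int
    author: str
    data: Mapping[str, Any]  # read-only view (shallow immutability)

def validate(data):
    """Reject a handoff whose fields are missing or of the wrong type."""
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field {name!r} must be {expected.__name__}")

class SnapshotLog:
    """Append-only log: snapshots are added, never updated or deleted."""

    def __init__(self):
        self._log = []

    def append(self, author, data, based_on):
        validate(data)
        latest = self._log[-1].version if self._log else 0
        if based_on != latest:
            # The writer computed its result from an outdated version.
            raise RuntimeError(f"stale write: based on v{based_on}, latest is v{latest}")
        snap = Snapshot(latest + 1, author, MappingProxyType(dict(data)))
        self._log.append(snap)
        return snap

    def at(self, version):
        """Read one specific version; the result never changes later."""
        return self._log[version - 1]

    def latest(self):
        return self._log[-1]

    def rollback_to(self, version, author):
        """Roll back by appending a copy of an old version, keeping history."""
        return self.append(author, dict(self.at(version).data), self.latest().version)

    def history(self):
        return tuple(self._log)  # full audit trail, oldest first

log = SnapshotLog()
log.append("scoring_agent", {"applicant_id": "A-1", "credit_score": 640}, based_on=0)
log.append("scoring_agent", {"applicant_id": "A-1", "credit_score": 710}, based_on=1)
pinned = log.at(2)  # a downstream agent pins the version it was handed
```

Because a downstream agent reads a specific version rather than "whatever is current" in a cache, a late invalidation cannot silently hand it an older value, and a rollback is just another appended snapshot, so the audit trail stays complete.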
📺 Source: AI Engineer · Published April 08, 2026
🏷️ Format: Deep Dive







