Shipping complex AI applications — Braintrust & Trainline



Description:

Presented at AI Engineer Europe 2026 in London, this hands-on workshop from Braintrust and Trainline guides engineers through the complete lifecycle of shipping production-quality AI applications, from a bare single-LLM call to a fully monitored, evaluable, and iteratively improving agentic system. The session is led by Jirean from Braintrust alongside Usama and Mayan, senior AI/ML engineers at Trainline (the UK's leading train ticketing platform), who share lessons from real enterprise deployments.

Using a customer support ticket classification agent as the working example, the workshop follows a structured four-stage progression: scaffolding a basic agent with a one-shot prompt, adding distributed tracing to capture production behavior, assembling a “golden set” of labeled examples for systematic evaluation, and closing the improvement loop using Braintrust’s managed evaluation infrastructure. Each stage maps to a tagged Git checkpoint, making every step independently reproducible.
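The golden-set evaluation stage can be illustrated with a minimal sketch in plain Python. This is not Braintrust's SDK or the workshop's actual code: `GoldenExample`, `exact_match`, `evaluate`, and `keyword_classifier` are hypothetical names, the keyword classifier stands in for the one-shot LLM prompt, and exact-match accuracy is an assumed scoring choice. It shows the basic loop: run the agent over labeled examples, score each output, and collect the failures that feed the next improvement round.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenExample:
    ticket: str    # raw customer support ticket text
    expected: str  # category label assigned by a domain expert


def exact_match(output: str, expected: str) -> float:
    """Assumed scorer: 1.0 when the predicted category matches the label, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(classify: Callable[[str], str], golden_set: list[GoldenExample]) -> dict:
    """Run the classifier over every golden example and aggregate the scores."""
    results = []
    for ex in golden_set:
        output = classify(ex.ticket)
        results.append({
            "ticket": ex.ticket,
            "output": output,
            "expected": ex.expected,
            "score": exact_match(output, ex.expected),
        })
    accuracy = sum(r["score"] for r in results) / len(results) if results else 0.0
    # Failed cases are the raw material for the next iteration of the prompt or agent.
    failures = [r for r in results if r["score"] < 1.0]
    return {"accuracy": accuracy, "failures": failures}


def keyword_classifier(ticket: str) -> str:
    """Hypothetical stand-in for the one-shot LLM classification prompt."""
    text = ticket.lower()
    if "refund" in text:
        return "refund"
    if "delay" in text or "late" in text:
        return "delay"
    return "other"


golden = [
    GoldenExample("I want a refund for my cancelled train", "refund"),
    GoldenExample("My train was 40 minutes late", "delay"),
    GoldenExample("How do I change my seat reservation?", "booking_change"),
]

report = evaluate(keyword_classifier, golden)
print(f"accuracy: {report['accuracy']:.2f}")
for failure in report["failures"]:
    print("failed:", failure["ticket"], "->", failure["output"], "expected", failure["expected"])
```

Swapping `keyword_classifier` for a real model call leaves the harness unchanged, which is the point of keeping the golden set and scorer separate from the agent under test.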

The core thesis is that shipping AI in production is fundamentally an operationalization problem, not a modeling one. A working demo proves little about production reliability; tracing and evaluation are prerequisites for building the flywheel that converts production failures into labeled data and labeled data into model improvements. Trainline's experience illustrates how enterprise teams can structure cross-functional collaboration between AI engineers, product teams, and domain experts around shared evaluation artifacts. Engineers moving from proof of concept to production-grade AI applications will find a practical, step-by-step treatment of the topic, with each stage reproducible from its own Git checkpoint.


📺 Source: AI Engineer · Published May 01, 2026
🏷️ Format: Hands On Build

