AI Dev 26 x SF | Or Dagan: Optimizing Accuracy, Cost, and Latency in Real-World Agents

Description:

Or Dagan from AI21 Labs walks through the practical challenge every production agent team faces: accuracy, cost, and latency cannot all be maximized at once, and optimizing one typically degrades another. His talk at AI Dev SF 2026 presents a structured, data-driven methodology for navigating this tradeoff space. The running example is BrowseComp+, a deep research benchmark on which AI21 achieved a new state-of-the-art result.

The framework divides optimization into two buckets. Configuration-level work covers model selection, prompt tuning (including automated methods such as DSPy), and tool composition, and it quickly surfaces a Pareto frontier of the best-performing configurations plotted against cost or latency. Scaling techniques go further: vertical scaling through longer reasoning chains or critique-repair loops, and horizontal scaling through best-of-N sampling and multi-model ensembles. The ensemble approach is particularly striking. A diverse set of models of different sizes reduces cost by more than half and latency by 20% compared with running the best single model multiple times, because different models solve different subsets of tasks.
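The ensemble argument rests on coverage: if models fail on different tasks, a mixed pool can solve more tasks per dollar than repeated samples of one model. The Python sketch here illustrates that reasoning with made-up toy numbers, not results from the talk. It assumes a perfect selector that picks a correct answer whenever any run succeeds (the pass@k upper bound on best-of-N); the `Run` type, the model names, and the per-task success sets are all hypothetical, and latency is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One agent attempt: which model ran, what it cost, which task IDs it solved."""
    model: str
    cost_usd: float
    solved: frozenset[int]


def oracle_coverage(runs: list[Run], num_tasks: int) -> float:
    """Fraction of tasks solved by at least one run.

    Assumes a perfect selector picks the right answer whenever any run succeeds
    (a best-of-N / pass@k upper bound); real aggregation only approximates this.
    """
    solved_any: frozenset[int] = frozenset().union(*(r.solved for r in runs))
    return len(solved_any) / num_tasks


def total_cost(runs: list[Run]) -> float:
    """Sum of per-run costs for the whole ensemble."""
    return sum(r.cost_usd for r in runs)


NUM_TASKS = 10  # hypothetical benchmark size

# Toy data (NOT from the talk): three samples of the strongest model overlap heavily.
best_repeated = [
    Run("large", 1.00, frozenset({0, 1, 2, 3, 4, 5})),
    Run("large", 1.00, frozenset({0, 1, 2, 3, 4, 6})),
    Run("large", 1.00, frozenset({0, 1, 2, 3, 5, 6})),
]

# Toy data: smaller, cheaper models fail more often, but on *different* tasks.
diverse_ensemble = [
    Run("large", 1.00, frozenset({0, 1, 2, 3, 4, 5})),
    Run("medium", 0.30, frozenset({0, 1, 6, 7})),
    Run("small", 0.10, frozenset({2, 8})),
]

if __name__ == "__main__":
    for name, runs in [("best x3", best_repeated), ("diverse", diverse_ensemble)]:
        print(f"{name}: coverage={oracle_coverage(runs, NUM_TASKS):.0%}, "
              f"cost=${total_cost(runs):.2f}")
```

With these toy numbers, three samples of the large model cover 7 of 10 tasks for $3.00, while the diverse set covers 9 of 10 for $1.40. A real aggregator (majority voting, a verifier, or a judge model) recovers less than this oracle bound, so measured accuracy sits below these coverage figures.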

Dagan closes by acknowledging the combinatorial explosion that results from mixing models, tools, prompts, ensemble sizes, and execution strategies. He frames this as an automated optimization problem rather than a manual search, positioning AI21’s tooling as a solution to what currently costs teams months and thousands of dollars in manual experimentation. The session is dense with specific numbers and is one of the more technically rigorous agent optimization talks at the conference.
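To make the combinatorial explosion and the Pareto frontier concrete, here is a minimal Python sketch under explicit assumptions. The knobs (`model`, `prompt`, `tools`, `n_samples`), the `fake_evaluate` scoring function, and all of its numbers are hypothetical stand-ins for actually running a benchmark, and nothing here reflects AI21's tooling or API. The sketch enumerates the full grid with `itertools.product` and keeps only the configurations that no other configuration beats on both accuracy and cost.

```python
import itertools
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One point in the (hypothetical) agent configuration space."""
    model: str
    prompt: str
    tools: str
    n_samples: int


@dataclass(frozen=True)
class Result:
    config: Config
    accuracy: float  # higher is better
    cost_usd: float  # lower is better (swap in latency for a latency frontier)


def search_space(models: Sequence[str], prompts: Sequence[str],
                 toolsets: Sequence[str], sample_counts: Sequence[int]) -> list[Config]:
    """Every combination of knobs; the size is the product of the option counts."""
    return [Config(*combo)
            for combo in itertools.product(models, prompts, toolsets, sample_counts)]


def fake_evaluate(cfg: Config) -> Result:
    """Hypothetical stand-in for a benchmark run; all numbers are invented."""
    base_acc = {"small": 0.40, "large": 0.60}[cfg.model]
    unit_cost = {"small": 0.05, "large": 0.50}[cfg.model]
    acc = base_acc
    acc += 0.05 if cfg.prompt == "tuned" else 0.0         # tuned prompt helps
    acc += 0.02 if cfg.tools == "search+browse" else 0.0  # extra tool helps a bit
    acc += 0.03 * (cfg.n_samples - 1)                     # best-of-N helps
    cost = unit_cost * cfg.n_samples * (1.2 if cfg.tools == "search+browse" else 1.0)
    return Result(cfg, min(acc, 1.0), cost)


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost_usd <= b.cost_usd
    better = a.accuracy > b.accuracy or a.cost_usd < b.cost_usd
    return no_worse and better


def pareto_frontier(results: Sequence[Result]) -> list[Result]:
    """Results not dominated by any other result, cheapest first (O(n^2))."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.cost_usd)


if __name__ == "__main__":
    space = search_space(["small", "large"], ["baseline", "tuned"],
                         ["search", "search+browse"], [1, 4])
    results = [fake_evaluate(cfg) for cfg in space]
    print(f"{len(space)} configurations evaluated")
    for r in pareto_frontier(results):
        c = r.config
        print(f"${r.cost_usd:.2f}  acc={r.accuracy:.2f}  "
              f"{c.model}/{c.prompt}/{c.tools}/n={c.n_samples}")
```

Even this toy grid of two options per knob yields 16 configurations; each added model, prompt variant, or ensemble size multiplies the count, which is why the talk treats the search as an automated optimization problem rather than exhaustive manual evaluation. The quadratic `pareto_frontier` filter is adequate for small grids, and a real optimizer would also avoid evaluating every point.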

📺 Source: DeepLearningAI · Published May 22, 2026
🏷️ Format: Deep Dive