Descriptions:
Benjamin Cowen, a forward-deployed machine learning engineer at Modal, delivers a conference talk examining one of the most consequential decisions AI product teams face: when to move from frontier APIs like OpenAI or Anthropic to a custom fine-tuned model. Drawing on his experience across a wide range of Modal customers — from quantum chemistry simulations to LLM-powered agents — Cowen maps out a spectrum from zero-customization frontier APIs to fully self-managed training clusters, and argues that a practical middle ground is now accessible to most product teams.
Cowen shares concrete signals that a company is approaching the fine-tuning threshold: API costs exceeding customer revenue, plateauing evaluation scores, and enterprise contracts with strict latency or throughput requirements that off-the-shelf models can’t meet. He cites Intercom as achieving performance comparable to frontier models at one-fifth the cost, and quotes an insight from customer Decagon: frontier labs optimize for general capability, while product companies need to win specifically at their own business logic.
The talk emphasizes that the infrastructure barrier has dropped dramatically. Modern open-source training libraries now provide algorithm-level control without requiring a dedicated ML infrastructure team or a dedicated GPU cluster. Cowen’s core message: if you’ve built an agent harness and collected evaluation data, you likely already have everything needed to begin fine-tuning — and Modal’s serverless compute platform is designed to make that iteration cycle as fast as working with a frontier API.
📺 Source: AI Engineer · Published June 02, 2026
🏷️ Format: Deep Dive







