Description:
DeepInfra, a purpose-built AI inference cloud, has raised $107 million in a funding round backed by NVIDIA, Samsung, and Super Micro. In this Bloomberg Technology interview, CEO Nicolas outlines the company’s strategy for scaling inference infrastructure and driving down cost per token across a growing open-source model ecosystem.
The company currently processes 5 trillion tokens per week across eight data centers, with plans to expand across the US and into Europe and Asia later in 2026. Nicolas explains how efficiency gains come from full-stack optimization — from data center selection and cluster architecture to software, with KV caching highlighted as especially critical as agentic workloads generate large volumes of repeated context-heavy requests.
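The KV-caching point can be illustrated with a minimal sketch (this is a generic prefix-caching toy, not DeepInfra's implementation; the `PrefixKVCache` class and its behavior are assumptions for illustration): agentic workloads repeatedly resend the same long system prompt, so caching the key-value state computed during prefill lets every request after the first skip that expensive pass.

```python
import hashlib

class PrefixKVCache:
    """Toy prefix cache: maps a prompt prefix to its (simulated) KV state.
    Hypothetical illustration of the general prefix/KV-caching idea."""
    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prefix: str) -> str:
        # Hash the prefix so long prompts make compact cache keys.
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get_or_compute(self, prefix: str):
        k = self._key(prefix)
        if k in self.store:
            self.hits += 1          # prefill skipped: KV state reused
            return self.store[k]
        self.misses += 1
        # Stand-in for the expensive prefill pass that builds KV tensors.
        kv_state = f"kv({len(prefix)} chars)"
        self.store[k] = kv_state
        return kv_state

# A long system prompt shared across every turn of an agent loop.
system_prompt = "You are an agent with access to these tools... " * 50
cache = PrefixKVCache()
for user_turn in ["plan", "search", "summarize"]:
    cache.get_or_compute(system_prompt)

print(cache.hits, cache.misses)  # → 2 1
```

Only the first request pays the prefill cost; the two later turns hit the cache, which is why repeated context-heavy agentic traffic benefits so much from this technique.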
The conversation also covers supply chain pressures that have intensified since early 2026, including shortages of GPUs, high-bandwidth memory, and storage — areas where strategic investors like Samsung and Super Micro provide supply access advantages. Nicolas also addresses the competitive landscape, noting that while Cerebras is pursuing an IPO and positioning itself as an NVIDIA alternative, DeepInfra is doubling down on NVIDIA hardware while focusing on inference efficiency. The interview offers a clear window into how purpose-built inference clouds are differentiating on infrastructure depth rather than model ownership.
📺 Source: Bloomberg Technology · Published May 04, 2026
🏷️ Format: Interview