Description:
Matthew Berman breaks down the strategic significance of OpenAI's reported partnership with Cerebras Systems, a three-year deal said to be worth over $10 billion and covering 750 megawatts of compute, and traces how it connects to a broader industry shift toward specialized inference chips.
The video reconstructs the chain of events: Google's Gemini 3, trained and served on TPUs rather than Nvidia GPUs, demonstrated that non-Nvidia infrastructure could produce frontier models, prompting Nvidia to license Groq's chip technology in a deal valued at $20 billion. With Groq now effectively under Nvidia's umbrella, OpenAI chose Cerebras as a partner to avoid concentrating its inference dependency on a single vendor. Cerebras chips reportedly achieve over 3,000 tokens per second, compared to roughly 465 for Groq, and because they bake memory directly onto the wafer, they are insulated from the DRAM shortages currently driving GPU prices toward $5,000 for Nvidia's RTX 5090. The video includes a clip of Cerebras CEO Andrew Feldman explaining the memory architecture advantage.
Berman argues that inference, not training, is where AI lab revenue compounds, making chip speed a central strategic variable. He contends that specialized inference silicon is now a permanent fixture of the AI stack, and that the OpenAI-Cerebras partnership could deliver ChatGPT response speeds dramatically faster than what users experience today, with compounding benefits for latency-sensitive use cases like coding agents.
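To make the throughput gap concrete, here is a back-of-envelope sketch. The ~3,000 and ~465 tokens-per-second figures are the ones cited in the video; the response lengths and the 20-step agent loop are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope: how per-user token throughput translates into
# end-to-end generation time for a single response.
# Throughput figures come from the video; response lengths below
# are illustrative assumptions.

THROUGHPUT_TOKENS_PER_SEC = {
    "Cerebras (reported)": 3000,
    "Groq (reported)": 465,
}

RESPONSE_LENGTHS = [500, 2000, 8000]  # tokens: short reply -> long agent step

for chip, tps in THROUGHPUT_TOKENS_PER_SEC.items():
    for n_tokens in RESPONSE_LENGTHS:
        seconds = n_tokens / tps
        print(f"{chip}: {n_tokens:>5} tokens -> {seconds:5.2f} s")

# A coding agent that chains many sequential model calls multiplies the gap:
# 20 steps of 2,000 tokens each take ~13 s at 3,000 tok/s versus ~86 s at
# 465 tok/s (ignoring network and prefill latency).
```

This is why Berman singles out coding agents: their latency is dominated by chains of sequential generations, so per-call speedups compound across the whole task.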
📺 Source: Matthew Berman · Published January 15, 2026
🏷️ Format: News Analysis
