Description:
Dwarkesh Patel interviews Reiner Pope — CEO of chip startup MatX and former Google TPU architect — in a blackboard lecture format designed to make the economics and engineering of large language models genuinely comprehensible to a technical audience. The session opens with a motivating question: why does paying 6x more for Claude Code’s Fast Mode yield only 2.5x faster token streaming, and could you go further in either direction? Pope answers by introducing roofline analysis on an NVIDIA Blackwell NVL72 cluster (72 GPUs), modeling inference time as the maximum of memory fetch time and compute time, and showing how increasing the batch size can improve cost per token by as much as 1000x.
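A minimal Python sketch of that roofline model for a single decode step, assuming FP8 weights and rough public figures for a GB200 NVL72 rack; the per-GPU bandwidth, throughput, and placeholder rack cost rate below are assumptions for illustration, not numbers from the lecture, and the model ignores KV-cache traffic and inter-GPU communication. It still reproduces the basic shape of the argument: each step is memory-bound until the batch is large enough to saturate compute, so cost per token falls steeply with batch size.

```python
# Roofline sketch of one decode step: time is the max of weight-fetch time and
# compute time. All hardware constants are rough, assumed figures for a
# GB200 NVL72 rack, not values quoted in the lecture.

ACTIVE_PARAMS = 37e9          # active parameters per token (DeepSeek-V3-like MoE)
BYTES_PER_PARAM = 1           # assume FP8 weights
NUM_GPUS = 72                 # GPUs in one NVL72 rack
HBM_BW_PER_GPU = 8e12         # ~8 TB/s HBM bandwidth per GPU (approximation)
FLOPS_PER_GPU = 4.5e15        # ~4.5 PFLOP/s dense FP8 per GPU (approximation)
RACK_COST_PER_SEC = 1.0       # placeholder $/s for the whole rack (assumption)

def decode_step_time(batch_size: int) -> float:
    """Time for one decode step: max of memory-fetch time and compute time."""
    # Every step streams the active weights from HBM once, shared by the whole batch.
    memory_time = (ACTIVE_PARAMS * BYTES_PER_PARAM) / (NUM_GPUS * HBM_BW_PER_GPU)
    # Each token in the batch costs roughly 2 FLOPs per active parameter.
    compute_time = (2 * ACTIVE_PARAMS * batch_size) / (NUM_GPUS * FLOPS_PER_GPU)
    return max(memory_time, compute_time)

def cost_per_token(batch_size: int) -> float:
    """Dollar cost per generated token at a given batch size."""
    return RACK_COST_PER_SEC * decode_step_time(batch_size) / batch_size

for b in (1, 32, 1024, 8192):
    print(f"batch={b:>5}  step={decode_step_time(b) * 1e6:8.1f} us  "
          f"cost/token={cost_per_token(b):.2e}")
```

Under these particular constants the cost per token drops by a few hundredfold between batch 1 and batch 8192; the exact ratio depends entirely on the assumed hardware numbers, not on the structure of the argument.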
The lecture moves through the math of serving MoE models like DeepSeek V3 (37 billion active parameters out of 671 billion total), covering expert parallelism, tensor parallelism, and pipeline parallelism — including why pipeline parallelism saves memory capacity rather than runtime, and why the best partitioning strategies tend to mirror the model’s own layer and expert structure. Pope draws on his experience designing TPU systems at Google to explain why architectural decisions in modern LLMs are often downstream of hardware constraints rather than pure algorithmic preference.
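As a rough capacity-side sketch of that point, the snippet below divides the weight footprint across pipeline stages and expert shards. It assumes FP8 weights and a DeepSeek-V3-like layer count; the 8-stage-by-9-shard split is purely illustrative, not a partitioning from the lecture. Each GPU's memory requirement shrinks with the split, but a token still traverses every layer in sequence, which is the sense in which pipeline parallelism buys capacity rather than latency.

```python
# Sketch: per-GPU weight memory under layer (pipeline) and expert partitioning.
# Model shape is loosely DeepSeek-V3-like; the split chosen below is an
# illustrative assumption, not the lecture's configuration.

TOTAL_PARAMS = 671e9      # total (mostly expert) parameters
BYTES_PER_PARAM = 1       # assume FP8 weights
NUM_LAYERS = 61           # transformer layers in DeepSeek-V3

def weight_bytes_per_gpu(pipeline_stages: int, expert_shards: int) -> float:
    """Weight bytes each GPU holds when layers are split into pipeline stages
    and each layer's experts are sharded across expert-parallel GPUs."""
    # Pipeline parallelism: each GPU holds only its slice of the layers.
    layers_per_gpu = NUM_LAYERS / pipeline_stages
    # Expert parallelism: each GPU holds only its slice of each layer's experts.
    params_per_layer = TOTAL_PARAMS / NUM_LAYERS
    return layers_per_gpu * (params_per_layer / expert_shards) * BYTES_PER_PARAM

# Splitting shrinks what each GPU must hold, but every token still passes through
# all NUM_LAYERS stages in order, so per-token latency is not reduced:
print(f"no split               : {weight_bytes_per_gpu(1, 1) / 1e9:7.1f} GB per GPU")
print(f"8 stages x 9 shards (72): {weight_bytes_per_gpu(8, 9) / 1e9:7.1f} GB per GPU")
```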
This is one of the most rigorous publicly available explanations of how LLM infrastructure shapes API pricing, latency tiers, and model design choices. Engineers, researchers, and technically minded investors looking to build genuine intuition for why AI systems are built and priced the way they are will find this lecture exceptionally valuable.
📺 Source: Dwarkesh Patel · Published April 29, 2026
🏷️ Format: Deep Dive
