The AI Frontier: from Gemini 3 Deep Think distilling to Flash — Jeff Dean

Description:

Jeff Dean, Chief AI Scientist at Google DeepMind, joins Latent Space for an expansive technical conversation covering the Gemini model family’s design philosophy, the role of distillation in Google’s model strategy, and a remarkably detailed first-principles analysis of AI accelerator hardware economics — grounded in Dean’s decades of experience co-designing TPUs alongside the teams that train on them.

On the model side, Dean explains how distillation, a technique he co-developed in 2014 originally to compress image-classification ensembles, has become central to delivering Gemini Flash and other efficient variants: you cannot build a highly capable small model without first having a frontier model to distill from, which makes the expensive frontier investment a prerequisite for, rather than an alternative to, efficient deployment. He also discusses how sparse architectures and mixture-of-experts approaches are being revisited as hardware has evolved, and how Google balances its obligation to billions of existing users against the need to push the capability frontier.
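
The mechanics are worth a sketch: the small "student" model is trained to match the large "teacher" model's temperature-softened output distribution in addition to the hard labels. Below is a minimal NumPy sketch of that classic soft-target loss from the Hinton, Vinyals, and Dean paper; the temperature, the mixing weight `alpha`, and the function names are illustrative choices, not details taken from the episode.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperatures produce softer targets
    # that expose the teacher's knowledge about wrong-class similarity.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=4.0, alpha=0.5):
    # Soft term: cross-entropy between teacher and student distributions,
    # both softened by the same temperature.
    soft_teacher = softmax(teacher_logits, temperature)
    log_soft_student = np.log(softmax(student_logits, temperature) + 1e-12)
    soft_loss = -(soft_teacher * log_soft_student).sum(axis=-1).mean()

    # Hard term: ordinary cross-entropy against the ground-truth labels.
    log_student = np.log(softmax(student_logits) + 1e-12)
    hard_loss = -log_student[np.arange(len(hard_labels)), hard_labels].mean()

    # Soft-target gradients scale as 1/T^2, so the soft term is multiplied
    # by T^2 to keep the two components on a comparable footing.
    return alpha * temperature**2 * soft_loss + (1 - alpha) * hard_loss

# Tiny demo on random logits, just to show the loss runs end to end.
rng = np.random.default_rng(0)
teacher = 3.0 * rng.normal(size=(4, 10))  # a confident (peaky) teacher
student = rng.normal(size=(4, 10))        # an untrained student
labels = rng.integers(0, 10, size=4)
print(f"distillation loss: {distillation_loss(student, teacher, labels):.3f}")
```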

The hardware discussion is among the most technically rich available from any Google executive. Dean gives an energy-based explanation of why batching is economically necessary on TPUs: moving a model parameter from on-chip SRAM into a multiply unit costs roughly 1,000 picojoules, while the multiply itself costs about 1 picojoule, so batch-size-one inference spends almost its entire energy budget on data movement rather than computation. He connects this to the economics of building a custom ASIC per model once training runs reach billion-dollar scale, to Google's 3D mesh TPU topology, and to the SRAM vs. HBM tradeoff when serving smaller models spread across many chips. Essential viewing for anyone working at the intersection of model development and infrastructure.
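
Those two figures make the batching argument easy to check by hand, as in the back-of-the-envelope sketch below. The 1,000 pJ and 1 pJ costs are the ones quoted in the conversation; the assumption that each parameter is fetched from SRAM exactly once per batch and reused for every example in it is a deliberate simplification, and the batch sizes are arbitrary.

```python
# Amortizing per-parameter data-movement energy over a serving batch.
MOVE_PJ = 1000.0  # ~energy to move one parameter from SRAM to the multiply unit
MAC_PJ = 1.0      # ~energy for one multiply using that parameter

for batch in (1, 8, 64, 512):
    # The parameter is moved once, then reused for every example in the batch.
    total_pj = MOVE_PJ + MAC_PJ * batch
    per_example = total_pj / batch
    moved_frac = MOVE_PJ / total_pj
    print(f"batch={batch:4d}  energy/example ≈ {per_example:7.1f} pJ  "
          f"({moved_frac:.0%} spent on data movement)")
```

At batch size one, essentially all of the roughly 1,001 pJ per example goes to moving the parameter; by batch 512 the cost is amortized to about 3 pJ per example, which is the economic case for large serving batches.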

📺 Source: Latent Space · Published February 12, 2026
🏷️ Format: Interview
