Building Generative Image & Video models at Scale – Sander Dieleman (Veo and Nano Banana)
Description:

Sander Dieleman, research scientist at Google DeepMind and a member of the generative media team behind Veo and Imagen (Nano Banana), presents a behind-the-scenes technical overview of what goes into training large-scale generative image and video models. Delivered at the AI Engineer conference, the talk spans eight structured sections and offers a rare insider perspective on production diffusion system design from one of the field’s leading labs.

The presentation moves through data curation — which Dieleman argues is critically underrated relative to model architecture work, and remains difficult to publish on given competitive sensitivity — followed by latent representations (explaining why pixel-space training gave way to compressed latent spaces as scale increased), the core mechanics of diffusion models grounded in Fourier frequency analysis, neural network architecture choices, and training-at-scale considerations. He then covers sampling strategies unique to diffusion models, distillation techniques for reducing inference steps without shrinking model size, and the control signals used to make models reliably follow user intent.

The frequency-domain explanation of why diffusion models work so well for visual data — connecting power-law image spectra, Gaussian noise characteristics, and the coarse-to-fine generation process — is a standout section rarely treated at this depth in public talks. For ML engineers, researchers, and technically minded practitioners building or studying generative media systems, this is high-signal reference material from someone actively training state-of-the-art models at Google DeepMind.
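The spectral argument summarized above can be illustrated numerically: natural images have power-law spectra (most power in low frequencies), while Gaussian noise has a roughly flat spectrum, so added noise swamps fine detail first and the reverse process generates coarse-to-fine. Below is a minimal NumPy sketch of that intuition, not code from the talk; the 1/f-shaped test image and frequency bands are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256

# Radial frequency grid for an n x n FFT.
fx = np.fft.fftfreq(n)[:, None]
fy = np.fft.fftfreq(n)[None, :]
freq = np.sqrt(fx**2 + fy**2)
freq[0, 0] = 1.0  # avoid dividing by zero at the DC component

# Synthesize a "natural-like" image by imposing a 1/f amplitude spectrum
# on white noise (natural images empirically show power-law spectra).
white = rng.standard_normal((n, n))
image = np.fft.ifft2(np.fft.fft2(white) / freq).real

def band_power(x, lo, hi):
    """Mean spectral power of x over radial frequencies in [lo, hi)."""
    spec = np.abs(np.fft.fft2(x)) ** 2
    mask = (freq >= lo) & (freq < hi)
    return spec[mask].mean()

noise = rng.standard_normal((n, n))

# Compare low- vs high-frequency power for each signal.
img_ratio = band_power(image, 0.01, 0.05) / band_power(image, 0.3, 0.5)
noise_ratio = band_power(noise, 0.01, 0.05) / band_power(noise, 0.3, 0.5)

print(f"image low/high power ratio: {img_ratio:.1f}")    # large: low freqs dominate
print(f"noise low/high power ratio: {noise_ratio:.2f}")  # near 1: flat spectrum
```

Because the noise spectrum is flat while image power falls off steeply with frequency, the signal-to-noise ratio at high frequencies collapses early in the forward process, which is one way to see why diffusion sampling recovers global structure before fine texture.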


📺 Source: AI Engineer · Published April 21, 2026
🏷️ Format: Deep Dive
