You Might Not Need 50 Diffusion Steps — Ziv Ilan, Nvidia

Foundation Models · 3 days ago


Description:

At the AI Engineer conference, Ziv Ilan, a researcher on Nvidia's AI labs team in Paris, presents a practical framework for closing the latency gap in diffusion-based image and video generation. Models such as Flux 2, LTX 2.1, and Google's latest video generation systems now produce high-quality output, but the 20–50 denoising steps they require make them too slow for real-time developer and enterprise use cases. Ilan covers three optimization layers, ordered from least to most implementation effort: quantization, caching, and step distillation.
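Why step count dominates can be seen with a back-of-the-envelope latency model. The sketch below is purely illustrative: the per-step cost, the quantization speedup, and the cache hit ratio are hypothetical numbers, not figures from the talk. It assumes cache hits cost nothing and that quantization speeds up every step uniformly.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    """Hypothetical knobs for a back-of-the-envelope diffusion latency model."""
    steps: int                     # number of denoising steps
    step_ms: float                 # assumed cost of one full denoiser forward pass
    quant_speedup: float = 1.0     # assumed per-step speedup from quantization
    cache_skip_ratio: float = 0.0  # assumed fraction of steps answered from cache
    overhead_ms: float = 0.0       # assumed fixed cost (text encoding, VAE decode)


def estimate_latency_ms(cfg: PipelineConfig) -> float:
    """Only non-cached steps pay the (quantized) forward cost; cache hits are treated as free."""
    if cfg.quant_speedup <= 0:
        raise ValueError("quant_speedup must be positive")
    if not 0.0 <= cfg.cache_skip_ratio < 1.0:
        raise ValueError("cache_skip_ratio must be in [0, 1)")
    computed_steps = cfg.steps * (1.0 - cfg.cache_skip_ratio)
    return cfg.overhead_ms + computed_steps * cfg.step_ms / cfg.quant_speedup


if __name__ == "__main__":
    base = PipelineConfig(steps=50, step_ms=100.0)
    tuned = PipelineConfig(steps=50, step_ms=100.0, quant_speedup=2.0, cache_skip_ratio=0.5)
    distilled = PipelineConfig(steps=4, step_ms=100.0, quant_speedup=2.0)
    for name, cfg in [("baseline", base), ("quant+cache", tuned), ("distilled+quant", distilled)]:
        print(name, estimate_latency_ms(cfg))  # prints 5000.0, then 1250.0, then 200.0
```

Because the step count multiplies every other term, cutting it has the largest ceiling, which matches the ordering in the talk: quantization and caching shave the cost of each step, while distillation removes steps altogether.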

On quantization, Ilan describes work with Black Forest Labs on Flux 2 that applies dynamic post-training quantization on Nvidia's Blackwell architecture. Pre-quantized checkpoints are available on Hugging Face, and detailed examples are published in Nvidia's open-source TRT-LLM visual generation repository. For caching, he explains T-Cache and the more advanced chunk-based caching, in which regions of an image or video frame that stay unchanged between denoising steps are identified and their computation is skipped. This offers meaningful speedups, provided the reuse thresholds are tuned carefully to avoid degrading quality. The technique is already integrated into vLLM, OmniGen, and other serving libraries.
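The two lighter-weight layers can be illustrated with a small, dependency-free sketch. It is not the Flux 2, TRT-LLM, or T-Cache implementation. `quantize_dynamic` uses a symmetric INT8 scheme as a simple stand-in for dynamic post-training quantization, with the scale computed from each input at runtime. The talk's Blackwell pipeline may use other low-precision formats. `StepCache` is a hypothetical whole-step threshold cache: it reuses the previous denoiser output while the input's relative L1 change stays below `threshold`. Chunk-based caching would apply a similar test per region instead of per step.

```python
def quantize_dynamic(values, n_bits=8):
    """Symmetric per-tensor quantization; the scale is derived from this input at runtime."""
    qmax = 2 ** (n_bits - 1) - 1                 # 127 for 8 bits
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / qmax if amax > 0 else 1.0     # dynamic: recomputed on every call
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in q]


class StepCache:
    """Hypothetical threshold cache: reuse the last denoiser output while the input barely moves.

    Simplified illustration only; T-Cache and chunk-based caching use their own,
    more refined change metrics and granularity.
    """

    def __init__(self, threshold):
        self.threshold = threshold   # the knob that trades speed for quality
        self.ref_input = None        # input seen at the last full computation
        self.cached_output = None
        self.hits = 0

    def __call__(self, x, denoiser):
        if self.ref_input is not None:
            diff = sum(abs(a - b) for a, b in zip(x, self.ref_input))
            norm = sum(abs(b) for b in self.ref_input) or 1e-12
            if diff / norm < self.threshold:  # small relative L1 change: skip the forward pass
                self.hits += 1
                return self.cached_output
        self.cached_output = denoiser(x)      # full (expensive) forward pass
        self.ref_input = list(x)
        return self.cached_output
```

Raising `threshold` produces more cache hits and faster sampling, but each hit reuses a stale prediction. That is the quality risk the talk warns about when thresholds are tuned carelessly.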

The most impactful technique is step distillation: training a student diffusion model to match the teacher model's output quality in as few as 1 to 8 steps instead of 50. This enables potential 10x to 200x throughput improvements and makes real-time generation achievable. Ilan draws an analogy to DeepSeek's model distillation work, but notes that for diffusion models the goal is to reduce the number of steps rather than the number of parameters. A live demo from Nvidia's recent GTC conference in San Jose shows the practical results at 1080p resolution.
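The core idea of step distillation, that the student learns to reproduce in one step what the teacher reaches in many, can be shown with a deliberately tiny toy. The example is one-dimensional and every choice in it is a hypothetical assumption. The teacher Euler-integrates a made-up velocity field `k * (mu - x)` over 50 steps. The student is a single affine step `a * z + b`, fitted by gradient descent to the teacher's outputs. Published methods such as progressive distillation, consistency distillation, or adversarial and distribution-matching objectives train neural networks with far more elaborate losses. Nothing here reflects Nvidia's recipe.

```python
import random


def teacher_sample(z, steps=50, k=3.0, mu=2.0):
    """Hypothetical teacher: Euler-integrate dx/dt = k * (mu - x) from t=0 to t=1 in `steps` steps."""
    h = 1.0 / steps
    x = z
    for _ in range(steps):
        x = x + h * k * (mu - x)   # one denoiser evaluation per step
    return x


def distill_one_step_student(n_samples=2000, epochs=200, lr=0.1, seed=0):
    """Fit a one-step student y = a*z + b to the teacher's 50-step outputs (MSE, full-batch gradient descent)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    targets = [teacher_sample(z) for z in noise]   # expensive teacher runs, done once offline
    a, b = 1.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for z, t in zip(noise, targets):
            err = (a * z + b) - t                  # student output vs teacher output
            grad_a += 2 * err * z
            grad_b += 2 * err
        a -= lr * grad_a / n_samples
        b -= lr * grad_b / n_samples
    return a, b


if __name__ == "__main__":
    a, b = distill_one_step_student()
    z = rng_z = 0.7
    print("teacher (50 steps):", teacher_sample(z))
    print("student (1 step):  ", a * z + b)
```

In this toy the teacher's map happens to be affine, so the student can match it almost exactly while using 1 evaluation instead of 50. With real models, the student only approximates the teacher, which is why quality at 1 to 8 steps is the hard part.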

📺 Source: AI Engineer · Published June 16, 2026
🏷️ Format: Deep Dive

Tags: DeepSeek, LTX2, Nvidia, vLLM
