Text Diffusion — Brendon Dillon, Google DeepMind

Descriptions:

Brendan Dillon, a research scientist at Google DeepMind, delivered a technically rigorous presentation at AI Engineer on text diffusion, an alternative text generation paradigm that differs fundamentally from the autoregressive, token-by-token approach used by GPT, Gemini, and most other large language models. Instead of generating one token at a time with causal (past-only) attention, a diffusion model initializes the entire output sequence as random noise and iteratively refines it over multiple forward passes. Attention is bidirectional, so every position can see later tokens as well as earlier ones, and the model can correct any part of the output during generation.
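The control flow described above can be illustrated with a minimal toy sketch. It is not Gemini Diffusion's method: `toy_denoise_step`, `diffusion_generate`, and the character vocabulary are hypothetical names, and the "denoiser" cheats by knowing the target string, standing in for a trained network. The point is only the loop shape: the whole canvas starts as noise, and each pass may revise any position.

```python
import random

VOCAB = "abcdefghijklmnopqrstuvwxyz "  # hypothetical toy vocabulary


def toy_denoise_step(tokens, target, fix_fraction, rng):
    """Hypothetical stand-in for one forward pass of a trained denoiser.

    It looks at every position of the sequence at once (no left-to-right
    order) and repairs a fraction of the positions that are still wrong.
    A real model would predict tokens from learned weights, not a known target.
    """
    wrong = [i for i, (tok, goal) in enumerate(zip(tokens, target)) if tok != goal]
    if not wrong:
        return list(tokens)
    n_fix = max(1, round(len(wrong) * fix_fraction))
    refined = list(tokens)
    for i in rng.sample(wrong, n_fix):
        refined[i] = target[i]
    return refined


def diffusion_generate(target, num_steps, seed=0):
    """Return the sequence after each refinement pass, starting from noise."""
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB) for _ in target]  # whole canvas starts as noise
    history = ["".join(tokens)]
    for step in range(num_steps):
        # Fix a growing share of the remaining errors; the last pass fixes all.
        fix_fraction = 1.0 / (num_steps - step)
        tokens = toy_denoise_step(tokens, target, fix_fraction, rng)
        history.append("".join(tokens))
    return history


if __name__ == "__main__":
    for i, draft in enumerate(diffusion_generate("text diffusion", num_steps=6)):
        print(i, repr(draft))
```

Each printed draft has the full output length from the first step onward, which is the key contrast with autoregressive decoding, where the sequence grows by one token per pass.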

DeepMind’s Gemini Diffusion, released as a research preview to approximately 100,000 users roughly one year ago, achieved quality comparable to Gemini 2.0 Flash Lite at substantially lower latency by exploiting full hardware parallelism across the output block rather than generating tokens serially. Dillon demonstrated one of the architecture’s most striking properties, self-correcting generation, with a concrete example: the model made an arithmetic error early in its output canvas, completed the full reasoning trace, recognized the mistake by attending to the later tokens, and went back to fix it. GPT-4o and Gemini 2.5 Flash, both larger models, failed the same problem and did not correct themselves.
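The latency argument can be made concrete by counting sequential forward passes, as in the rough sketch below. The function names, block size, and step count are illustrative assumptions, not figures from the talk, and the count deliberately ignores that a pass over a whole block costs more than a single cached decoding step.

```python
import math


def sequential_passes_autoregressive(num_tokens):
    """Autoregressive decoding: one forward pass per generated token."""
    return num_tokens


def sequential_passes_diffusion(num_tokens, block_size, denoise_steps):
    """Block diffusion (assumed scheme): every token in a block is updated
    in parallel, and each block is refined for `denoise_steps` passes."""
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * denoise_steps


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    print(sequential_passes_autoregressive(1024))      # 1024
    print(sequential_passes_diffusion(1024, 256, 32))  # 128
```

Under these assumed numbers the diffusion decoder needs 128 serial passes instead of 1,024; whether that translates into lower wall-clock latency depends on how well the hardware parallelizes each block-wide pass.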

Additional advantages covered include dynamic computation (more denoising steps produce monotonically higher quality across six internal coding benchmarks), elimination of certain reasoning artifacts intrinsic to causal attention, and the potential for the model to allocate more passes to harder segments of a response. Dillon closed by signaling that new developments from DeepMind in text diffusion are forthcoming.


📺 Source: AI Engineer · Published June 04, 2026
🏷️ Format: Deep Dive
