I Tested the First Diffusion Reasoning LLM… It’s Insanely Fast

Description:

Skill Leap AI takes an early hands-on look at Mercury 2, a reasoning large language model from Inception Labs that uses a diffusion-based architecture instead of the autoregressive token-by-token generation found in models like ChatGPT or Claude. The core claim — and the thing the video attempts to verify — is that Mercury 2 generates around 1,000 tokens per second, roughly five times faster than Claude Haiku 4.5, which is itself Anthropic’s speed-optimized model.

The video explains the architectural difference clearly: autoregressive models generate one token at a time like a typewriter, while diffusion models generate all tokens in parallel and refine them iteratively, more like an editor working on a draft. Mercury 2 is notable for being the first model to combine this diffusion speed with chain-of-thought reasoning, which previous diffusion LLMs lacked. The presenter runs a live side-by-side test — Mercury 2 on high reasoning effort versus Claude Haiku 4.5 with extended thinking — building a simple simulation in each, and the speed difference is visibly dramatic.
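To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two decoding styles. Nothing in it reflects Inception Labs' actual implementation or API; the vocabulary, function names, and refinement schedule are invented for illustration. The point is structural: the autoregressive loop makes one sequential pass per output token, while the diffusion-style loop runs a small, fixed number of refinement passes over the whole sequence at once.

```python
import random

# Toy vocabulary and mask token; purely illustrative, not the Mercury 2 API.
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "quietly", "today"]
MASK = "<mask>"

def autoregressive_decode(length: int) -> list[str]:
    """Typewriter-style generation: one token per step, so the number of
    sequential model calls grows linearly with output length."""
    tokens: list[str] = []
    for _ in range(length):                  # one "forward pass" per token
        tokens.append(random.choice(VOCAB))  # stand-in for sampling from a model
    return tokens

def diffusion_decode(length: int, steps: int = 4) -> list[str]:
    """Editor-style generation: start from an all-masked draft and refine
    every position over a small, fixed number of parallel passes."""
    tokens = [MASK] * length
    for _ in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # A real diffusion LM would rewrite or unmask many positions at once,
        # keeping the ones it is most confident about; here we simply fill
        # roughly half of the remaining masked slots on each pass.
        for i in masked[: max(1, len(masked) // 2)]:
            tokens[i] = random.choice(VOCAB)
    # Final cleanup: fill anything still masked.
    return [t if t != MASK else random.choice(VOCAB) for t in tokens]

if __name__ == "__main__":
    print("autoregressive:", " ".join(autoregressive_decode(8)))
    print("diffusion     :", " ".join(diffusion_decode(8, steps=4)))
```

The throughput claim the video tests follows from this shape: the diffusion-style loop does a constant number of passes regardless of output length, which is where the parallel speedup would come from.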

Pricing for Mercury 2 via API is $0.25 per million input tokens and $0.75 per million output tokens. The presenter highlights agent pipelines, voice applications, and customer service bots as ideal use cases where low latency and reasoning capability need to coexist — a combination that until Mercury 2 required trading one for the other.
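As a rough back-of-the-envelope check on those rates, the snippet below estimates the cost of a hypothetical workload. Only the per-million-token prices come from the video; the helper name and the example token volumes are made up for illustration.

```python
# Rates quoted in the video: $0.25 per 1M input tokens, $0.75 per 1M output tokens.
MERCURY2_INPUT_RATE = 0.25 / 1_000_000    # USD per input token
MERCURY2_OUTPUT_RATE = 0.75 / 1_000_000   # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a Mercury 2 API workload at the quoted rates."""
    return input_tokens * MERCURY2_INPUT_RATE + output_tokens * MERCURY2_OUTPUT_RATE

# Hypothetical example: an agent pipeline consuming 2M input and 1M output tokens.
print(f"${estimate_cost(2_000_000, 1_000_000):.2f}")  # -> $1.25
```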


📺 Source: Skill Leap AI · Published February 25, 2026
🏷️ Format: Benchmark Test
