I Tested the First Diffusion Reasoning LLM… It’s Insanely Fast

Description:

Skill Leap AI takes an early hands-on look at Mercury 2, a reasoning large language model from Inception Labs that uses a diffusion-based architecture instead of the autoregressive token-by-token generation found in models like ChatGPT or Claude. The core claim — and the thing the video attempts to verify — is that Mercury 2 generates around 1,000 tokens per second, roughly five times faster than Claude Haiku 4.5, which is itself Anthropic’s speed-optimized model.

The video explains the architectural difference clearly: autoregressive models generate one token at a time like a typewriter, while diffusion models generate all tokens in parallel and refine them iteratively, more like an editor working on a draft. Mercury 2 is notable for being the first model to combine this diffusion speed with chain-of-thought reasoning, which previous diffusion LLMs lacked. The presenter runs a live side-by-side test — Mercury 2 on high reasoning effort versus Claude Haiku 4.5 with extended thinking — building a simple simulation in each, and the speed difference is visibly dramatic.
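To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two decoding styles. Nothing in it reflects Inception Labs' actual implementation or API; the vocabulary, function names, and refinement schedule are invented for illustration. The point is structural: the autoregressive loop makes one sequential pass per output token, while the diffusion-style loop runs a small, fixed number of refinement passes over the whole sequence at once.

```python
import random

# Toy vocabulary and mask token; purely illustrative, not the Mercury 2 API.
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "quietly", "today"]
MASK = "<mask>"

def autoregressive_decode(length: int) -> list[str]:
    """Typewriter-style generation: one token per step, so the number of
    sequential model calls grows linearly with output length."""
    tokens: list[str] = []
    for _ in range(length):                  # one "forward pass" per token
        tokens.append(random.choice(VOCAB))  # stand-in for sampling from a model
    return tokens

def diffusion_decode(length: int, steps: int = 4) -> list[str]:
    """Editor-style generation: start from an all-masked draft and refine
    every position over a small, fixed number of parallel passes."""
    tokens = [MASK] * length
    for _ in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # A real diffusion LM would rewrite or unmask many positions at once,
        # keeping the ones it is most confident about; here we simply fill
        # roughly half of the remaining masked slots on each pass.
        for i in masked[: max(1, len(masked) // 2)]:
            tokens[i] = random.choice(VOCAB)
    # Final cleanup: fill anything still masked.
    return [t if t != MASK else random.choice(VOCAB) for t in tokens]

if __name__ == "__main__":
    print("autoregressive:", " ".join(autoregressive_decode(8)))
    print("diffusion     :", " ".join(diffusion_decode(8, steps=4)))
```

The throughput claim the video tests follows from this shape: the diffusion-style loop does a constant number of passes regardless of output length, which is where the parallel speedup would come from.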

Pricing for Mercury 2 via API is $0.25 per million input tokens and $0.75 per million output tokens. The presenter highlights agent pipelines, voice applications, and customer service bots as ideal use cases where low latency and reasoning capability need to coexist — a combination that until Mercury 2 required trading one for the other.
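As a rough back-of-the-envelope check on those rates, the snippet below estimates the cost of a hypothetical workload. Only the per-million-token prices come from the video; the helper name and the example token volumes are made up for illustration.

```python
# Rates quoted in the video: $0.25 per 1M input tokens, $0.75 per 1M output tokens.
MERCURY2_INPUT_RATE = 0.25 / 1_000_000    # USD per input token
MERCURY2_OUTPUT_RATE = 0.75 / 1_000_000   # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a Mercury 2 API workload at the quoted rates."""
    return input_tokens * MERCURY2_INPUT_RATE + output_tokens * MERCURY2_OUTPUT_RATE

# Hypothetical example: an agent pipeline consuming 2M input and 1M output tokens.
print(f"${estimate_cost(2_000_000, 1_000_000):.2f}")  # -> $1.25
```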


📺 Source: Skill Leap AI · Published February 25, 2026
🏷️ Format: Benchmark Test
