Kimi K2.5 just dropped… (WOAH)


Description:

Matthew Berman covers the release of Kimi K2.5, a natively multimodal open-weights model from Chinese AI lab Moonshot AI, positioned as a state-of-the-art competitor for agentic tasks and front-end coding at significantly lower cost than closed frontier models.

Kimi K2.5 was pre-trained on approximately 15 trillion mixed visual and text tokens and introduces a native self-directed agent swarm capability: the model can decompose complex tasks and coordinate up to 100 sub-agents executing up to 1,500 parallel tool calls. Kimi reports a 4.5x speedup and an 80% reduction in end-to-end runtime versus single-agent operation on complex tasks. Benchmark highlights include BrowseComp at 74.9 (beating GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro), SWE-bench Verified at 76.8 (trailing GPT-5.2 and Claude Opus 4.5 but beating Gemini 3 Pro), and MMMU-Pro vision at 78.5 (beating Claude Opus 4.5). The model is trained using a novel parallel agent reinforcement learning (PARL) method.
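The decompose-and-fan-out pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration of the general swarm idea, not Moonshot's actual implementation: `run_tool_call` is a stand-in for a sub-agent executing one tool call, and the agent cap of 100 mirrors the limit the release mentions.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tool_call(subtask):
    # Hypothetical stand-in for one sub-agent executing a single tool call.
    return f"result:{subtask}"

def swarm(subtasks, max_agents=100):
    # Fan a decomposed task list out across parallel sub-agents,
    # then gather the results for the coordinating model to merge.
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        return list(pool.map(run_tool_call, subtasks))

results = swarm([f"subtask-{i}" for i in range(8)])
```

The coordinating model would decide how to split the task and how to merge `results`; the sketch only shows the parallel execution layer.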

Berman highlights the cost-performance ratio as the model's defining edge: on several benchmarks it matches or exceeds GPT-5.2 at a fraction of the price. Demos include website recreation from screenshots and multi-step visual path-finding using BFS code execution. The model is available at kimi.com and as downloadable open weights for local deployment.
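For context on the path-finding demo, breadth-first search on a grid maze looks like the sketch below. This is a generic textbook BFS, not the code the model generated in the video; the maze and coordinates are made up for illustration (0 = open cell, 1 = wall).

```python
from collections import deque

def bfs_path(grid, start, goal):
    # Breadth-first search over a grid maze; returns the shortest
    # path from start to goal as a list of (row, col) cells.
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}          # visited set + parent pointers
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            # Walk parent pointers back to reconstruct the path.
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None  # goal unreachable

maze = [[0, 0, 1],
        [1, 0, 0],
        [1, 1, 0]]
path = bfs_path(maze, (0, 0), (2, 2))
```

Because BFS explores cells in order of distance from the start, the first time it reaches the goal the reconstructed path is guaranteed shortest, which is why it suits the maze demo.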


📺 Source: Matthew Berman · Published January 27, 2026
🏷️ Format: Review
