Description:
Wes Roth puts Kimi K2.5, the latest open-source model from Chinese AI lab Moonshot AI, through a series of practical coding tests, with particular focus on its new Agent Swarm mode. The feature, currently in beta, supports up to 100 parallel sub-agents and 1,500 tool calls per session, and claims a 4.5x speedup over single-agent operation. On Humanity's Last Exam, Kimi K2.5 scored 50.2%, described at the time of recording as the top result for any single model, edging past OpenAI, Anthropic, and Google on that benchmark.
The video's central challenge asks the model to recreate a visually complex interactive website, complete with particle smoke effects, animations, and a responsive layout, from a video recording rather than a static image or description. Kimi K2.5 produces a close but imperfect result. Roth also has it build a mobile-first e-commerce storefront from scratch, which yields a polished starting point branded "Meow Studios Premium Cat Accessories."
Roth situates Kimi K2.5 within the broader competitive landscape using OpenRouter token-share data: Google leads at roughly 25%, followed by Anthropic at 17%, OpenAI at 14%, and xAI at 13%. He walks through free access via the Kilo Code VS Code extension, currently ranked first on Product Hunt, and addresses the recurring criticism of Chinese open-source models: strong benchmark numbers that historically underperform on real-world, edge-case usage, citing analyst Nathan Liens' commentary on benchmark gaming by labs including Meta with Llama 4.
📺 Source: Wes Roth · Published January 29, 2026
🏷️ Format: Review
