Kimi K2.7 vs GLM-5.2: Real Coding Showdown in Hermes Agent



Fahd Mirza runs a live, head-to-head coding comparison between two of China’s most capable open-source models, Kimi K2.7 Code from Moonshot AI and GLM-5.2 from Zhipu AI, both tested inside the Hermes agent framework on a real Ubuntu system. Instead of synthetic benchmarks, the test uses a genuine World Cup 2026 standings tracker application with a deliberately planted bug.

The bug is subtle but meaningful: the app ranked third-place teams by points alone and ignored goal difference, which violates FIFA tiebreaker rules. As a result, Ecuador incorrectly advanced over Ghana despite a worse goal difference.

Both models receive identical prompts asking them to find and fix the bug, identify its root cause, and build an entirely new Round of 32 bracket feature, all in a single agentic session. GLM-5.2 (744 billion parameters, 1-million-token context, MIT-licensed) completed the task in 97 tool calls, correctly reordering teams by goal difference and generating a full 16-match bracket that respects group-separation rules. Kimi K2.7 (1 trillion total parameters, 32 billion active via mixture-of-experts, 256K context) finished in just over five minutes, faster than GLM-5.2 though with more tool calls. It delivered the same correct fix and added forward-looking bracket progression logic that the prompt did not ask for.
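The tiebreaker bug described above can be illustrated with a minimal sketch. This is not the code from the tested application, which the summary does not show. The `TeamRecord` type, both function names, and the point and goal figures for Ecuador and Ghana are hypothetical, chosen only to reproduce the reported symptom. The sketch assumes that ties on points are broken by goal difference and then by goals scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamRecord:
    # Hypothetical record type; the real app's data model is not shown in the source.
    name: str
    points: int
    goal_difference: int
    goals_for: int


def rank_third_place_buggy(teams):
    # Buggy version: sorts by points only. Python's sort is stable,
    # so teams tied on points simply keep their input order.
    return sorted(teams, key=lambda t: t.points, reverse=True)


def rank_third_place_fixed(teams):
    # Fixed version: points first, then goal difference, then goals scored
    # (assumed tiebreaker order for this sketch).
    return sorted(
        teams,
        key=lambda t: (t.points, t.goal_difference, t.goals_for),
        reverse=True,
    )


# Illustrative numbers only, not real results.
third_placed = [
    TeamRecord("Ecuador", points=4, goal_difference=-1, goals_for=3),
    TeamRecord("Ghana", points=4, goal_difference=1, goals_for=4),
]

buggy_order = [t.name for t in rank_third_place_buggy(third_placed)]
fixed_order = [t.name for t in rank_third_place_fixed(third_placed)]
print(buggy_order)
print(fixed_order)
```

With these made-up figures, the points-only sort leaves Ecuador ahead of Ghana because the two are tied and the input order is preserved, while the composite key moves Ghana ahead on goal difference.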

Both models pass the test convincingly, demonstrating genuine agentic coding capability on a multi-file codebase with CRUD operations. Differences emerge in speed, tool-call efficiency, and creative additions, making this a useful reference for developers choosing between frontier open-weight Chinese models for coding agents.


📺 Source: Fahd Mirza · Published June 14, 2026
🏷️ Format: Comparison
