Mistral Medium 3.5 128B: Built for Long Stretches on Coding: Full Testing

Description:

Fahd Mirza puts Mistral Medium 3.5 through hands-on testing in this evaluation of the newly released 128-billion-parameter dense model. A key architectural decision sets it apart: Mistral unified instruct, reasoning, and coding into a single set of weights with configurable reasoning effort per request, eliminating the need to switch between specialist models. The release also led Mistral to retire its own dedicated coding agent, a notable signal of confidence. The model runs with a 256K context window and includes a built-from-scratch vision encoder. For benchmark context, it posts a SWE-Bench Verified score of 77.6, versus 72.2 for the previous dedicated coding model.

Testing happens live via Le Chat, Mistral’s hosted platform. The first task—a self-contained falling sand physics simulation in vanilla JavaScript with six materials and no dependencies—succeeds on the first attempt. The second task is considerably more demanding: a real-time collaborative code review tool requiring WebSocket-based multi-client sync, inline line-level commenting, user presence tracking, persistent storage, and authentication. Authentication and basic scaffolding work correctly, but real-time comment synchronization fails across browser windows, putting overall task completion at roughly 60%.
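The falling-sand task that the model aced centers on a simple cellular-automaton update rule. A minimal sketch of that rule is shown below; this is illustrative code written for this summary, not the model's actual output, and it handles only one material (sand over empty) rather than the six in the test prompt.

```javascript
// Minimal falling-sand update: a 2D grid where "sand" (1) falls into an
// empty cell (0) directly below it on each tick.
function step(grid) {
  const rows = grid.length, cols = grid[0].length;
  const next = grid.map(row => row.slice());
  // Scan bottom-up so each grain moves at most one cell per tick
  // (a top-down scan would let a grain cascade to the floor in one step).
  for (let y = rows - 2; y >= 0; y--) {
    for (let x = 0; x < cols; x++) {
      if (next[y][x] === 1 && next[y + 1][x] === 0) {
        next[y + 1][x] = 1;
        next[y][x] = 0;
      }
    }
  }
  return next;
}

// A grain placed at the top settles one row per tick.
let grid = [
  [1, 0],
  [0, 0],
  [0, 0],
];
grid = step(step(grid)); // after two ticks the grain reaches the floor
```

A full solution like the one in the video layers more materials (water, stone, etc.) onto the same per-cell update loop, plus canvas rendering and mouse input.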

The video also includes a multilingual generation test and closes with an honest assessment: strong out-of-the-box code generation for single-domain problems, but still inconsistent on complex multi-system integration tasks.
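The piece that failed in the second task, propagating a comment to other browser windows, typically reduces to a server-side WebSocket fan-out. The sketch below uses stub sockets in place of real connections; the function names and message shape are assumptions for illustration, not code from the video.

```javascript
// Registry of connected review-session clients.
const clients = new Set();

function join(socket) {
  clients.add(socket);
  socket.onclose = () => clients.delete(socket);
}

// Relay a line-level comment to every connected client except the sender.
function broadcastComment(sender, comment) {
  const msg = JSON.stringify({ type: "comment", ...comment });
  for (const client of clients) {
    if (client !== sender && client.readyState === 1 /* OPEN */) {
      client.send(msg);
    }
  }
}

// Stub sockets that just record what they receive, for demonstration.
function makeStub() {
  return { readyState: 1, received: [], send(m) { this.received.push(m); } };
}
const a = makeStub(), b = makeStub(), c = makeStub();
[a, b, c].forEach(join);
broadcastComment(a, { line: 42, text: "Consider extracting this function." });
// b and c each receive the comment; the sender a receives nothing.
```

When this relay step is missing or only echoes back to the sender, each window sees its own comments but never anyone else's, which matches the failure mode described above.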


📺 Source: Fahd Mirza · Published April 29, 2026
🏷️ Format: Benchmark Test
