Mistral Medium 3.5 128B: Built for Long Stretches on Coding: Full Testing

Description:

Fahd Mirza puts Mistral Medium 3.5 through hands-on testing in this evaluation of the newly released 128-billion-parameter dense model. A key architectural decision sets it apart: Mistral unified instruct, reasoning, and coding into a single set of weights with configurable reasoning effort per request, eliminating the need to switch between specialist models. The release also led Mistral to retire its own dedicated coding agent, a notable signal of confidence. The model runs with a 256K context window and includes a built-from-scratch vision encoder. For benchmark context, it posts a SWE-Bench Verified score of 77.6, versus 72.2 for the previous dedicated coding model.

Testing happens live via Le Chat, Mistral’s hosted platform. The first task—a self-contained falling sand physics simulation in vanilla JavaScript with six materials and no dependencies—succeeds on the first attempt. The second task is considerably more demanding: a real-time collaborative code review tool requiring WebSocket-based multi-client sync, inline line-level commenting, user presence tracking, persistent storage, and authentication. Authentication and basic scaffolding work correctly, but real-time comment synchronization fails across browser windows, putting overall task completion at roughly 60%.
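The falling-sand task that the model aced centers on a simple cellular-automaton update rule. A minimal sketch of that rule is shown below; this is illustrative code written for this summary, not the model's actual output, and it handles only one material (sand over empty) rather than the six in the test prompt.

```javascript
// Minimal falling-sand update: a 2D grid where "sand" (1) falls into an
// empty cell (0) directly below it on each tick.
function step(grid) {
  const rows = grid.length, cols = grid[0].length;
  const next = grid.map(row => row.slice());
  // Scan bottom-up so each grain moves at most one cell per tick
  // (a top-down scan would let a grain cascade to the floor in one step).
  for (let y = rows - 2; y >= 0; y--) {
    for (let x = 0; x < cols; x++) {
      if (next[y][x] === 1 && next[y + 1][x] === 0) {
        next[y + 1][x] = 1;
        next[y][x] = 0;
      }
    }
  }
  return next;
}

// A grain placed at the top settles one row per tick.
let grid = [
  [1, 0],
  [0, 0],
  [0, 0],
];
grid = step(step(grid)); // after two ticks the grain reaches the floor
```

A full solution like the one in the video layers more materials (water, stone, etc.) onto the same per-cell update loop, plus canvas rendering and mouse input.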

The video also includes a multilingual generation test and closes with an honest assessment: strong out-of-the-box code generation for single-domain problems, but still inconsistent on complex multi-system integration tasks.
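The piece that failed in the second task, propagating a comment to other browser windows, typically reduces to a server-side WebSocket fan-out. The sketch below uses stub sockets in place of real connections; the function names and message shape are assumptions for illustration, not code from the video.

```javascript
// Registry of connected review-session clients.
const clients = new Set();

function join(socket) {
  clients.add(socket);
  socket.onclose = () => clients.delete(socket);
}

// Relay a line-level comment to every connected client except the sender.
function broadcastComment(sender, comment) {
  const msg = JSON.stringify({ type: "comment", ...comment });
  for (const client of clients) {
    if (client !== sender && client.readyState === 1 /* OPEN */) {
      client.send(msg);
    }
  }
}

// Stub sockets that just record what they receive, for demonstration.
function makeStub() {
  return { readyState: 1, received: [], send(m) { this.received.push(m); } };
}
const a = makeStub(), b = makeStub(), c = makeStub();
[a, b, c].forEach(join);
broadcastComment(a, { line: 42, text: "Consider extracting this function." });
// b and c each receive the comment; the sender a receives nothing.
```

When this relay step is missing or only echoes back to the sender, each window sees its own comments but never anyone else's, which matches the failure mode described above.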


📺 Source: Fahd Mirza · Published April 29, 2026
🏷️ Format: Benchmark Test
