New 225B Coding Model Laguna M.1 – Honest Test (Bugs + Creative Code)

Description:

Fahd Mirza puts Laguna M.1, Poolside's new 225-billion-parameter mixture-of-experts (MoE) coding model with 23 billion parameters active per token, through two structured tests designed to probe real-world agentic coding performance. The model's weights are publicly available on Hugging Face under a permissive license, but its size requires a multi-GPU cluster to self-host, so Mirza accesses it via API using the Hermes agent framework.

In the first test, Mirza points Laguna M.1 at a broken full-stack World Cup 2026 tracker application whose backend and frontend fail to communicate because of a port issue. The model reads hundreds of files, identifies the root cause, and produces fix instructions that work: the app loads correctly once they are applied. The second test asks the model to build a procedurally animated tree simulation from scratch using only HTML canvas and physics-based growth, with no external libraries.

On benchmarks, Laguna M.1 outperforms Mistral’s Devstral 2 (a dense 123B model) across SWE-bench Verified, SWE-bench Multilingual, BenchPro, and TerminalBench, and edges out GLM 4.7 on two of those four. It falls short of DeepSeek V4 Flash and Qwen 3.5. Mirza notes the model’s reasoning trace shows some redundant file re-reading, suggesting chain-of-thought depth is an area for improvement in future versions. Overall, the video offers a candid early assessment of where a new open-weight coding model sits in a competitive field.

📺 Source: Fahd Mirza · Published June 18, 2026
🏷️ Format: Review


Tags

Claude Sonnet 4.6, DeepSeek V4 Flash, GLM 4.7, Hermes, Poolside, Qwen 3.5
