Ornith 1.0 35B in GGUF – Beats Models 10x Its Size – Run Locally

Description:

Fahd Mirza puts Ornith 1.0 35B through its paces in this hands-on local deployment walkthrough. Ornith is a mixture-of-experts model from Deep Reinforce's open-source coding family: it has 35 billion parameters in total but activates only around 3 billion per token, so it runs lean while punching above its weight class. Mirza downloads the Q8 GGUF quantization (approximately 37 GB) from Hugging Face, serves it locally with llama.cpp on an NVIDIA H100 with 80 GB of VRAM, and wires it into the Hermes agent framework.
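The memory figures above can be sanity-checked with a little arithmetic. The sketch below assumes the "Q8" file uses llama.cpp's Q8_0 layout (blocks of 32 int8 weights sharing one fp16 scale, i.e. 8.5 bits per weight) and applies it uniformly to all 35 billion parameters. Real GGUF files also carry metadata and may keep some tensors at other precisions, so treat the result as an estimate rather than the exact file size. The constant and function names are illustrative only.

```python
# Rough weight-memory estimate for a GGUF Q8_0 model.
# Assumption: Q8_0 = blocks of 32 int8 weights + one fp16 scale per block,
# i.e. 34 bytes per 32 weights (8.5 bits per weight), applied to every tensor.

Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 32 * 1 + 2  # 32 int8 values + one 2-byte fp16 scale


def q8_0_size_gb(n_params: float) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_weight = Q8_0_BLOCK_BYTES / Q8_0_BLOCK_WEIGHTS
    return n_params * bytes_per_weight / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the MoE weights that participate in computing one token."""
    return active_params / total_params


TOTAL_PARAMS = 35e9   # "35 billion total"
ACTIVE_PARAMS = 3e9   # "around 3 billion parameters per token"
H100_VRAM_GB = 80     # the card used in the walkthrough

weights_gb = q8_0_size_gb(TOTAL_PARAMS)
print(f"Q8_0 weights: ~{weights_gb:.1f} GB")                            # ~37.2 GB
print(f"Headroom on an 80 GB H100: ~{H100_VRAM_GB - weights_gb:.1f} GB")  # ~42.8 GB
print(f"Active per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")  # 8.6%
```

The estimate lands close to the quoted ~37 GB. It also shows why the H100 is comfortable here: the mixture-of-experts design cuts per-token compute to under a tenth of the weights, but every expert must still be resident in memory, so VRAM has to hold all 35 billion parameters plus the KV cache, which is what the remaining headroom is for.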

The primary stress test is a full-stack call-center helper app deliberately seeded with four bugs (a front-end crash, a broken customer lookup, a call-logging failure, and a silently wrong average calculation), plus a red-herring command designed to mislead the model. Working through the Hermes agent's tool-use loop, Ornith diagnoses and fixes the issues. A second test asks the model to generate a single self-contained HTML file that simulates a global railway hub with at least 12 track lines, collision-avoidance signaling, and a full day-night cycle, all drawn on canvas without external libraries.

On benchmarks, Ornith 35B reportedly beats similarly sized models (Qwen 3.5 35B, Qwen 3.6 35B, and Gemma 4 31B) and holds its own against Qwen 3.5 397B, a model more than ten times larger. A distinctive training detail: rather than relying on a fixed, human-designed problem-solving harness, Ornith learns to write its own harness during reinforcement learning and then optimizes the plan and the answer together.


📺 Source: Fahd Mirza · Published June 29, 2026
🏷️ Format: Hands On Build
