Ornith 1.0 35B in GGUF – Beats Models 10x Its Size – Run Locally

Description:

Fahd Mirza puts Ornith 1.0 35B through its paces in this hands-on local deployment walkthrough. Ornith is a mixture-of-experts model from the Deep Reinforce open-source coding family that activates only around 3 billion of its 35 billion total parameters per token, so per-token compute is closer to that of a small dense model even though the full set of weights still has to fit in memory. Mirza downloads the Q8 GGUF quantization (approximately 37 GB) from Hugging Face, serves it locally using llama.cpp on an NVIDIA H100 with 80 GB of VRAM, and wires it into the Hermes agent framework.
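The figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate, not something shown in the video: it assumes the "Q8" file uses llama.cpp's Q8_0 layout (blocks of 32 int8 weights plus one fp16 scale, about 8.5 bits per weight) for every tensor and ignores file metadata. The function name `gguf_size_gb` and the constants are illustrative.

```python
def gguf_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate model file size in decimal gigabytes.

    Ignores GGUF metadata and any tensors stored at a different precision.
    """
    return total_params * bits_per_weight / 8 / 1e9


# Assumption: Q8_0 stores 32 int8 weights plus one fp16 scale per block,
# i.e. 34 bytes per 32 weights = 8.5 bits per weight.
Q8_0_BITS_PER_WEIGHT = 8.5

TOTAL_PARAMS = 35e9   # total parameters, from the description
ACTIVE_PARAMS = 3e9   # approximate parameters active per token
VRAM_GB = 80          # H100 memory, from the description

size_gb = gguf_size_gb(TOTAL_PARAMS, Q8_0_BITS_PER_WEIGHT)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
headroom_gb = VRAM_GB - size_gb

print(f"Estimated Q8_0 size: {size_gb:.1f} GB")
print(f"Active fraction per token: {active_fraction:.1%}")
print(f"VRAM left for KV cache and buffers: {headroom_gb:.1f} GB")
```

Under these assumptions the estimate lands close to the quoted 37 GB and leaves roughly half of the 80 GB card free for context and runtime buffers. Actual usage depends on the context length and the llama.cpp settings chosen.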

The primary stress test is a full-stack call center helper app deliberately seeded with four bugs (a front-end crash, a broken customer lookup, a call-logging failure, and a silent wrong-average calculation) plus a red-herring command designed to mislead the model. Ornith diagnoses and fixes the issues through the Hermes agent’s tool-use loop. A second test asks the model to generate a single self-contained HTML file simulating a global railway hub with 12+ track lines, collision-avoidance signaling, and a full day-night cycle drawn on canvas with no external libraries.
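To illustrate why a "silent" wrong-average bug is harder to find than a crash, here is a purely hypothetical sketch. It is not the code from the video's test app; the function names and sample data are invented for illustration. The buggy version raises no error and returns a plausible number, so only a check against a known expected value exposes it.

```python
def average_handle_time_buggy(durations):
    """Hypothetical bug: missed calls are stored as None, are skipped in
    the sum, but are still counted in the divisor."""
    answered = [d for d in durations if d is not None]
    return sum(answered) / len(durations)


def average_handle_time_fixed(durations):
    """Average over answered calls only; returns 0.0 when there are none."""
    answered = [d for d in durations if d is not None]
    if not answered:
        return 0.0
    return sum(answered) / len(answered)


# Illustrative data: three answered calls (in seconds) and one missed call.
calls = [120, 240, None, 360]

buggy = average_handle_time_buggy(calls)
fixed = average_handle_time_fixed(calls)

print(f"buggy average: {buggy} s")
print(f"fixed average: {fixed} s")
```

Both functions run without complaint on the same input, which is what makes this class of bug a reasonable probe of whether an agent verifies results rather than stopping once the errors disappear.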

On benchmarks, Ornith 35B reportedly beats Qwen 3.5 35B, Qwen 3.6 35B, and Gemma 4 31B, all models of similar size, and holds its own against Qwen 3.5 397B, a model more than ten times larger. A distinctive training detail: rather than relying on a fixed, human-designed problem-solving harness, Ornith learns to write its own harness during reinforcement learning, and the plan and the answer are then optimized together.

📺 Source: Fahd Mirza · Published June 29, 2026
🏷️ Format: Hands On Build
