Qwopus 35B + MTP: The Coder That Fixes Its Own Bugs at 160 tok/s

Benchmarks · 3 days ago

Description:

Fahd Mirza tests Qwopus Coder, a 35-billion-parameter mixture-of-experts coding model built on the Qwen 3.6 architecture, with 3B parameters active per token, focusing on its built-in Multi-Token Prediction (MTP) capability. The model runs on a single NVIDIA RTX 6000 Ada (48 GB VRAM), where it occupies just over 23 GB, and is served through llama.cpp with speculative decoding enabled via the `--draft-mtp` flag and a maximum draft length of three tokens. No secondary draft model is required, since the draft heads are baked into the weights.

The practical test uses a deliberately broken full-stack call center dashboard (a FastAPI backend with SQLite and a plain HTML frontend) with planted bugs, including a port mismatch between frontend and backend and several backend logic errors. Orchestrated through the Hermes agent framework, Qwopus Coder identifies and fixes all of the bugs autonomously, without hints, in a single agentic loop, verifying each fix before moving to the next. Post-run llama.cpp server logs show a 98.7% draft token acceptance rate and throughput of approximately 160 tokens per second, a strong result for single-GPU local inference on a model of this size.
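To put the 98.7% acceptance rate and the three-token draft limit in perspective, here is a rough back-of-the-envelope sketch. It assumes each drafted token is accepted independently with the same probability, which is a simplification and not how the llama.cpp logs define the figure; the function name `expected_tokens_per_step` is made up for illustration.

```python
def expected_tokens_per_step(acceptance: float, max_draft: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Assumption (not from the video): every drafted token is accepted
    independently with probability `acceptance`. One token always comes
    from the main model itself; the i-th drafted token is kept only if
    all drafts before it were kept, which happens with probability
    acceptance ** i.
    """
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must be between 0 and 1")
    if max_draft < 0:
        raise ValueError("max_draft must be non-negative")
    return sum(acceptance ** i for i in range(max_draft + 1))


if __name__ == "__main__":
    # Figures quoted in the summary: 98.7% acceptance, max draft of 3.
    print(round(expected_tokens_per_step(0.987, 3), 2))
```

Under that assumption the model would emit a little over 3.9 tokens per main-model pass, close to the ceiling of 4. This is tokens per pass, not a wall-clock speed-up, since drafting and verification carry their own overhead.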

Mirza explains MTP clearly: instead of spending one forward pass per token, the model’s integrated draft heads predict several upcoming tokens at once from already-computed hidden states, and drafts that match what the main model would have produced are kept at no additional cost. The video concludes that Qwopus Coder’s combination of agentic coding capability and high local throughput makes it a compelling option for developers running inference on prosumer or workstation-class hardware.
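The draft-then-verify idea can be sketched with a toy example. This is not llama.cpp's implementation and not real MTP: `target_next` and `draft_next` are hypothetical stand-in functions over integer tokens, the draft is a separate function instead of heads reading the main model's hidden states, and verification is done position by position where a real server would use one batched pass. The point it illustrates is that greedy verification keeps the output identical to plain decoding while needing fewer verification steps.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(prefix: List[Token]) -> Token:
    # Hypothetical "main model": a deterministic next-token rule.
    return (prefix[-1] * 3 + 1) % 11


def draft_next(prefix: List[Token]) -> Token:
    # Hypothetical "draft head": agrees with the main model except
    # when the last token is divisible by 4.
    t = target_next(prefix)
    return t if prefix[-1] % 4 else (t + 1) % 11


def greedy_decode(target: NextFn, prompt: List[Token], n_new: int) -> List[Token]:
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextFn, draft: NextFn, prompt: List[Token], n_new: int, max_draft: int = 3
) -> Tuple[List[Token], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    steps = 0
    while len(out) < goal:
        steps += 1
        # 1) Draft up to max_draft tokens ahead.
        drafted: List[Token] = []
        for _ in range(max_draft):
            drafted.append(draft(out + drafted))
        # 2) Verify: keep drafts while they match the main model.
        accepted: List[Token] = []
        for tok in drafted:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first mismatch
                break
        else:
            # Every draft matched, so the main model adds one more token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], steps


if __name__ == "__main__":
    plain = greedy_decode(target_next, [1], 12)
    fast, steps = speculative_decode(target_next, draft_next, [1], 12)
    print(plain == fast, steps)
```

Because every kept token is checked against the main model, a poor draft only costs speed, never output quality. That is why a high acceptance rate translates directly into throughput.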

📺 Source: Fahd Mirza · Published July 01, 2026
🏷️ Format: Benchmark Test

Tags: FastAPI, Hermes, Llama CPP, Multi-Token Prediction
