Description:
Fahd Mirza delivers a same-day setup guide for Qwen3.5 35B-A3B, a Mixture of Experts (MoE) model from the Qwen series, running on an Ubuntu system with an Nvidia RTX 6000 (48GB VRAM). The model has 35 billion total parameters but activates only 3 billion per token, routing each token through 9 of its 256 experts, which gives it the knowledge breadth of a large model at roughly the inference cost of a 3B dense model.
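To make the routing arithmetic concrete, here is a minimal numpy sketch of top-k expert routing. The 9-of-256 figures come from the video; the hidden size, router, and expert weights are toy values invented for illustration, not the model's actual architecture.

```python
import numpy as np

NUM_EXPERTS = 256   # total experts (figure quoted in the video)
TOP_K = 9           # experts activated per token (ditto)
D_MODEL = 64        # toy hidden size, purely illustrative

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))
experts = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_MODEL))  # toy expert weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts only."""
    logits = x @ router_w                        # router score per expert
    top = np.argsort(logits)[-TOP_K:]            # pick the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())  # stable softmax over the chosen k
    w /= w.sum()
    # Only k of the 256 expert matmuls execute, which is why a
    # 35B-total model can run at roughly a ~3B dense model's cost.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

print(moe_forward(rng.standard_normal(D_MODEL)).shape)  # -> (64,)
```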
The tutorial covers downloading the Q8 quantized build (approximately 37GB) from unsloth via Hugging Face Hub, serving it through llama.cpp with full GPU offloading (a download-and-serve sketch follows below), and verifying that VRAM consumption stays under 37GB at the full Q8 quant. Key benchmark results include GPQA Diamond at 84.2, instruction following at 91.9, and competitive scores on agentic coding and multilingual tasks. A direct comparison with the 27B dense model shows the MoE variant completing chain-of-thought reasoning significantly faster: where the dense model took 5–6 minutes to think, the MoE model finished its reasoning phase in notably less time.
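A minimal sketch of the download-and-serve flow described above, assuming Python with huggingface_hub installed and a llama.cpp `llama-server` binary on PATH. The repo id and GGUF filename below are placeholders inferred from the video's description, not verified identifiers:

```python
from huggingface_hub import hf_hub_download
import subprocess

# Assumed repo id / filename: substitute the actual unsloth GGUF
# repo and Q8 file shown in the video.
gguf_path = hf_hub_download(
    repo_id="unsloth/Qwen3.5-35B-A3B-GGUF",  # placeholder repo id
    filename="Qwen3.5-35B-A3B-Q8_0.gguf",    # placeholder ~37GB Q8 build
)

# Serve with llama.cpp. "-ngl 99" offloads every layer to the GPU,
# keeping the whole quant resident in the RTX 6000's 48GB of VRAM.
subprocess.run([
    "llama-server",
    "-m", gguf_path,
    "-ngl", "99",      # full GPU offload
    "--port", "8080",  # HTTP endpoint for chat completions
])
```

Once the server is up, VRAM usage can be checked with `nvidia-smi`, which is how the video verifies the sub-37GB footprint.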
Qualitative tests include generating a Mars electrical storm simulation as a single self-contained HTML file using vanilla JavaScript and CSS — producing a particle system with procedural lightning and physics — and a jailbreak safety test using an emotional manipulation prompt, which the model correctly refuses. Despite strong MoE performance, Mirza recommends the 27B dense model for production use where VRAM permits, citing consistently high output quality.
📺 Source: Fahd Mirza · Published February 25, 2026
🏷️ Format: Tutorial Demo