Description:
Fahd Mirza pits two of the most capable quantized open-source models against each other in a local deployment showdown: Alibaba's Qwen 3.6 35B-A3B and Google DeepMind's Gemma 4 26B, both running in Q4KM format via Ollama on an NVIDIA RTX A6000 with 80 GB of VRAM. The video opens with a clear explanation of Q4KM quantization (4-bit integer weights with K-means clustering applied to preserve quality), citing benchmarks that show less than 2% degradation versus full precision under typical conditions; even so, Mirza cautions against production use based on his own earlier testing.
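To make the quantization idea concrete, here is a minimal illustrative sketch of K-means-based 4-bit quantization in Python. This is a toy model of the concept the video describes, not the actual Q4KM kernel used by Ollama/llama.cpp (which adds block-wise scales and a packed storage layout): weights are clustered into 16 centroids, and each weight is stored as a 4-bit index into that codebook.

```python
import numpy as np

def kmeans_quantize_4bit(weights, iters=20):
    """Quantize a 1-D float array to 16 levels (4 bits) via simple k-means."""
    w = np.asarray(weights, dtype=np.float64)
    # Initialize 16 centroids evenly across the weight range.
    centroids = np.linspace(w.min(), w.max(), 16)
    for _ in range(iters):
        # Assign each weight to its nearest centroid (index 0..15 = 4 bits).
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(16):
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), centroids  # 4-bit codes + codebook

def dequantize(idx, centroids):
    """Reconstruct approximate weights from codes and codebook."""
    return centroids[idx]

# Usage: quantize synthetic Gaussian weights, measure reconstruction error.
w = np.random.default_rng(1).normal(size=4096)
codes, book = kmeans_quantize_4bit(w)
err = np.abs(dequantize(codes, book) - w).mean()
```

With 16 codebook entries the mean absolute reconstruction error on unit-Gaussian weights stays small, which is the intuition behind the "less than 2% degradation" figure quoted in the video: most weight values land close to a centroid.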
The primary evaluation is a demanding single-prompt coding challenge: generate a self-contained HTML file that simulates a real-time monsoon supercell storm system and tracks its migration path from the Ganges Plain through Rajasthan into Sindh, complete with procedural landscape generation, particle physics, and recursive civil algorithm rendering. Gemma 4 draws 24 GB of VRAM and produces code that fails to render any storm visualization; Qwen 3.6 draws 33 GB and produces a richer output with terrain statistics, though its storm animation still falls short of the full-precision version. A multilingual translation task on a Bob Dylan verse rounds out the comparison.
Mirza concludes that Qwen wins the coding test but frames the victory as “tainted” by quantization limits: both models show meaningful quality loss on complex, multi-constraint tasks that their full-precision versions handle well. Viewers on consumer 24 GB GPUs can use the reported VRAM figures directly when deciding which model to run.
📺 Source: Fahd Mirza · Published April 18, 2026
🏷️ Format: Comparison
