Comparing Full Precision vs Ollama Version of Qwen3.6-35B-A3B Locally

Description:

Fahd Mirza runs a direct head-to-head comparison of Qwen 3.6 35B-A3B (a 35-billion-parameter mixture-of-experts model whose "A3B" suffix denotes roughly 3 billion active parameters per token) in two configurations: full precision served via vLLM at 65 GB, and the Ollama Q4_K_M quantized version at 23 GB, both running on a single NVIDIA H100 with 80 GB of VRAM. The test methodology is concrete: both models receive identical prompts, and the generated code is compiled with GCC and executed to observe runtime behavior, not just syntactic correctness.

The key finding: both models produce compilable code for a Minesweeper game with a moving-mines twist, but the quality difference becomes visible at runtime. The full-precision version's code triggers the recursive flood-fill correctly, revealing 35 connected cells on a single click, a fundamental Minesweeper mechanic. The Q4_K_M Ollama version compiles with two minor warnings but reveals only the single clicked cell, missing the flood-fill logic entirely. Mirza explains Q4_K_M quantization in detail: 4-bit integer storage with K-means clustering for intelligent weight grouping cuts memory requirements by roughly 75%, while the "M" (medium) variant balances compression against quality preservation.

The practical takeaway is that Q4_K_M quantization handles simple tasks without visible degradation but introduces meaningful logic gaps in more complex code generation, a useful calibration point for developers choosing between full-precision and quantized local deployments of large open-weight models.
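The quantization scheme described above (4-bit codes chosen by K-means clustering over weight values) can be approximated in a few lines. This is a simplified sketch of the idea, not the actual llama.cpp Q4_K_M implementation, which adds block structure, scales, and other refinements.

```python
import random


def kmeans_1d(values, k=16, iters=20):
    """Cluster scalar weights into k centroids (k=16 fits a 4-bit code)."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            buckets[idx].append(v)
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids


def quantize_block(weights, k=16):
    """Replace each weight with the index of its nearest centroid."""
    centroids = kmeans_1d(weights, k)
    codes = [min(range(k), key=lambda i: abs(w - centroids[i]))
             for w in weights]
    return codes, centroids


def dequantize(codes, centroids):
    """Recover approximate weights from 4-bit codes plus the centroid table."""
    return [centroids[c] for c in codes]


def memory_ratio(n_weights, k=16, code_bits=4, float_bits=32):
    """Fraction of fp32 storage used by codes plus the centroid table."""
    return (n_weights * code_bits + k * float_bits) / (n_weights * float_bits)
```

For a 256-weight block, `memory_ratio(256)` is about 0.19, in line with the roughly 75% memory reduction cited in the video; the residual reconstruction error is exactly the kind of small per-weight noise that accumulates into the logic gaps seen at runtime.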


📺 Source: Fahd Mirza · Published April 18, 2026
🏷️ Format: Benchmark Test
