Description:
Fahd Mirza runs a controlled comparison between two 4-bit quantized versions of Google’s Gemma 4 12B model: Google’s own QAT (quantization-aware training) build, where compression was simulated during the training process itself, and Unsloth’s Q4_0, which applies post-training quantization to a model never adapted for it. Both versions weigh in at around 7 GB and require just over 8 GB of VRAM under Ollama.
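As a rough sanity check on the ~7 GB figure, the sketch below estimates weight storage for a 4-bit model. It assumes the GGML Q4_0 block layout (32 weights sharing one fp16 scale, so 18 bytes per block, or 4.5 bits per weight) applied uniformly to exactly 12 billion parameters. Both are simplifying assumptions: real files may keep some tensors at other precisions, and "12B" is a nominal count.

```python
# Rough weight-size estimate for a 4-bit quantized model.
# Assumption: GGML Q4_0 layout applied to every weight. Each block of
# 32 weights stores 32 x 4-bit values (16 bytes) plus one fp16 scale
# (2 bytes), for 18 bytes per block.

Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 16 + 2  # packed 4-bit values + fp16 scale


def q4_0_bits_per_weight() -> float:
    """Effective storage cost per weight, including the shared scale."""
    return Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS


def estimate_weight_gb(num_params: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * q4_0_bits_per_weight() / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    params = 12e9  # nominal "12B"; the true parameter count may differ
    print(f"bits/weight: {q4_0_bits_per_weight():.2f}")
    print(f"estimated weights: {estimate_weight_gb(params):.2f} GB")
```

This lands at about 6.75 GB, in line with the roughly 7 GB downloads. The remaining VRAM Ollama needs on top of the weights presumably goes to the KV cache for the context window and to runtime buffers.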
To ensure a fair fight, Mirza pins identical sampling hyperparameters and context length in both Ollama Modelfiles (temperature 1.0, top-p 0.9, top-k 64, 8192-token context), using Google’s own recommended values for the Gemma 4 family. Two tasks are used: building an interactive jet engine turbine blade designer with live physics simulation in a single HTML file, and diagnosing and fixing a SQL query with several compounding problems, including correlated subqueries and functions applied to indexed columns.
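One way to reproduce this kind of controlled setup is to generate both Modelfiles from a single shared parameter set, so the files differ only in their `FROM` line. The sketch below uses Ollama's `PARAMETER` names (`temperature`, `top_p`, `top_k`, `num_ctx`) with the values quoted in the video; the model references in `VARIANTS` are hypothetical placeholders, not the actual tags used.

```python
# Sketch: render two Ollama Modelfiles that differ only in FROM,
# so both quantizations run with identical sampling settings.
# The model references in VARIANTS are HYPOTHETICAL placeholders.

# Values stated in the video (Google's recommended Gemma 4 settings).
SHARED_PARAMS = {
    "temperature": 1.0,
    "top_p": 0.9,
    "top_k": 64,
    "num_ctx": 8192,
}

# Hypothetical model sources; substitute real tags or local GGUF paths.
VARIANTS = {
    "gemma4-qat": "example/gemma4-12b-qat",
    "gemma4-unsloth-q4_0": "example/gemma4-12b-q4_0",
}


def render_modelfile(base: str, params: dict) -> str:
    """Build Modelfile text: one FROM line plus one PARAMETER line per setting."""
    lines = [f"FROM {base}"]
    lines += [f"PARAMETER {key} {value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name, base in VARIANTS.items():
        print(f"# {name}.Modelfile")
        print(render_modelfile(base, SHARED_PARAMS))
```

Each rendered file can be saved and registered with `ollama create <name> -f <file>`. Keeping the parameters in one dictionary makes it hard for the two configurations to drift apart by accident.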
The Google QAT build wins both tasks clearly. On the code generation task, it produces a fully interactive interface with working sliders and correct 24-blade rotor geometry, while the Unsloth version renders a static layout with non-functional controls. The video offers a practical guide for anyone running local models on consumer hardware and trying to decide which Gemma 4 12B quantization is worth the download.
📺 Source: Fahd Mirza · Published June 07, 2026
🏷️ Format: Benchmark Test