Descriptions:
Fahd Mirza tests Gemma 4 12B Coder, a fine-tuned variant of Google’s 12-billion-parameter Gemma 4 model specialized exclusively for Python, integrated with the Hermes agent framework on an NVIDIA RTX A6000 GPU with 48GB of VRAM. The model was trained on a curated dataset in which every example had to pass execution tests before inclusion, with Anthropic’s Fable 5 serving as one of the teacher models. Generations that failed were retried from scratch with a second teacher, making it a dual-verified fine-tune.
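The execution-gated curation described above can be sketched roughly as follows. This is an illustration only, not the actual training pipeline: the `Teacher` type, `passes_execution`, `build_example`, and the record fields are hypothetical names, and it assumes each task ships with plain `assert`-style test code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical type: a "teacher" maps a task prompt to candidate Python source.
Teacher = Callable[[str], str]


def passes_execution(solution: str, test_code: str, timeout: float = 10.0) -> bool:
    """Run solution + tests in a fresh interpreter; True only on exit code 0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution + "\n\n" + test_code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A candidate that hangs counts as a failure.
            return False
        return result.returncode == 0


def build_example(
    prompt: str, test_code: str, primary: Teacher, fallback: Teacher
) -> Optional[dict]:
    """Keep a sample only if one teacher's output passes the tests.

    The fallback teacher generates from scratch; it never sees the
    primary teacher's failed attempt.
    """
    for name, teacher in (("primary", primary), ("fallback", fallback)):
        solution = teacher(prompt)
        if passes_execution(solution, test_code):
            return {"prompt": prompt, "completion": solution, "teacher": name}
    return None  # Neither teacher produced passing code: drop the task.
```

Running each candidate in a separate interpreter process keeps a crashing or hanging generation from taking down the curation script itself; a real pipeline would likely add sandboxing on top of this.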
Mirza runs three real-world tasks. First, a bug fix in a World Cup 2026 simulation app that incorrectly ignored goal difference when ranking third-place teams: the model finds and patches the issue in a single prompt, and the fix is verified by re-running the application. Second, a creative coding task asking it to generate a standalone HTML canvas tree animation from scratch, which the model fails, hallucinating a file path rather than producing the actual file. Third, an SQL query task, which is attempted but whose results are not fully shown in the transcript.
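To make the first task concrete, here is a minimal sketch of the kind of bug involved. The app's real code is not shown in the video description, so `TeamRecord`, `rank_third_placed`, and the exact tiebreak order (points, then goal difference, then goals scored) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamRecord:
    # Hypothetical record type; field names are not taken from the app.
    name: str
    points: int
    goals_for: int
    goals_against: int

    @property
    def goal_difference(self) -> int:
        return self.goals_for - self.goals_against


def rank_third_placed(teams: list[TeamRecord]) -> list[TeamRecord]:
    """Rank by points, then goal difference, then goals scored (best first).

    Sorting on points alone would leave teams that are level on points
    in their input order, which is the class of bug described above.
    """
    return sorted(
        teams,
        key=lambda t: (t.points, t.goal_difference, t.goals_for),
        reverse=True,
    )


teams = [
    TeamRecord("A", points=4, goals_for=3, goals_against=3),  # GD 0
    TeamRecord("B", points=4, goals_for=2, goals_against=1),  # GD +1
    TeamRecord("C", points=3, goals_for=5, goals_against=2),  # GD +3
]
ranking = [t.name for t in rank_third_placed(teams)]
```

Here teams A and B are level on points, so B's better goal difference places it first; C has the best goal difference overall but still ranks last because points take priority.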
The model uses approximately 16GB of VRAM at Q8 quantization. Mirza serves it via Ollama and recommends Q4_K_M as the minimum for commodity GPUs with 8GB of VRAM. The overall verdict: strong on debugging and structured code tasks, unreliable on open-ended creative generation.
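A rough back-of-the-envelope check on those memory figures, assuming a flat 12 billion parameters: GGUF's Q8_0 format stores each block of 32 weights in 34 bytes (8.5 bits per weight), while the 4.85 bits per weight used for Q4_K_M below is an approximate average that varies by model. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in decimal GB.

    Excludes the KV cache, activations, and runtime overhead, all of
    which add to the VRAM actually used while serving.
    """
    return params_billion * bits_per_weight / 8


q8_weights = weight_memory_gb(12, 8.5)    # 12.75 GB of weights
q4_weights = weight_memory_gb(12, 4.85)   # about 7.3 GB of weights (approximate)
```

Under these assumptions the Q8 weights account for most of the roughly 16GB observed, with the remainder plausibly going to context and runtime overhead. The Q4_K_M weights come in just under 8GB, which suggests little headroom for long contexts on an 8GB card.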
📺 Source: Fahd Mirza · Published June 15, 2026
🏷️ Format: Hands On Build







