Gemma 4 12B on a 16GB Mac Mini Is Surprisingly Capable

Bart Slodyczka puts Google’s newly released Gemma 4 12B model through its paces on a 16GB M4 Mac Mini, a practical test of what entry-level local AI hardware can realistically handle in mid-2026. Using LM Studio, he walks through a critical and often-glossed-over detail: the model’s 7.56GB download grows to about 8GB once loaded, and running it at the full 131,072-token context requires over 26GB of RAM. The practical context ceiling on a 16GB machine therefore sits well below the model’s theoretical maximum, and the video identifies a usable sweet spot of around 10,000–15,000 tokens.
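The memory figures above imply a rough per-token cost for context. The Python sketch below backs that cost out of the two quoted measurements and estimates how much context fits in a given memory budget. It assumes that memory beyond the loaded weights grows linearly with context length, which real runtimes may not do exactly, and the 11GB budget is a hypothetical placeholder for whatever macOS and other apps leave available, not a figure from the video. Because the video reports "over" 26GB, the per-token cost here is a lower bound and the resulting context length is an upper bound.

```python
# Rough context-length budget derived from the figures quoted above.
# ASSUMPTION: memory beyond the loaded weights grows linearly with context
# length. Real runtimes may not scale this way, so treat results as ballpark.

MB_PER_GB = 1024  # work in MB; 1 GB = 1024 MB here


def per_token_mb(load_gb: float, full_ctx_gb: float, full_ctx_tokens: int) -> float:
    """Memory per context token implied by a load-time and a full-context reading."""
    return (full_ctx_gb - load_gb) * MB_PER_GB / full_ctx_tokens


def max_context_tokens(budget_gb: float, load_gb: float, mb_per_token: float) -> int:
    """Largest context that fits in budget_gb under the linear assumption."""
    spare_mb = (budget_gb - load_gb) * MB_PER_GB
    if spare_mb <= 0:
        return 0  # the weights alone do not fit in the budget
    return int(spare_mb // mb_per_token)


# Figures from the video summary: ~8 GB once loaded, >26 GB at 131,072 tokens.
mb_tok = per_token_mb(load_gb=8.0, full_ctx_gb=26.0, full_ctx_tokens=131_072)
print(f"~{mb_tok:.3f} MB per token")

# HYPOTHETICAL budget: the share of 16 GB the model can actually use depends
# on macOS and whatever else is running; 11 GB is an illustrative guess.
print(max_context_tokens(budget_gb=11.0, load_gb=8.0, mb_per_token=mb_tok))
```

With these inputs the sketch reports about 0.141MB per token and a ceiling of roughly 21,800 tokens. Since that ceiling is optimistic (the 26GB figure is a floor, and the budget is a guess), it is at least consistent with the video recommending a lower, safer range.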

In testing, simple exchanges of roughly 92 tokens ran at 11–12 tokens per second, which is acceptable for conversational use and light agentic workflows. The video also demonstrates Gemma 4’s multimodal capabilities, including native image and audio processing without separate encoder models. An invoice OCR test shows promising results for document-processing automation, which Slodyczka frames as a target use case for his own workflow: filtering email and logging client invoices without cloud API costs.

For comparison, Slodyczka sets the 12B directly against the Gemma 4 26B, which needs at least 18GB just to load (more than the Mac Mini’s 16GB), and against the smaller E2B/E4B variants designed for mobile. The video provides specific, reproducible numbers for anyone deciding whether a 16GB Mac Mini is a viable local AI workstation, and explains Gemma 4’s architectural shift away from separate multimodal encoders toward a unified transformer design.


📺 Source: Bart Slodyczka · Published June 04, 2026
🏷️ Format: Benchmark Test
