Google just dropped Gemma 4… (WOAH)

Description:

Matthew Berman delivers a thorough breakdown of Google’s Gemma 4 model family, covering all four released sizes: effective 2B and 4B parameter models designed for on-device deployment, a 26B mixture-of-experts model with 4 billion active parameters, and a 31B dense model that currently ranks third among all open-weights models on the LMArena text leaderboard, behind only GLM-5 and Kimi K2.5, both of which are dramatically larger.

Berman explains the ‘effective parameter’ (E2B/E4B) nomenclature, which refers to per-layer embedding tables that maximize parameter efficiency for edge deployments without increasing layer count. He plots every model on a parameter-count versus Elo-score chart, showing Gemma 4 31B performing comparably to Qwen 3.5 (a 397B/17B-active MoE) at a fraction of the size. He notes that Kimi K2.5, a near-trillion-parameter model, cannot even run on an NVIDIA GB300 with 750 GB of unified memory, which makes Gemma 4’s efficiency particularly significant.
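
To make the ‘effective parameter’ idea concrete, here is a rough sketch with made-up numbers. It is not Gemma 4’s actual architecture: it only illustrates the accounting, assuming per-layer embedding tables can be kept in host memory and streamed in as needed, so that only the remaining weights count toward the model’s effective size on the accelerator.

```python
# Illustrative only: hypothetical parameter counts, not Gemma 4's real figures.
def effective_params(total_params: float, ple_params: float) -> float:
    """Parameters that must stay resident on the accelerator at inference time,
    assuming per-layer embedding (PLE) tables can be offloaded to host memory."""
    return total_params - ple_params

total = 5.5e9  # hypothetical total parameter count
ple = 1.5e9    # hypothetical per-layer embedding parameters kept off-accelerator

print(f"Total: {total / 1e9:.1f}B, effective: {effective_params(total, ple) / 1e9:.1f}B")
# -> Total: 5.5B, effective: 4.0B  (a model like this would be labeled "E4B")
```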

On capabilities, Berman highlights native function calling, structured JSON output, multimodal support across video and images, and explicit OpenClaw compatibility confirmed in Hugging Face’s launch blog. He frames Gemma 4 as the right model for a hybrid deployment strategy: frontier hosted models like Claude Opus 4.6 or GPT-5 for hard reasoning tasks, with Gemma 4 handling the majority of agentic workloads locally or via OpenRouter.
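
Since the video emphasizes native function calling and structured JSON output, the hybrid setup Berman describes can talk to Gemma 4 through any OpenAI-compatible endpoint (a local server or OpenRouter). The sketch below is illustrative only: the model id "google/gemma-4-31b", the API-key placeholder, and the get_weather tool are assumptions for the example, not confirmed names.

```python
# Minimal sketch of function calling against an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # or a local server's URL
    api_key="YOUR_API_KEY",                   # placeholder
)

# Declare a tool the model may call natively (hypothetical example tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="google/gemma-4-31b",  # assumed model id; check the provider's model list
    messages=[{"role": "user", "content": "What's the weather in Austin?"}],
    tools=tools,
)

# If the model chose to call the tool, its arguments arrive as structured JSON.
msg = resp.choices[0].message
if msg.tool_calls:
    print(msg.tool_calls[0].function.name)
    print(msg.tool_calls[0].function.arguments)
else:
    print(msg.content)
```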


📺 Source: Matthew Berman · Published April 03, 2026
🏷️ Format: Deep Dive
