Gemma, DeepMind’s Family of Open Models — Omar Sanseviero, Google DeepMind


Description:

Omar Sanseviero, a researcher at Google DeepMind, delivers the first public conference talk on Gemma 4 at the AI Engineer conference, just one week after its release. The Gemma 4 family spans from 2 billion to 32 billion parameters and is designed to run entirely on consumer hardware—including Android phones, iPhones, Raspberry Pis, and standard laptops—making it Google’s most capable open model release to date.

Key architectural highlights include a Mixture of Experts (MoE) model for low-latency tasks, a 31B dense model for maximum capability, and smaller E2B/E4B variants optimized for on-device inference via llama.cpp. All models support multimodal inputs—images, video, and audio—and were trained across more than 140 languages using the same tokenizer as Gemini, enabling strong performance even for low-resource languages like Quechua. Sanseviero demonstrates 10 Gemma instances running in parallel on a single laptop, generating SVGs at 100 tokens per second with no API calls.

Reception has been swift: Gemma 4 reached 10 million downloads within its first 24 hours and the broader Gemma family has surpassed 500 million cumulative downloads. Over 1,000 community fine-tunes and quantizations appeared within days of launch. The updated license now explicitly permits commercial use, directly addressing the most common complaint about previous Gemma versions.


📺 Source: AI Engineer · Published April 20, 2026
🏷️ Format: Keynote Launch
