⚡️ Google’s Open AI Strategy — Omar Sanseviero, Google DeepMind

Descriptions:

In this Latent Space podcast interview, Omar Sanseviero from Google DeepMind walks through the technical decisions and strategic thinking behind Gemma 4 and Google’s broader open model program, covering architecture, deployment partnerships, and evolving fine-tuning trends.

Sanseviero explains the ‘effective parameters’ approach used in Gemma 4’s smaller variants: a per-layer embedding table lets a model with nearly 5 billion total parameters load only 2 billion into GPU memory, with the remainder offloadable to CPU or disk. The design targets on-device inference on Android phones and Raspberry Pi. The 29B and 31B Gemma 4 models use a different architecture for larger deployments, and Sanseviero notes that experiments with scaling the per-layer embedding approach are ongoing. He also describes coordinating the Gemma 4 launch with approximately 50 external partners, including llama.cpp, Ollama, vLLM, Hugging Face, Nvidia, and AMD, as well as an Android Studio integration that enables offline coding assistance with Gemma 4 running locally.
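To make the ‘effective parameters’ arithmetic concrete, here is a rough Python sketch of how the split between accelerator-resident and offloadable weights translates into memory. It is not Gemma's implementation. The 5 billion and 2 billion figures are the rounded numbers from the description above, and the bytes-per-parameter table and all names (`ParamSplit`, `footprint_gib`, `summarize`) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed storage cost per parameter for common weight formats (illustrative).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass(frozen=True)
class ParamSplit:
    """Hypothetical split of a model's weights by where they must live."""
    total_params: float     # all weights, including per-layer embeddings
    resident_params: float  # weights that must sit in accelerator memory

    @property
    def offloadable_params(self) -> float:
        # Weights that could be kept on CPU or disk instead.
        return self.total_params - self.resident_params


def footprint_gib(params: float, dtype: str) -> float:
    """Approximate weight storage in GiB for a given parameter count."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


def summarize(split: ParamSplit, dtype: str) -> dict:
    """Weight memory on the accelerator versus offloaded, in GiB."""
    return {
        "accelerator_gib": footprint_gib(split.resident_params, dtype),
        "offloaded_gib": footprint_gib(split.offloadable_params, dtype),
    }


# Rounded figures from the description: ~5B total, ~2B loaded on the GPU.
small_variant = ParamSplit(total_params=5e9, resident_params=2e9)

if __name__ == "__main__":
    for dtype in ("bf16", "int8", "int4"):
        usage = summarize(small_variant, dtype)
        print(f"{dtype}: {usage['accelerator_gib']:.2f} GiB on accelerator, "
              f"{usage['offloaded_gib']:.2f} GiB offloadable")
```

The sketch counts weights only. Activations, the KV cache, and runtime overhead add to the accelerator-side total on a real device.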

The conversation also covers shifting fine-tuning trends: many partners who planned to fine-tune Gemma 4 found that the base model performed well enough out of the box. Other topics include early research into diffusion-based text generation models, how Gemma Nano powers on-device Gemini features in Pixel and high-end Samsung devices, and why Google sees open models as central to its long-term AI platform strategy.


📺 Source: Latent Space · Published May 24, 2026
🏷️ Format: Interview
