Descriptions:
Cormac Brick, a tech lead on Google’s AI Edge team, delivers a technical deep dive into building on-device AI agents powered by tiny language models (TLMs), defined here as models under one billion parameters. The talk covers Google’s full AI Edge stack: MediaPipe, LiteRT (the runtime formerly known as TensorFlow Lite), and the LiteRT-LM model harness, which together run across more than 2.7 billion Android devices on CPU, GPU, and NPU.
A major focus is the new agent skills system built on top of AI Core and Gemini Nano, using Gemma 4 E2B and E4B as the underlying base models. Brick demos modular skills — restaurant roulette, location lookup, ADB-based device debugging — that can be authored with Gemini CLI, published to GitHub, and loaded into apps at runtime. The skills framework launched just days before the talk, with community-contributed examples already appearing.
The second half tackles fine-tuning tiny LLMs for highly specific tasks, with Brick citing a jump from 46% to 90% accuracy as evidence that sub-billion-parameter models can be meaningfully specialized. He lays out a practical decision framework for mobile developers: use system-level GenAI (Gemini Nano via AI Core) when it fits the use case, and reach for a custom embedded TLM only when deeper specialization or offline-first behavior is required. Swift and JavaScript APIs for LiteRT are noted as forthcoming, with an iOS open-source release planned alongside the Swift SDK.
📺 Source: AI Engineer · Published May 20, 2026
🏷️ Format: Deep Dive







