TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google


Description:

Cormac Brick, tech lead for Google AI Edge with a decade of experience spanning Intel NPU architecture and Google’s Pixel AI features, delivers a detailed technical presentation on running LLMs and agentic workloads on mobile and edge devices. The talk centers on two product areas: LiteRT-LM, Google’s LLM inference runtime for Android, iOS, and other edge platforms, and a new agent skills framework built on top of the latest Gemma 4 models that enables on-device agentic behavior without a cloud round-trip.

Brick explains the distinction between small language models (SLMs, roughly hundreds of millions to a few billion parameters) and tiny language models (TLMs, sub-100M), walking through the performance profiles Google observes across device classes. He covers Gemma 4’s launch the prior week alongside an Android and iOS reference app, performance numbers on mobile hardware, and how agent skills are structured — each skill is a self-contained unit of JavaScript, a spec file, and optional API credentials, with an orchestrator model routing user intent to the appropriate skill. Google’s internal team built approximately 80 skills using this pattern, with Gemini CLI and Claude Code as the primary authoring tools.
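The skill structure described above can be sketched roughly as follows. This is an illustrative toy, not the actual LiteRT-LM or agent-skills API: every name here (`weatherSpec`, `routeIntent`, `handle`) is hypothetical, and the keyword-matching router stands in for the orchestrator model the talk describes.

```javascript
// Hypothetical illustration of the skill layout from the talk:
// each skill bundles a spec, a JavaScript entry point, and (optionally) credentials.

// Spec file (e.g. weather.skill.json), shown inline for brevity.
const weatherSpec = {
  name: "get_weather",
  description: "Fetch the current weather for a city",
  parameters: { city: "string" },
};

// The skill's JavaScript implementation, stubbed out here.
// A real skill would call an external API using its bundled credentials.
async function getWeather({ city }) {
  return `Sunny in ${city}`;
}

const skills = [{ spec: weatherSpec, run: getWeather }];

// Toy router: match user intent to a skill by keyword.
// In the talk, an orchestrator *model* performs this routing step.
function routeIntent(utterance) {
  const u = utterance.toLowerCase();
  return skills.find((s) => u.includes(s.spec.name.split("_").pop())) ?? null;
}

async function handle(utterance) {
  const skill = routeIntent(utterance);
  if (!skill) return "No matching skill";
  return skill.run({ city: "Dublin" }); // parameter extraction elided
}
```

Because each skill is self-contained (spec, code, credentials), an authoring tool such as Gemini CLI or Claude Code can generate one skill at a time without touching the rest of the catalog, which is presumably how the internal team scaled to roughly 80 skills.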

The second half focuses on fine-tuning and deploying tiny models to edge devices, ending with a real shipped application built using TLM technology. The talk is directly relevant to anyone building offline-capable, latency-sensitive, or privacy-preserving AI features on Android, iOS, or embedded platforms using Google’s open toolchain.


📺 Source: AI Engineer · Published May 03, 2026
🏷️ Format: Deep Dive
