From 46% to 90%: Fine-Tuning Tiny LLMs for On-Device Agents — Cormac Brick, Google

Foundation Models · 2 months ago

Description:

Cormac Brick, a tech lead on Google’s AI Edge team, delivers a technical deep-dive into building on-device AI agents powered by tiny language models, defined here as models under one billion parameters. The talk covers Google’s full AI Edge stack: MediaPipe, LiteRT (the runtime formerly known as TensorFlow Lite), and the LiteRT-LM model harness, which together run across more than 2.7 billion Android devices on CPU, GPU, and NPU.

A major focus is the new agent skills system built on top of AI Core and Gemini Nano, using Gemma 4 E2B and E4B as the underlying base models. Brick demos modular skills — restaurant roulette, location lookup, ADB-based device debugging — that can be authored with Gemini CLI, published to GitHub, and loaded into apps at runtime. The skills framework launched just days before the talk, with community-contributed examples already appearing.
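To make the idea of modular skills loaded at runtime concrete, here is a minimal Python sketch of a skill registry. It is not the AI Edge skills API: `SkillRegistry`, `register`, `invoke`, and the `restaurant_roulette` stand-in are all hypothetical names chosen for illustration, and the real framework's skill format and loading mechanism are not described in this summary.

```python
from typing import Callable, Dict, List

# Hypothetical signature: a skill takes a text argument and returns text.
SkillFn = Callable[[str], str]


class SkillRegistry:
    """Illustrative in-memory registry; not the actual AI Edge skills API."""

    def __init__(self) -> None:
        self._skills: Dict[str, SkillFn] = {}

    def register(self, name: str, fn: SkillFn) -> None:
        # Skills are added while the app is running, not at build time.
        if name in self._skills:
            raise ValueError(f"skill already registered: {name}")
        self._skills[name] = fn

    def names(self) -> List[str]:
        # Names that an agent could choose from when planning a step.
        return sorted(self._skills)

    def invoke(self, name: str, argument: str) -> str:
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        return self._skills[name](argument)


def restaurant_roulette(options: str) -> str:
    # Deterministic stand-in for the demoed skill: a real version would
    # pick at random; this one returns the alphabetically first option.
    choices = [item.strip() for item in options.split(",") if item.strip()]
    return sorted(choices)[0]


registry = SkillRegistry()
registry.register("restaurant_roulette", restaurant_roulette)
```

In this sketch the model would only need to emit a skill name and an argument; the app resolves the name against whatever skills have been registered so far.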

The second half tackles fine-tuning tiny LLMs for highly specific tasks, with Brick citing a jump from 46% to 90% accuracy as evidence that sub-billion-parameter models can be meaningfully specialized. He lays out a practical decision framework for mobile developers: use system-level GenAI (Gemini Nano via AI Core) when it fits the use case, and reach for a custom embedded tiny language model (TLM) only when deeper specialization or offline-first behavior is required. Swift and JavaScript APIs for LiteRT are noted as forthcoming, with an iOS open-source release planned alongside the Swift SDK.
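The decision framework described above can be written down as a small function. This is a rough sketch under stated assumptions: the `Requirements` fields, the `Backend` enum, and `choose_backend` are hypothetical names, and the fallback for a use case the system model does not fit is an assumption, since the summary does not cover that case.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    SYSTEM_GENAI = "system-level GenAI (Gemini Nano via AI Core)"
    CUSTOM_EMBEDDED = "custom embedded tiny language model"


@dataclass
class Requirements:
    # Hypothetical fields summarizing what the app needs.
    system_model_fits_use_case: bool
    needs_deep_specialization: bool = False
    needs_offline_first: bool = False


def choose_backend(req: Requirements) -> Backend:
    # A custom model is justified by specialization or offline-first needs.
    if req.needs_deep_specialization or req.needs_offline_first:
        return Backend.CUSTOM_EMBEDDED
    # Otherwise prefer the system-level model when it fits the use case.
    if req.system_model_fits_use_case:
        return Backend.SYSTEM_GENAI
    # Assumption: if the system model does not fit, fall back to custom.
    return Backend.CUSTOM_EMBEDDED
```

For example, `choose_backend(Requirements(system_model_fits_use_case=True))` selects the system-level option, while setting `needs_offline_first=True` selects the custom embedded model.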

📺 Source: AI Engineer · Published May 20, 2026
🏷️ Format: Deep Dive

Tags: Android, Gemma 4, Google, Google AI Edge Gallery
