FunctionGemma – Function Calling at the Edge


Description:

Sam Witteveen dives into FunctionGemma, Google’s specialized variant of the Gemma 3 270M model fine-tuned for customizable function calling at the edge. Unlike server-side function calling APIs, FunctionGemma is designed to run fully locally — on mobile phones, in-browser via transformers.js, or on edge hardware like the NVIDIA Jetson Nano — with no network round-trip required.

The core design philosophy is customization over out-of-the-box convenience. Developers are expected to create their own function definition datasets, fine-tune the model for their specific tool schemas, then deploy a purpose-built local model. Witteveen walks through the mechanics: special tokens for function declarations, function call starts, and function responses mirror the structure of server-side APIs but are baked into the model weights. A Hugging Face notebook demonstrates loading the gated model, defining a standard get-weather tool, constructing the chat template with tool definitions, parsing the model’s function call output, injecting the tool result back as a message, and receiving the final natural-language response.
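The parse-and-inject loop from the notebook can be sketched without the model itself. This is a minimal, self-contained illustration: the `<start_function_call>`/`<end_function_call>` delimiter names and the simulated generation string are assumptions for demonstration, not FunctionGemma's actual special tokens, and `get_weather` is a stubbed tool matching the notebook's example.

```python
import json

# Illustrative tool registry; get_weather mirrors the notebook's example tool.
def get_weather(location: str) -> dict:
    # Stubbed result; a real tool would query an API or local sensor data.
    return {"location": location, "temperature_c": 18, "condition": "sunny"}

TOOLS = {"get_weather": get_weather}

def parse_function_call(generation: str) -> dict:
    """Extract a {"name": ..., "arguments": ...} payload between the
    (assumed) function-call delimiter tokens in the model's output."""
    start = generation.index("<start_function_call>") + len("<start_function_call>")
    end = generation.index("<end_function_call>")
    return json.loads(generation[start:end])

# Simulated model output for "What's the weather in Paris?"
# (delimiter tokens assumed for illustration).
generation = ('<start_function_call>'
              '{"name": "get_weather", "arguments": {"location": "Paris"}}'
              '<end_function_call>')

call = parse_function_call(generation)
result = TOOLS[call["name"]](**call["arguments"])

# Inject the tool result back into the conversation as a tool message;
# a second generate() pass would then yield the natural-language answer.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "tool", "name": call["name"], "content": json.dumps(result)},
]
print(messages[-1]["content"])
```

In a real deployment the conversation (including the tool message) is re-rendered through the model's chat template, which is where the fine-tuned special tokens do their work; the structure of the loop, however, is the same as with server-side APIs.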

The underlying Gemma 3 270M base model was trained on 6 trillion tokens — more than Gemma 3’s 1B and 4B variants — making it unusually capable for its size. Google has released a companion mobile demo app showing games and applications running function calling fully on-device. Witteveen also highlights a related project by collaborator George demonstrating a complete on-device RAG system powered by Gemma models, pointing to a broader trend of sophisticated AI pipelines running locally on consumer hardware.


📺 Source: Sam Witteveen · Published December 19, 2025
🏷️ Format: Tutorial Demo
